Richi91/SpeechRecognition


BiRNN in Blocks trained with CTC on TIMIT

Implemented in Blocks (built on Theano). The network can be trained with either a CTC cost or a framewise cost.

Requirements:

  • Theano: http://deeplearning.net/software/theano/install.html
  • Blocks: http://blocks.readthedocs.org/en/latest/setup.html
  • Blocks extras: https://github.com/mila-udem/blocks-extras
  • Fuel: http://fuel.readthedocs.org/en/latest/setup.html
  • PySoundFile, to read TIMIT's deprecated .wav-like format: see http://pysoundfile.readthedocs.org/en/0.7.0/ and https://github.com/bastibe/PySoundFile
  • python_speech_features, for preprocessing (FFT-based filterbank): see http://python-speech-features.readthedocs.org/en/latest/ and https://github.com/jameslyons/python_speech_features

Notes:

  • Decoding: simple argmax (best-path) decoding over the per-frame outputs; no expensive beam search is used.
  • The mapping from the original 61 TIMIT phonemes to the reduced set of 39 can be applied either before training or during decoding.
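The two notes above can be illustrated with a minimal sketch in plain Python. It is not code from this repository: the function names, the `blank` parameter, and the `FOLD_61_TO_39` table are hypothetical, and the table is only an excerpt of the commonly used 61-to-39 folding, not the complete mapping. Which index the CTC blank occupies depends on the CTC implementation, so it is passed in explicitly.

```python
# Excerpt of the commonly used 61 -> 39 TIMIT folding (hypothetical name,
# NOT the complete table). Phonemes absent from the dict map to themselves.
FOLD_61_TO_39 = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil",
    "dcl": "sil", "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}
DROPPED = {"q"}  # the glottal stop is usually removed altogether


def best_path_decode(frame_scores, blank):
    """Argmax (best-path) CTC decoding.

    frame_scores: one list of per-class scores for each frame.
    blank: index of the CTC blank class (an assumption of the caller).
    """
    # 1. Pick the highest-scoring class in every frame.
    path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    decoded = []
    prev = None
    for idx in path:
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


def fold_phonemes(phonemes):
    """Map 61-set phoneme labels to the reduced set, dropping 'q'."""
    return [FOLD_61_TO_39.get(p, p) for p in phonemes if p not in DROPPED]
```

For example, with class 0 as the blank, the frame-wise argmax path `1 1 0 1 2` decodes to `[1, 1, 2]`: the repeated `1` collapses, while the blank keeps the two occurrences of `1` apart. Folding can be applied to the label sequences before training or to the decoded output afterwards.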

Results (PER = phoneme error rate) for a 3-layer BiRNN with [300, 250, 200] hidden units, batch size 40, AdaDelta, and phonemes mapped to 39 classes before training:

  • GRU on MFCC features: 19.5% PER
  • GRU on Log-FB features: 20.5% PER
  • LSTM on MFCC features: 19.5% PER
  • LSTM on Log-FB features: ?
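The PER figures above are edit-distance based error rates. As a rough sketch of how such a number is typically computed (not the evaluation script used for these results; `edit_distance` and `phoneme_error_rate` are hypothetical names), the Levenshtein distance between each reference and hypothesis phoneme sequence is summed and divided by the total number of reference phonemes:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edit distance divided by the total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total
```

A reference of four phonemes decoded with one phoneme missing gives a PER of 1/4 = 25%.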

Credits:

CTC implementation: ctc_cost.py is copied from Philemon Brakel's repository: https://github.com/pbrakel/CTC-Connectionist-Temporal-Classification
