forked from alumae/kaldi-nnet-dur-model
Neural network phone duration model on top of the Kaldi speech recognition framework
chagge/kaldi-nnet-dur-model
INTRODUCTION
============

Implementation of a neural network phone duration model, as described in the paper:

Tanel Alumäe. Neural network phone duration model for speech recognition. Interspeech 2014, Singapore.
https://phon.ioc.ee/dokuwiki/lib/exe/fetch.php?media=people:tanel:icassp2014-durmodel.pdf

DEPENDENCIES
============

  * Python 2.6 (with argparse) or 2.7
  * Theano
  * Pylearn2

Theano can be installed using Python's `pip` utility:

    pip install Theano --user

This installs Theano locally (not system-wide). More instructions: http://deeplearning.net/software/theano/install.html

Pylearn2 should be cloned from GitHub; see http://deeplearning.net/software/pylearn2/#download-and-installation

USAGE
=====

See `run_tedlium.sh` for a sample script that trains a duration model on TEDLIUM data, on top of already trained MMI triphone models.

The improvement on TEDLIUM data is very small, however. Larger improvements can be expected for languages that have phonetic duration opposition, i.e., languages in which a phoneme can be either short or long and the duration changes the meaning of a word (see http://en.wikipedia.org/wiki/Length_(phonetics)).

The duration model is trained using Pylearn2, which itself uses Theano. Theano can use a GPU, which makes training much faster (about 1 hour on TEDLIUM data when using a Tesla K20). Use the `~/.theanorc` file to instruct Theano to use the GPU:

    [global]
    device = gpu
    floatX = float32

If you use a cluster, you should also instruct Theano to use a machine-local temporary directory as its compilation directory. Add the following line to the `[global]` section of the `.theanorc` file:

    base_compiledir=/tmp/%(user)s/theano.NOBACKUP

ADDING A NEW LANGUAGE
=====================

All language-specific details are defined in `dur-model/python/lat-model/data/languages.yaml`.
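As an illustration of the `.theanorc` settings described under USAGE, the following is a minimal sketch that renders the same `[global]` section from Python. It is not part of this repository: the function name `render_theanorc` and its parameters are hypothetical, and it uses only the Python 3 standard library (`configparser`), whereas the repository itself targets Python 2.6/2.7.

```python
import configparser
import io


def render_theanorc(device="gpu", float_x="float32", base_compiledir=None):
    """Return the text of a .theanorc file with a single [global] section.

    Hypothetical helper for illustration; the option names and values
    (device, floatX, base_compiledir) are the ones given in the README.
    """
    # interpolation=None keeps "%(user)s" literal in the output instead of
    # having this script try to expand it.
    parser = configparser.ConfigParser(interpolation=None)
    # Preserve key case; the default would write "floatx" instead of "floatX".
    parser.optionxform = str
    parser["global"] = {"device": device, "floatX": float_x}
    if base_compiledir is not None:
        # Only needed on clusters, per the README.
        parser["global"]["base_compiledir"] = base_compiledir
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the cluster variant; redirect to ~/.theanorc manually if desired.
    print(render_theanorc(base_compiledir="/tmp/%(user)s/theano.NOBACKUP"))
```

The sketch only returns text rather than writing `~/.theanorc` directly, so an existing configuration file is never overwritten by accident.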
CITING
======

You can cite the following paper if you use this software:

    @InProceedings{alumae2014,
      author={Alum\"{a}e, Tanel},
      title={Neural network phone duration model for speech recognition},
      booktitle={Interspeech 2014},
      address={Singapore},
      year=2014
    }