GitHub - chagge/kaldi-nnet-dur-model: Neural network phone duration model on top of the Kaldi speech recognition framework

forked from alumae/kaldi-nnet-dur-model

BSD-3-Clause license

INTRODUCTION
============

Implementation of a neural network phone duration model, as described in
the paper:

Tanel Alumäe. Neural network phone duration model for speech recognition. 
Interspeech 2014, Singapore.
https://phon.ioc.ee/dokuwiki/lib/exe/fetch.php?media=people:tanel:icassp2014-durmodel.pdf

DEPENDENCIES
============

  * Python 2.6 (with argparse) or 2.7
  * Theano
  * Pylearn2
  
Theano can be installed using Python's `pip` utility:

pip install Theano --user

This installs Theano for the current user only (not system-wide). More instructions: 
http://deeplearning.net/software/theano/install.html

Pylearn2 should be cloned from GitHub; see 
http://deeplearning.net/software/pylearn2/#download-and-installation
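
Before running the training script, it can be useful to confirm that the
dependencies listed above are importable. The following is a minimal
sketch, not part of this repository: the helper name `missing_modules`
is hypothetical, and it assumes the packages are importable under the
module names `argparse`, `theano` and `pylearn2`.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


# Module names assumed for the dependencies listed above.
REQUIRED = ["argparse", "theano", "pylearn2"]

if __name__ == "__main__":
    not_found = missing_modules(REQUIRED)
    if not_found:
        print("Missing modules: " + ", ".join(not_found))
    else:
        print("All required modules are importable.")
```

The check only catches `ImportError`, so it reports packages that are
absent but does not verify versions or GPU support.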
  
  
  
USAGE
=====

See `run_tedlium.sh` for a sample script that trains a duration model
on TEDLIUM data, on top of already trained MMI triphone models. The 
improvement on TEDLIUM data is very small, however. Larger improvements
can be expected for languages that have phonetic duration opposition,
i.e., languages in which a phoneme can be either short or long and the
duration changes the meaning of a word
(see http://en.wikipedia.org/wiki/Length_(phonetics)).

The duration model is trained using Pylearn2, which itself uses Theano.
Theano can use a GPU, which makes training much faster (about 1 hour on
TEDLIUM data when using a Tesla K20). Use the ~/.theanorc file to
instruct Theano to use the GPU:

[global]
device = gpu 
floatX = float32

If you use a cluster, you should also instruct Theano to keep its
compilation directory in a machine-local temporary directory. Add the
following line to the [global] section of the .theanorc file:

base_compiledir=/tmp/%(user)s/theano.NOBACKUP
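
The same settings can also be written programmatically, which may help
when setting up several cluster nodes. The sketch below is an
illustration only and is not part of this repository: the function name
`update_theanorc` is hypothetical, and the values are copied from the
two fragments above. It uses `RawConfigParser` so that the `%(user)s`
placeholder is stored literally rather than interpolated.

```python
try:
    from configparser import RawConfigParser  # Python 3
except ImportError:
    from ConfigParser import RawConfigParser  # Python 2

# Settings taken from the .theanorc fragments shown above.
THEANO_SETTINGS = [
    ("device", "gpu"),
    ("floatX", "float32"),
    ("base_compiledir", "/tmp/%(user)s/theano.NOBACKUP"),
]


def update_theanorc(path, settings=THEANO_SETTINGS):
    """Set the given options in the [global] section of the file at `path`.

    Existing options in the file are kept; comments in it are not.
    """
    parser = RawConfigParser()
    parser.optionxform = str  # keep option names case-sensitive (floatX)
    parser.read(path)         # a missing file is silently skipped
    if not parser.has_section("global"):
        parser.add_section("global")
    for key, value in settings:
        parser.set("global", key, value)
    with open(path, "w") as handle:
        parser.write(handle)
    return parser
```

Calling `update_theanorc(os.path.expanduser("~/.theanorc"))` (after
`import os`) would apply the settings to the current user's file. Since
the file is rewritten, any comments in an existing .theanorc are lost,
so editing the file by hand may be preferable if it is already
customized.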


ADDING A NEW LANGUAGE
=====================

All language-specific details are defined in `dur-model/python/lat-model/data/languages.yaml`.

CITING
======

You can cite the following paper if you use this software:

@InProceedings{alumae2014,
  author={Alum\"{a}e, Tanel},
  title={Neural network phone duration model for speech recognition},
  booktitle={Interspeech 2014},
  address={Singapore},
  year=2014
}