mbencherif/NNAcousticModeling

Introduction

This repository contains scripts for training and evaluating neural network based acoustic models. The scripts use the Python framework Chainer, which supports GPU computation. These scripts were used in the following publications:

The acoustic models are evaluated on the TIMIT phone recognition task.

Repository Description

Folder Structure

  • data - Features, utterance offsets, targets. Data in this directory are not part of the repository and must be downloaded separately from this link
  • kaldi - Baseline HMM-DNN model trained in Kaldi
  • recog - Recognizer executables and files used only by the recognizer
  • recog_src - Recognizer source files, needed only when building the recognizer instead of using the precompiled executables in the recog folder
  • scripts - All scripts used for training / evaluation

Input Data

All input files specified in this README can contain the placeholder {}, which is substituted with train, dev or test for training, development or test data, respectively. The training / evaluation scripts in this repository replace {} with the correct substring depending on where the data is used. This allows for simpler input data specification using fewer arguments.
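
For illustration only, the substitution amounts to a simple string replacement (the helper name below is hypothetical, not part of the repository):

def resolve(path_template, subset):
    # Replace the {} placeholder with "train", "dev" or "test".
    return path_template.replace("{}", subset)

print(resolve("data/fmllr/data_{}.npy", "train"))   # data/fmllr/data_train.npy
print(resolve("data/fmllr/data_{}.npy", "dev"))     # data/fmllr/data_dev.npy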

The folder data contains the TIMIT corpus processed and saved using several techniques. Each feature type is saved in a separate subfolder. All feature files have the name data_{}.npy and each subfolder also contains a feature transform in a file final.feature_transform.

All data files are in numpy .npy format and contain concatenated training, development and test utterances. The data directory contains lists of the original utterance names; the data files were generated in the same order as these lists. The offsets of each utterance or speaker are provided in the files offsets_{}.npy and offsets_spk_{}.npy.

Note: Data in the data directory are not part of this repository and must be downloaded separately from this link.
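
As an illustration (not a repository script), the concatenated features and utterance offsets can be loaded with NumPy and a single utterance sliced out; the exact file locations and the interpretation of the offsets as start-frame indices are assumptions here:

import numpy as np

feats = np.load("data/fmllr/data_train.npy")      # all training frames, concatenated
offsets = np.load("data/offsets_train.npy")       # per-utterance offsets (assumed: start frame indices)

# Slice out the i-th utterance, assuming each offset marks where an utterance starts.
i = 0
start = offsets[i]
end = offsets[i + 1] if i + 1 < len(offsets) else len(feats)
utterance = feats[start:end]
print(utterance.shape)                            # (num_frames, feature_dim)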

The features used are:

  • fbank40 - Filter bank values
  • fbank40norm - Filter bank values globally normalized to zero mean and unit variance
  • mfcc - Mel-frequency cepstral coefficients
  • mfcc_cmn_perspk - Mel-frequency cepstral coefficients (MFCC) with cepstral mean normalization (CMN) computed per speaker
  • mfcc_cmn_perutt - MFCC with CMN computed per utterance
  • fmllr - Feature-space maximum likelihood linear regression (fMLLR) features generated using Kaldi

i-vectors are saved in these files:

  • ivectors/online/ivectors_{}.npy - Online i-vectors
  • ivectors/offline_perspk/ivectors_{}.npy - Offline i-vectors computed per speaker
  • ivectors/offline_perutt/ivectors_{}.npy - Offline i-vectors computed per utterance

The targets for training and development data are saved in targets/targets_{}.npy. There are no targets for testing data.
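
A quick, illustrative sanity check (not part of the repository) is to verify that the number of target frames matches the number of feature frames for the training data:

import numpy as np

feats = np.load("data/fmllr/data_train.npy")
targets = np.load("data/targets/targets_train.npy")

# Each training feature frame should have exactly one target; there are no targets for test data.
assert len(feats) == len(targets), (len(feats), len(targets))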

Scripts

The scripts are separated into several directories:

  • common - Scripts for training, evaluation and model specification, plus the master script
  • example - Example scripts
  • papers - Scripts that perform data preparation, training and evaluation with the same configurations as in our papers
  • util - Utility and helper scripts. They are imported by the other scripts, so this folder should be in the user's $PYTHONPATH.

The folder common contains several scripts used in different phases of acoustic model training / evaluation:

  • generate_folds.py - Separation of the input data into folds
  • train.py - Training
  • predict_folds.py - Evaluation of the fold network outputs
  • evaluate.py - Evaluation of the phone error rate

These scripts can be executed separately (mostly useful for debugging), or the master script master_script.py can be used to execute them all in the correct order with the correct arguments.

All scripts assume they are executed with the current working directory set to the repository root. For example, this command runs the master script from the correct directory:

$ python ./scripts/common/master_script.py [options]

Output data

All output data are saved to the folder results. The recognizer also creates an auxiliary folder lab, which can be deleted after the evaluation script finishes. Output data are saved to subfolders according to the output data type and certain script arguments. The resulting phone error rates (PER) are written to stdout.

The following description uses these values, which are substituted according to the script arguments:

  • [num_folds] - Number of folds used. If folds are not used, this value is 0.
  • [data] - Name of the subdirectory containing the features, e.g. fmllr
  • [ivectors] - Name of the subdirectory containing the i-vectors, e.g. offline_perspk
  • [output_dir] - Name of the output directory
  • [output_id] - Arbitrary string describing the experiment being run. This string is used as the name of the subdirectory containing intermediate and final results.

All output directories are:

  • Folds - [output_dir]/fold_data/[num_folds]/[data]+[ivectors] (if i-vectors are not used, the +[ivectors] part is omitted)
  • Fold network outputs - [output_dir]/fold_data_out/[num_folds]/[output_id]
  • Fold network models - [output_dir]/models/folds/[num_folds]/[output_id]
  • Master network models - [output_dir]/models/master/[num_folds]/[output_id]
  • RPL network models - [output_dir]/models/rpl/[num_folds]/[output_id]
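
As an illustration, the fold data directory for a run with --output-dir example_out, --num-folds 5, fMLLR features and offline per-speaker i-vectors would be assembled roughly like this (a sketch of the naming scheme above, not repository code):

output_dir = "example_out"
num_folds = 5
data = "fmllr"
ivectors = "offline_perspk"

fold_dir = f"{output_dir}/fold_data/{num_folds}/{data}+{ivectors}"
print(fold_dir)   # example_out/fold_data/5/fmllr+offline_perspk
# Without i-vectors, the "+offline_perspk" part would be omitted.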

Recognizer

Part of this repository is a triphone-based HMM-DNN phone recognizer. The recognizer is written in C++. Its source files are in the directory recog_src, and CMake is used to generate the project files. Compiled recognizer executables for both Windows and GNU/Linux are in the directory recog, together with the files necessary for recognition with the baseline Kaldi HMM.

Running

Prerequisites

  • Python 3.6+
  • Chainer 3.5 with CuPy
  • NumPy

The scripts may work with other versions, but only these versions have been tested.

Important Notes

  • The directory scripts/util must be added to $PYTHONPATH when executing any script in scripts/common directly. Scripts in scripts/papers modify $PYTHONPATH themselves, so the user can skip this step for them.
  • All scripts should be executed with the current working directory set to the repository root; otherwise the default paths will not be found and must be set via arguments. For example:
$ python ./scripts/common/master_script.py [options]
  • Scripts in scripts/papers must also be executed from the repository root, for example:
$ ./scripts/papers/tsd2018/run.sh
  • Use a different --output-id for each experiment. Whenever other arguments change (mainly the network specification), a different output ID should be used to avoid reusing intermediate files meant for another network.

Examples

Simple Feed-Forward Example

$ python ./scripts/common/master_script.py --output-dir example_out
    --output-id example_ff
    --network-spec "-n ff -l 8 -u 2048 -a relu --splice 5 -d 0.2"

This command trains and evaluates a feed-forward network with 8 layers of 2048 units, ReLU activation functions and a dropout ratio of 0.2. The splicing size is 5 (in each direction, so in total 11 stacked frames form the network input). This example trains one network, which is saved in the directory example_out/models/master/0/example_ff.

Note: In the feed-forward case, only splicing size 5 should be used; otherwise, new feature transform files must be generated.
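
Splicing stacks each frame with its 5 left and 5 right neighbours, so the network input contains 11 consecutive frames. A minimal NumPy sketch of the idea (illustrative only, not the repository's implementation, which relies on the precomputed feature transform files):

import numpy as np

def splice(feats, context=5):
    # Stack each frame with `context` neighbours on each side (edges are clamped).
    n = len(feats)
    spliced = []
    for t in range(n):
        idx = np.clip(np.arange(t - context, t + context + 1), 0, n - 1)
        spliced.append(feats[idx].reshape(-1))
    return np.stack(spliced)

x = np.random.randn(100, 40)       # 100 frames of 40-dim features (example sizes)
print(splice(x).shape)             # (100, 440) = 11 stacked frames per input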

Simple LSTM Example

$ python ./scripts/common/master_script.py --output-id example_lstm
    --network-spec "-n lstm -l 4 -u 1024 --timedelay 5 -d 0.2"

In this example, only the network architecture differs: an LSTM network with 4 layers of 1024 units is used. The output time delay is 5 time steps and the dropout ratio is again 0.2.
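
The output time delay means the prediction at frame t is scored against the target of frame t - 5, giving the LSTM access to 5 future frames. A minimal sketch of this target alignment (an assumption about how the delay is applied, not repository code):

import numpy as np

delay = 5
num_frames, num_classes = 100, 2000           # arbitrary example sizes
outputs = np.random.randn(num_frames, num_classes)
targets = np.random.randint(0, num_classes, size=num_frames)

# With an output delay, output[t] is compared with target[t - delay].
aligned_outputs = outputs[delay:]
aligned_targets = targets[:-delay]
assert len(aligned_outputs) == len(aligned_targets)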

LSTM with Specified Optimizer

$ python ./scripts/common/master_script.py --output-id example_lstm_optimizer
    --network-spec "-n lstm -l 4 -u 1024 --timedelay 5 -d 0.2"
    -o adam momentumsgd
    -b 512 128
    --lr 0 1e-3 1e-4 1e-5

This example is the same as the last one, except it now specifies the optimizer settings: -o specifies the optimizer, -b the batch size and --lr the learning rate. Each of these parameters accepts several arguments, and training is performed in several stages; each argument is used for one stage. If a parameter has fewer arguments than the longest one, its last argument is automatically duplicated.

Therefore, the last command is identical to:

$ python ./scripts/common/master_script.py --output-id example_lstm
    --network-spec "-n lstm -l 4 -u 1024 --timedelay 5 -d 0.2"
    -o adam momentumsgd momentumsgd momentumsgd
    -b 512 128 128 128
    --lr 0 1e-3 1e-4 1e-5

and the training stages in this example use the following settings:

  1. Adam, batch size = 512
  2. Momentum SGD, batch size = 128, learning rate = 1e-3
  3. Momentum SGD, batch size = 128, learning rate = 1e-4
  4. Momentum SGD, batch size = 128, learning rate = 1e-5

Note: Adam has no learning rate, so the corresponding learning rate argument (0 in this case) is not used; it must be specified anyway so the training stages are correctly set up. The N-th argument of -o, -b and --lr belongs to the N-th training stage.
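
The duplication of the last argument can be illustrated with a small helper (a sketch of the behaviour described above, not the repository's code):

def pad_stages(values, num_stages):
    # Repeat the last value until the list covers all training stages.
    return values + [values[-1]] * (num_stages - len(values))

optimizers = ["adam", "momentumsgd"]
batch_sizes = [512, 128]
learning_rates = [0, 1e-3, 1e-4, 1e-5]

num_stages = max(len(optimizers), len(batch_sizes), len(learning_rates))
print(pad_stages(optimizers, num_stages))    # ['adam', 'momentumsgd', 'momentumsgd', 'momentumsgd']
print(pad_stages(batch_sizes, num_stages))   # [512, 128, 128, 128]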

Using Fold Networks with Regularization Post Layer (RPL)

$ python ./scripts/common/master_script.py --output-id example_lstm_folds
    --network-spec "-n lstm -l 4 -u 1024 --timedelay 5 -d 0.2"
    -o adam momentumsgd
    -b 512 128
    --lr 0 1e-3 1e-4 1e-5
    --num-folds 5

This command performs several steps:

  1. Separates the training data into 5 folds (a sketch of this split follows the list)
  2. Trains the master network using the full training set
  3. Trains the fold networks; the N-th network is trained on all folds concatenated except the N-th fold
  4. Retrieves the fold network outputs; the N-th network uses the N-th fold as input
  5. Trains a simple network consisting only of the RPL; its inputs are the fold network outputs from the previous step and its targets are the original training set targets
  6. Creates models from all combinations of master network, fold networks and RPL, and evaluates them all on the test data
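
A minimal sketch of the idea behind steps 1 and 3, splitting the training utterances into 5 folds and training each fold network on the remaining folds (illustrative only, not the repository's generate_folds.py):

import numpy as np

num_folds = 5
utt_ids = np.arange(100)                        # example: 100 training utterances
folds = np.array_split(utt_ids, num_folds)      # split into 5 folds

for n in range(num_folds):
    held_out = folds[n]
    train_utts = np.concatenate([folds[k] for k in range(num_folds) if k != n])
    # The n-th fold network is trained on train_utts and later
    # produces outputs for held_out (step 4 above).
    print(n, len(train_utts), len(held_out))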

Using i-vectors

$ python ./scripts/common/master_script.py --output-id example_lstm_ivectors
    --network-spec "-n lstm -l 4 -u 1024 --timedelay 5 -d 0.2"
    -o adam momentumsgd
    -b 512 128
    --lr 0 1e-3 1e-4 1e-5
    --ivector-dir data/ivectors/online data/ivectors/offline_perspk

This example is the same as LSTM with Specified Optimizer, except i-vectors are now used. The --ivector-dir parameter takes two arguments: the training and test i-vector directories. It has no default value; if it is not specified, i-vectors are simply not used.

In this example, online i-vectors are used for training and offline per-speaker i-vectors for testing.
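
How the i-vectors enter the network is determined by the scripts; a common approach, assumed here purely for illustration, is to append the utterance's (or speaker's) i-vector to every frame of its features:

import numpy as np

feats = np.random.randn(300, 40)     # frames of one utterance (example sizes)
ivector = np.random.randn(100)       # one i-vector for this utterance (example size)

# Append the same i-vector to every frame, giving 40 + 100 = 140 input dimensions.
augmented = np.concatenate([feats, np.tile(ivector, (len(feats), 1))], axis=1)
print(augmented.shape)               # (300, 140)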

Evaluating on Both Development and Test Data

$ python ./scripts/common/master_script.py --output-id example_lstm_optimizer
    --network-spec "-n lstm -l 4 -u 1024 --timedelay 5 -d 0.2"
    -o adam momentumsgd
    -b 512 128
    --lr 0 1e-3 1e-4 1e-5
    --eval-data dev test

The parameter --eval-data specifies which dataset should be used for evaluation. Accepted values are dev for the development data and test for the test data. Both values can be specified at once, in which case both are evaluated in order.
