DeepWDM - Recurrent Neural Networks for Word Duration Measurement written in Torch7

Content

The repository contains code for word duration measurement.

  • back_end folder: contains the training algorithms; it can be used to train the model on new datasets or with different features.
  • front_end folder: contains the feature extraction algorithm.
  • lib folder: contains some useful Python scripts.
  • data folder: contains an example file for testing the repository.

Installation

The code is compatible with OSX and Linux and was tested on OSX El Capitan and Ubuntu 14.04.

Dependencies

The code has the following dependencies:

  • Torch7 with the rnn package
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh 

# On Linux with bash
source ~/.bashrc
# On Linux with zsh
source ~/.zshrc
# On OSX or in Linux with none of the above.
source ~/.profile

# For rnn package installation
luarocks install rnn
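
To verify the installation, a quick sanity check (assuming th is on your PATH) is to load the rnn package from the command line:

th -e "require 'rnn'; print('rnn loaded successfully')"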

Ubuntu

Ubuntu users should also install SoX:

sudo apt-get install sox

Model Installation

First, download the desired model: RNN, 2 RNN Layers, Bi-Directional RNN, or 2 Layers of Bi-Directional RNNs. Then move the model file to back_end/results/ inside the project directory.
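
For example, assuming the model file was downloaded to your current directory (the file name below matches the single-layer RNN model expected by the prediction script; adjust it to the model you actually downloaded):

mv 1_layer_model.net back_end/results/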

Usage

To measure word durations, just type:

python predict.py "input wav file" "output text grid file" "model type"

Example

You can try our tool using the example file in the data folder. Type:

python predict.py data/test.wav data/test.TextGrid rnn

Training Your Own Model

In order to train a DeepWDM model using your own data, you need to perform two steps:

  • A. Extract features
  • B. Train the model

Extract features

Features for training a new model can be extracted with the run_front_end.py script from the front_end folder. This script takes three parameters as input:

  • A. The path to the folder containing the .wav files.
  • B. The path to the manual annotation files. These files should be in TextGrid format, the same as in the example folder.
  • C. The path where the features and labels should be saved.

To test the feature extraction procedure, type the following command from the front_end folder:

python run_front_end.py data/test_file/ --in_path_y data/test_file/ data/test_features/

This script will generate two files (tmp.features and tmp.label), one for the features and one for the labels. These files will be used to train the model.
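
You can verify that the extraction succeeded by listing the output folder, which should now contain the two generated files:

ls data/test_features/
# tmp.features  tmp.label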

Train the model

In order to train the model, run the run.lua script from the back_end folder with the correct paths to the labels and features from the previous step. The parameters for the new files are -folder_path, -x_filename, and -y_filename.
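
For example, assuming the features and labels from the previous step were saved as tmp.features and tmp.label under front_end/data/test_features/ (the paths below are illustrative; adjust them to your setup), a training run from the back_end folder might look like:

th run.lua -folder_path ../front_end/data/test_features/ -x_filename tmp.features -y_filename tmp.label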

In order to use the newly trained model with the predict.py script, rename it to 1_layer_model.net, place it under the results folder, and choose the rnn model type, as sketched below.
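
A minimal sketch of this step, assuming the newly trained model was saved as my_model.net (a hypothetical name; use the actual output file produced by run.lua):

mv my_model.net back_end/results/1_layer_model.net
python predict.py data/test.wav data/test.TextGrid rnn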

Useful Tricks

  • In order to load the data faster, it is recommended to convert the features and label files to the .t7 format. You can do this with the convert2t7.lua script: it takes as input the paths to the features and label files along with the desired output paths, and saves them as .t7 files (see the sketch after this list).
  • Another option is to run the data.lua script after uncommenting lines 41-42, which contain the torch.save() command.
  • You can also explore the impact of other parameters, such as the learning rate or a different optimization technique.
  • If your dataset is unbalanced, i.e., there is much more silence than speech activity in the signal, you can try different weights on the loss function. This can be done by changing the values of the weights parameter in the loss.lua file.
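
As a sketch of the first trick above, a conversion run might look like the following; the argument names here are assumptions rather than the script's documented interface, so check convert2t7.lua for the actual parameter names:

th convert2t7.lua -x_filename tmp.features -y_filename tmp.label -x_output tmp_features.t7 -y_output tmp_label.t7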
