Visual Speech Recognition (AdeNet)

This page provides instructions to install the necessary packages to run the experiments described in the project on Visual Speech Recognition using Deep Learning.

Installing

To run the codes, the following dependencies are required:

miniconda2
matplotlib
pydotplus
tabulate
scikit-learn 6. ipython
pillow
theano
lasagne
nolearn

It is suggested that you use miniconda to setup a virtual environment before running the codes to prevent the packages from messing up with your current python environment. Miniconda can be download from http://conda.pydata.org/miniconda.html. To install the necessary dependencies you can use the following bash script:

#!/bin/bash
./Miniconda2−latest−Linux−x86 64.sh
conda create −n ip−avsr python source activate ip−avsr
 
pip install pip install pip install pip install pip install pip install
matplotlib pydotplus tabulate scikit −learn ipython pillow
pip install −−upgrade https://github.com/Theano/Theano/archive/master.zip 
pip install −−upgrade https://github.com/Lasagne/Lasagne/archive/master.
zip
pip install git+https://github.com/dnouri/nolearn.git@master#egg=nolearn
==0.7.git

which creates a virtual environment ip-avsr, activates the virtual environment and installs all the necessary python packages to this virtual environment.

Code Structure

The source codes for different datasets are separated into individual folders named based on dataset (avletters, ouluvs, cuave). All learning models can be found in the folder modelzoo and can be imported to code as a python package. Custom neural network layers can be found in the package custom_layers and the utils package contains utility functions such as plotting, drawing network layers and image preprocessing functions for normalization and computing delta coefficients.

Datasets

Within each dataset folder, the codes are further grouped into 3 folders. The data folder contains all the mouth ROIs, DCT features and Image Differences extracted for the individual dataset. The format used is MatLab’s .mat format to allow interchangeability between MatLab and python as the pretraining stage requires the use of MatLab DBN code. The model folder contains all pretrained, finetuned and trained networks so they can be easily reloaded in future without the need to retrain them from scratch. The config folder contains a list of .ini config files that are used for different models (DeltaNet, AdeNet v1, AdeNet v2). A list of options are provided below. The training programs are called unimodal.py, bimodal.py, trimodal.py for single stream, double stream and triple stream input source respectively. All training codes accepts a config file using the option --config. Type python trimodal.py -h to see usage options.

usage: trimodal.py [−h] [−−config CONFIG] [−−write results WRITERESULTS]
optional arguments:
−h, −−help show this help message and exit
−−config CONFIG config file to use, default=config/trimodal.ini
−−write results WRITE RESULTS write results to file

Config File Options

Under the data section:

images: raw image ROIs used to extract DBNFs.
dct: dct features with delta coefficients appended.
diff: diff image ROIs used for difference of image input source.

Under the models section:

pretrained: pretrained DBNF extractor DBN network for raw images.
finetuned: finetuned DBNF extractor DBN network for raw images.
pretrained diff: finetuned DBNF extractor DBN network for difference of images.
finetuned diff: finetuned DBNF extractor DBN network for difference of images.
fusiontype: the fusion method to use to combine different input sources.

Under the training section:

learning rate: learning rate to use train the model.
decay rate: learning rate decay at each epoch after decay start.
decay start: epoch to start learning rate decay
do finetune: to perform finetuning of DBNF extractor.
save finetune: save finetuned model of raw image DBNF extractor.
load finetune: load finetuned model of raw image DBNF extractor.
load finetune diff: load finetuned model of image differences DBNF ex- tractor.
output units: number of output classes.
lstm units: number of hidden units used in the LSTM classifiers.

Name		Name	Last commit message	Last commit date
Latest commit History 117 Commits
avletters		avletters
avletters2		avletters2
cuave		cuave
custom		custom
dbn		dbn
landmarking		landmarking
modelzoo		modelzoo
oulu		oulu
runners		runners
test		test
utils		utils
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

avletters

avletters

avletters2

avletters2

cuave

cuave

custom

custom

dbn

dbn

landmarking

landmarking

modelzoo

modelzoo

oulu

oulu

runners

runners

test

test

utils

utils

.gitignore

.gitignore

README.md

README.md

Repository files navigation

Visual Speech Recognition (AdeNet)

Installing

Code Structure

Datasets

Config File Options

About

Releases

Packages

Languages

behtak/ip-avsr

Folders and files

Latest commit

History

Repository files navigation

Visual Speech Recognition (AdeNet)

Installing

Code Structure

Datasets

Config File Options

About

Resources

Stars

Watchers

Forks

Languages