GitHub - xiaoyeye1117/multimodalSR: Multimodal speech recognition using lipreading (with CNNs) and audio (using LSTMs). Sensor fusion is done with an attention network.

This is the repository containing most of the code for my thesis 'Design, Implementation and Analysis of a Deep Convolutional-Recurrent Neural Network for Speech Recognition through Audiovisual Sensor Fusion' at the ESAT (Electrical Engineering) Department of KU Leuven (2016-2017).

Author: Matthijs Van keirsbilck
Supervisor: Bert Moons
Promotor: Marian Verhelst

The code and thesis text are bound by KU Leuven's Student Thesis Copyright Regulations.

The CNN-LSTM networks for lipreading are combined with LSTM networks for audio recognition through an attention mechanism.
These networks achieve state-of-the-art phoneme recognition performance on the publicly available audio-visual dataset TCD-TIMIT. Systems that rely only on audio suffer greatly when audio quality is lowered by noise, as is often the case in real-life situations.
This performance loss can be greatly mitigated by adding visual information.
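
As a rough illustration of what attention-based fusion of two modalities can look like, here is a minimal NumPy sketch. It is not the thesis implementation (that lives in code/combinedSR and is built with Lasagne). The function name `attention_fuse`, the scoring vector `w`, and the assumption that both feature streams share the same shape are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def attention_fuse(audio_feat, video_feat, w, b=0.0):
    """Fuse per-frame audio and video features with a simple attention weighting.

    audio_feat, video_feat: arrays of shape (T, D), one feature vector per frame
                            (assumed to be already projected to the same size D).
    w: scoring vector of shape (D,); hypothetical, it would be learned in practice.
    Returns the fused features (T, D) and the per-frame modality weights (T, 2).
    """
    stacked = np.stack([audio_feat, video_feat], axis=1)  # (T, 2, D)
    scores = np.tanh(stacked) @ w + b                     # (T, 2), one score per modality
    weights = softmax(scores, axis=1)                     # each row sums to 1
    fused = np.sum(weights[..., None] * stacked, axis=1)  # weighted sum over modalities
    return fused, weights
```

With a zero scoring vector both modalities get weight 0.5 and the fused output is simply their average. A trained scoring function could instead shift weight toward the video stream when the audio is noisy.
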
The CNN-LSTM lipreading networks achieve 68.46% correctness, compared to 57.85% for the baseline.
Audio-only neural networks achieve 67.03% compared to 65.47% in the baseline.
Lipreading-audio combination networks achieve 75.70% accuracy for clean audio, and 58.55% for audio with an SNR of 0 dB. The baseline multimodal network achieved 59% and 44% for clean and noisy audio, respectively.
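
For readers unfamiliar with the SNR figure above: at 0 dB the noise has the same average power as the speech. The sketch below shows one common way to mix noise into a clean signal at a chosen SNR. It is an illustration only; the helper name `mix_at_snr` is hypothetical, and it does not claim to reproduce how the noisy test audio in the thesis was generated.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to a speech signal so that the result has the requested SNR in dB.

    speech, noise: 1-D arrays of samples; noise must be at least as long as speech.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]

    p_speech = np.mean(speech ** 2)  # average speech power
    p_noise = np.mean(noise ** 2)    # average noise power before scaling
    if p_noise == 0:
        raise ValueError("noise signal has zero power")

    # SNR_dB = 10 * log10(p_speech / p_noise_target)
    p_noise_target = p_speech / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(p_noise_target / p_noise)
    return speech + scale * noise
```

Calling `mix_at_snr(speech, noise, 0)` scales the noise to the same average power as the speech. Larger `snr_db` values give cleaner audio.
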

The networks are implemented using Lasagne.
There is room for improvement of the code; I'll try to improve it if I can find the time.

For downloading, preprocessing, etc. of the dataset, see https://github.com/matthijsvk/TCDTIMITprocessing
For the lipreading networks, see code/lipreading
For the audio speech recognition networks, see code/audioSR
For the combination networks, see code/combinedSR

Thanks to the authors of all the data and software used in this work. An inexhaustive list:

To set up Python, I recommend using Anaconda. You can use the provided environment.yml to install all Python packages (although some aren't used anymore).
For the installation of Theano/Lasagne and CUDA, I recommend following this tutorial.

If you find this thesis or code useful, please cite it using the following BibTeX entry:

@MastersThesis{Vankeirsbilck:Thesis:2017,
    author  = {Matthijs Van keirsbilck},
    title   = {{Design, implementation and analysis of a deep convolutional-recurrent neural network for speech recognition through audiovisual sensor fusion}},
    school  = {KU Leuven},
    address = {Belgium},
    year    = {2017},
}

