vid2speech

This is the code for the paper
Vid2speech: Speech Reconstruction from Silent Video
by Ariel Ephrat and Shmuel Peleg,
presented at ICASSP 2017.

If you find this code useful for your research, please cite:

@inproceedings{ephrat2017vid2speech,
  title     = {Vid2Speech: speech reconstruction from silent video},
  author    = {Ariel Ephrat and Shmuel Peleg},
  booktitle = {2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2017},
}

Requirements

The code depends on Keras, h5py, NumPy, OpenCV (cv2), SciPy, and MoviePy, all of which can be installed with pip:

pip install keras h5py numpy scipy opencv-python moviepy

Keras was used with the TensorFlow backend.
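
A quick way to verify the installation and the backend (a hypothetical check, not part of the repo):

import cv2
import h5py
import keras
import moviepy
import numpy
import scipy
from keras import backend as K

# All imports above must succeed; Keras should report "tensorflow".
print("Keras backend:", K.backend())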

Prepare the dataset

Download one speaker's videos from the GRID Corpus and save them directly in the dataset/ folder.
This code has been tested on the high quality videos of speakers 2 (male) and 4 (female).

Next, extract the audio track of each video and save it under the same filename, with the .mpg extension replaced by .wav.
The supplied strip_audio.sh script can be used (requires ffmpeg).

cd dataset
sh strip_audio.sh
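
If you prefer to stay in Python, a rough equivalent of strip_audio.sh might look like the sketch below (assumes ffmpeg is on your PATH and the videos sit in dataset/; the supplied script remains the reference):

# Extract the audio track of every .mpg in dataset/ into a .wav with
# the same basename (sketch; requires ffmpeg on the PATH).
import glob
import os
import subprocess

for video_path in glob.glob("dataset/*.mpg"):
    wav_path = os.path.splitext(video_path)[0] + ".wav"
    # -vn drops the video stream, so only the audio is written out as WAV.
    subprocess.check_call(["ffmpeg", "-y", "-i", video_path, "-vn", wav_path])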

Preprocess data

cd ../code
python process_data.py
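
As a rough illustration of the kind of work preprocessing involves (reading video frames with cv2 and packing them into HDF5 with h5py), here is a minimal sketch. The file names, dataset layout, and any cropping or feature extraction are assumptions, not the actual format produced by process_data.py:

# Illustrative only: read all frames of one GRID video as grayscale
# and store them in an HDF5 file.
import cv2
import h5py
import numpy as np

cap = cv2.VideoCapture("../dataset/bbaf2n.mpg")  # hypothetical example file
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
cap.release()

with h5py.File("frames.h5", "w") as f:
    f.create_dataset("frames", data=np.stack(frames))  # (num_frames, H, W)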

Training a new model from scratch

python train.py

Training on one entire GRID speaker (1,000 videos) with the supplied settings takes ~12 hours on a single Titan Black GPU.
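
For orientation, the overall learning setup is regression from a short stack of video frames to a vector of audio features. The toy Keras model below (Keras 2 syntax) shows only that shape of problem; it is NOT the paper's architecture, and every dimension here is an assumption:

# Toy sketch: a small convnet maps a stack of K grayscale frames
# (channels-last) to an audio-feature vector for the clip's center frame.
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

K_FRAMES, H, W = 9, 128, 128   # assumed clip size
N_FEATURES = 8                 # assumed audio-feature dimensionality

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(H, W, K_FRAMES)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation="relu"),
    Dense(N_FEATURES),  # linear output: regression, not classification
])
model.compile(optimizer="adam", loss="mse")

# Dummy data with the assumed shapes, just to show the training call.
x = np.random.rand(4, H, W, K_FRAMES).astype("float32")
y = np.random.rand(4, N_FEATURES).astype("float32")
model.fit(x, y, epochs=1, batch_size=2)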

Generate video samples with reconstructed audio

python gen_samples.py

Samples will appear under ../results/samples/
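
The final muxing step, attaching a reconstructed .wav to its silent source video, is the kind of thing MoviePy handles directly. A hypothetical one-clip version (all paths are made up):

# Attach a reconstructed audio track to the silent source video (sketch).
from moviepy.editor import VideoFileClip, AudioFileClip

video = VideoFileClip("../dataset/bbaf2n.mpg")
audio = AudioFileClip("../results/bbaf2n.wav")
video.set_audio(audio).write_videofile("../results/samples/bbaf2n.mp4")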

Use pre-trained model to predict and generate samples

Data must first be preprocessed with process_data.py.

python predict.py --weight_path <path_to_weights>
python gen_samples.py --respath '../pretrained_results'

Weights for a pre-trained model of speaker 2 are supplied in pretrained_weights/s2.hdf5.

python predict.py --weight_path '../pretrained_weights/s2.hdf5'
python gen_samples.py --respath '../pretrained_results'

Samples will appear under ../pretrained_results/samples/
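
To sanity-check the supplied weights file before running anything, h5py can list its contents (a hypothetical inspection, not one of the repo's scripts):

# Print every group/dataset name stored in the Keras weights file.
import h5py

with h5py.File("../pretrained_weights/s2.hdf5", "r") as f:
    f.visit(print)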

Please get in touch at arielephrat@cs.huji.ac.il with any questions or bug reports. Enjoy!
