# pyVSR

Python toolkit for Visual Speech Recognition

## About
pyVSR is a Python toolkit aimed at running Visual Speech Recognition (VSR) experiments in a traditional framework (e.g. handcrafted visual features, Hidden Markov Models for pattern recognition).
The main goal of pyVSR is to easily reproduce VSR experiments in order to have a baseline result on most publicly available audio-visual datasets.
## What you can do with pyVSR

### 1. Fetch a file list

Currently supported:

- TCD-TIMIT
  - speaker-dependent protocol (Gillen)
  - speaker-independent protocol (Gillen)
  - single person
- OuluVS2
  - speaker-independent protocol (Saitoh)
  - single person

### 2. Extract visual features

- Discrete Cosine Transform (DCT)
  - Automatic ROI extraction (grayscale, RGB, DCT)
  - Configurable window size
  - Fourth-order accurate derivatives
  - Sample rate interpolation
  - Storage in HDF5 format
- Active Appearance Models (AAM)
  - Do NOT require manually annotated landmarks
  - Face, lips, and chin models supported
  - Parameters obtainable either through fitting or projection
  - Implementation based on Menpo
- Point cloud of facial landmarks
  - OpenFace wrapper

### 3. Train Hidden Markov Models (HMMs)

- easy-to-use HTK wrapper for Python
- optional bigram language model
- multi-threaded support (both training and decoding use the full CPU power)
## Examples

pyVSR has a simple, modular, object-oriented architecture. The example below fetches the train/test file lists for the single-person protocol of TCD-TIMIT:
```python
from pyVSR import tcdtimit

dataset_dir = '/path/to/dataset/tcdtimit/'

train, test = tcdtimit.files.request_files(
    dataset_dir=dataset_dir,
    protocol='single_volunteer',
    speaker_id='24M')
```
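The returned objects behave as plain Python sequences of video file paths (an assumption based on how they are concatenated in the examples below), so they can be inspected directly:

```python
# Quick sanity check; the printed values depend on your dataset location
# (hypothetical output).
print(len(train), len(test))  # number of training / test clips
print(train[0])               # path of the first training video
```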
### DCT features

First store the full DCT sequences:
```python
import pyVSR

experiment = pyVSR.AVSR(num_threads=4)

experiment.extract_save_features(
    files=train + test,
    feature_type='dct',
    extract_opts={
        'roi_extraction': 'dct',
        'need_coords': True,
        'boundary_proportion': 0.7,
        'video_backend': 'menpo',
        'roi_dir': './run/features/roi/',
        'window_size': (36, 36)},
    output_dir='./run/features/dct/'
)
```
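The sequences are stored as HDF5 containers (see the feature list above), so they can be checked with h5py. The file name below is hypothetical, and the internal dataset layout depends on pyVSR's writer:

```python
import h5py

# Open one of the generated feature files (hypothetical file name).
with h5py.File('./run/features/dct/example_sentence.h5', 'r') as f:
    # List every dataset in the container together with its shape.
    f.visititems(lambda name, obj: print(name, getattr(obj, 'shape', '')))
```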
Then post-process the DCT coefficients and write .htk binary files:
```python
features_train = pyVSR.utils.files_to_features(train, extension='.h5')
features_test = pyVSR.utils.files_to_features(test, extension='.h5')

experiment.process_features_write_htk(
    files=features_train + features_test,
    feature_dir='./run/features/dct/',
    feature_type='dct',
    process_opts={
        'mask': '1-44',
        'keep_basis': True,
        'delta': True,
        'double_delta': True,
        'deriv_order': 'fourth',
        'interp_factor': 2,
        'interp_type': 'cubic'},
    frame_rate=30,
    out_dir='./run/features/htk_dct/')
```
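For reference, a fourth-order accurate first derivative (the `'deriv_order': 'fourth'` option above) can be computed with the five-point central-difference stencil, and `'interp_factor': 2` with `'interp_type': 'cubic'` doubles the feature sample rate via cubic interpolation. Below is a minimal sketch of both operations using NumPy and SciPy; it illustrates the math, not pyVSR's actual implementation:

```python
import numpy as np
from scipy.interpolate import interp1d

def delta_fourth_order(x):
    # Five-point central-difference stencil, fourth-order accurate:
    # f'(t) ~ (-f(t+2) + 8 f(t+1) - 8 f(t-1) + f(t-2)) / 12
    padded = np.pad(x, ((2, 2), (0, 0)), mode='edge')
    return (-padded[4:] + 8 * padded[3:-1] - 8 * padded[1:-3] + padded[:-4]) / 12.0

def interpolate_features(x, factor=2, kind='cubic'):
    # Resample the feature sequence on a grid `factor` times denser.
    t = np.arange(len(x))
    t_new = np.linspace(0, len(x) - 1, factor * len(x) - 1)
    return interp1d(t, x, kind=kind, axis=0)(t_new)

feats = np.random.rand(100, 44)          # toy (frames x coefficients) matrix
deltas = delta_fourth_order(feats)       # same shape as feats
upsampled = interpolate_features(feats)  # roughly twice the frame rate
```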
### AAM features

First extract the facial landmarks using the OpenFace wrapper:

```python
import pyVSR

experiment = pyVSR.AVSR(num_threads=2)

experiment.extract_save_features(
    files=train + test,
    feature_type='landmarks',
    extract_opts=None,
    output_dir='./run/features/facial_landmarks/'
)
```
Then train an AAM on a subsample of the training videos:

```python
experiment.extract_save_features(
    files=train[::14],
    feature_type='aam',
    extract_opts={
        'warp': 'patch',
        'resolution_scales': (0.25, 0.5, 1.0),
        'patch_shape': ((5, 5), (10, 10), (17, 17)),
        'max_shape_components': 20,
        'max_appearance_components': 150,
        'diagonal': 150,
        'features': 'no_op',
        'landmark_dir': './run/features/facial_landmarks/',
        'landmark_group': 'pts_face',
        'confidence_thresh': 0.94,
        'kept_frames': 0.03,
        'greyscale': False,
        'model_name': 'face_hnop_34M.pkl'},
    output_dir='./run/features/aam/'
)
```
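The trained model is serialised into the output directory. Assuming it is a standard Python pickle, as the .pkl extension suggests (this is not confirmed by the source), it can be reloaded for a quick inspection:

```python
import pickle

# Reload the trained AAM (assumes a plain pickle file).
with open('./run/features/aam/face_hnop_34M.pkl', 'rb') as f:
    aam = pickle.load(f)
print(type(aam))
```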
Finally, fit the AAM to the test videos and write the fitted parameters as .htk binary files:

```python
experiment.process_features_write_htk(
    files=test,
    feature_dir='./pyVSR/pretrained/',
    feature_type='aam',
    process_opts={
        'face_detector': 'dlib',
        'landmark_fitter': 'aam',
        'aam_fitter': './run/features/aam/face_hnop_34M.pkl',
        'parameters_from': 'lk_fitting',
        'projection_aam': None,
        'shape': 'face',
        'part_aam': None,
        'confidence_thresh': 0.84,
        'shape_components': [10, 15, 20],
        'appearance_components': [20, 30, 150],
        'max_iters': [10, 10, 5],
        'landmark_dir': './run/features/facial_landmarks/',
        'log_errors': True,
        'log_title': '34M/log_demo'},
    out_dir='./run/features/htk_aam/'
)
```
### Train and evaluate HMMs

```python
train_feat = pyVSR.utils.files_to_features(train, extension='.htk')
test_feat = pyVSR.utils.files_to_features(test, extension='.htk')

pyVSR.run(
    train_files=train_feat,
    test_files=test_feat,
    feature_dir='./run/features/htk_dct/',
    hmm_states=3,
    mixtures=(2, 3, 5, 7, 9, 11, 14, 17, 20),
    language_model=False,
    config_dir='./pyVSR/tcdtimit/htkconfigs/',
    report_results=('train', 'test'),
    experiment_name='dct_24M'
)
```
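The same call can be pointed at any directory of .htk features, for example the AAM features written above, provided features for both the train and test lists have been written there. A sketch with only the feature directory and the (illustrative) experiment name changed:

```python
pyVSR.run(
    train_files=train_feat,
    test_files=test_feat,
    feature_dir='./run/features/htk_aam/',  # score the AAM features instead
    hmm_states=3,
    mixtures=(2, 3, 5, 7, 9, 11, 14, 17, 20),
    language_model=False,
    config_dir='./pyVSR/tcdtimit/htkconfigs/',
    report_results=('train', 'test'),
    experiment_name='aam_24M'  # hypothetical name for this run
)
```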
## Installing pyVSR

The recommended way is to create an empty conda environment and install the following dependencies:

```
conda install -c menpo menpo menpofit menpodetect menpowidgets
conda install -c menpo pango harfbuzz
conda install h5py
conda install natsort
conda install scipy
```
Alternatively, you can use the environment.yml file:

```
conda env create -f environment.yml
```
It is the user's responsibility to compile OpenFace and HTK. Please refer to the upstream documentation:

- OpenFace
- HTK 3.5

Add the HTK binaries either to the system path (e.g. /usr/local/bin/) or to ./pyVSR/bins/htk/, and the OpenFace binaries to ./pyVSR/bins/openface/.
pyVSR was initially developed on a system running Manjaro Linux, frequently updated from the testing repositories.
We will be testing the code soon on other platforms.
## How to cite

If you use this work, please cite it as:
George Sterpu and Naomi Harte. Towards lipreading sentences using active appearance models. In AVSP, Stockholm, Sweden, August 2017.
## Contact

We are always happy to hear from you:
George Sterpu sterpug [at] tcd.ie
Naomi Harte nharte [at] tcd.ie