PyTorch E2E ASR for open_stt dataset

A minimal set of scripts for training language and acoustic models for speech recognition. The training pipeline includes the following stages:

1. Character-based RNN language model
2. CNN-RNN acoustic model with CTC loss
3. Character-based RNN language model and CNN-RNN acoustic model with RNN-T loss
4. Fine-tuning with Reinforcement Learning and RNN-T loss
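
As an illustration of how a CTC-trained acoustic model's per-frame output becomes a character sequence, here is a minimal sketch of greedy CTC collapsing. It is not taken from the repository's scripts. The blank index `0` and the function name `ctc_greedy_collapse` are assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: index of the CTC blank symbol


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path into an output label sequence.

    Runs of identical labels are merged first, then blanks are dropped,
    so a blank between two equal labels keeps both of them.
    """
    return [label for label, _ in groupby(frame_labels) if label != blank]


# Per-frame argmax labels for a short hypothetical utterance.
path = [0, 3, 3, 0, 3, 1, 1, 0]
print(ctc_greedy_collapse(path))  # [3, 3, 1]
```

The blank between the two runs of `3` is what allows a doubled character to survive the collapse.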

Results

The following table shows the results on the Russian Open Speech To Text (STT/ASR) dataset.

Stage	Model	Loss	Updates	CER	WER
1	LM	CE	2407000	-	-
2	AM	CTC	216850	19.9	57.0
3	LM+AM	RNN-T	108425	21.7	45.6
4	LM+AM	RL	300	19.2	43.9
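
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute character error rate (CER) and word error rate (WER) from edit distance. It assumes the table reports percentages, that spaces count as characters for CER, and that errors are summed over the whole corpus before dividing. None of these conventions is confirmed by the README, and the repository itself lists torch-edit-distance for this purpose rather than the pure-Python helper used here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Corpus-level error rate in percent; unit is 'char' or 'word'."""
    split = (lambda s: list(s)) if unit == "char" else (lambda s: s.split())
    errors = sum(edit_distance(split(r), split(h)) for r, h in zip(refs, hyps))
    total = sum(len(split(r)) for r in refs)
    return 100.0 * errors / total


refs = ["привет мир"]
hyps = ["привет мор"]
print(error_rate(refs, hyps, "char"))  # 10.0
print(error_rate(refs, hyps, "word"))  # 50.0
```

A single wrong character costs one word in full, which is why WER in the table is much higher than CER.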

Requirements

PyTorch >= 1.3 (with bug fix #27460)
torch-edit-distance
warp-rnnt

Preprocessing

Acoustic models are trained on log mel filterbank features: 40 filters computed over 25 ms windows with a 10 ms stride.

features.py - extract features of utterances listed in manifest file
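
To make the feature dimensions concrete, the following sketch works out the frame geometry and mel band edges implied by the settings above. It is not the code in features.py. The 16 kHz sample rate, the absence of padding, the 0 Hz to Nyquist frequency range, and the HTK-style mel formula are all assumptions for illustration.

```python
import math

SAMPLE_RATE = 16000  # assumption: the sample rate is not stated in the README
WINDOW_MS = 25       # window size from the README
STRIDE_MS = 10       # stride from the README
NUM_FILTERS = 40     # number of mel filters from the README


def frame_geometry(num_samples, sample_rate=SAMPLE_RATE):
    """Return (window length, hop length, frame count), assuming no padding."""
    win = sample_rate * WINDOW_MS // 1000
    hop = sample_rate * STRIDE_MS // 1000
    frames = 0 if num_samples < win else 1 + (num_samples - win) // hop
    return win, hop, frames


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_filters=NUM_FILTERS, low_hz=0.0, high_hz=SAMPLE_RATE / 2):
    """Edge frequencies in Hz, evenly spaced on the mel scale.

    Triangular filter k spans edges[k] .. edges[k + 2], so num_filters
    filters need num_filters + 2 edges.
    """
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]


win, hop, frames = frame_geometry(SAMPLE_RATE)  # one second of audio
print(win, hop, frames)       # 400 160 98
print(len(mel_band_edges()))  # 42
```

Under these assumptions, one second of audio yields a 98 x 40 feature matrix.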

The language model is character-based and case-insensitive.

utterances.py - extract transcriptions of precomputed utterances
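
A character-based, case-insensitive model needs transcriptions folded to one case and mapped to integer labels. The sketch below shows one way to do that. It is not the logic of utterances.py. The helper names, the whitespace collapsing, and the choice to reserve index 0 (for example, for a CTC blank) are assumptions.

```python
def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_vocab(transcripts):
    """Map each character to an integer label, starting at 1.

    Index 0 is left unused here on the assumption that it is reserved
    for a blank or padding symbol.
    """
    chars = sorted({ch for t in transcripts for ch in normalize(t)})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode(text, vocab):
    """Convert a transcription to a list of character labels."""
    return [vocab[ch] for ch in normalize(text)]


transcripts = ["Привет  мир", "ПРИВЕТ"]
vocab = build_vocab(transcripts)
print(len(vocab))               # 8
print(encode("ПРИВЕТ", vocab))  # [6, 7, 4, 2, 3, 8]
```

Because case is folded before the vocabulary is built, "Привет" and "ПРИВЕТ" encode to the same label sequence.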

Google Cloud Storage

Pre-processed datasets:

ru_open_stt_wav

Pre-trained models:

ru_open_stt_models

Kaggle Kernels

There are outdated kernels with small training subsets:
