GitHub - seasky100/tensorflow_end2end_speech_recognition: End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

TensorFlow Implementation of End-to-End Speech Recognition

Requirements

TensorFlow >= 1.3.0
tqdm >= 4.14.0
python-Levenshtein >= 0.12.0
setproctitle >= 1.1.10
seaborn >= 0.7.1

Corpus

TIMIT

Phone (39, 48, 61 phones)
Character

LibriSpeech

Phone (under implementation)
Character
Word

CSJ (Corpus of Spontaneous Japanese)

Phone (under implementation)
Japanese kana characters (about 150 classes)
Japanese kanji characters (about 3000 classes)

These corpora will be added in the future:

Switchboard
WSJ
AMI

This repository doesn't include pre-processing; the pre-processing is based on this repo. If you want to run pre-processing, please refer to that repo.

Model

Encoder

BLSTM
LSTM
BGRU
GRU
VGG-BLSTM
VGG-LSTM
Multi-task BLSTM
- You can attach another CTC layer to an arbitrary layer.
Multi-task LSTM
VGG

Connectionist Temporal Classification (CTC) [Graves+ 2006]

Greedy decoder
Beam Search decoder
Beam Search decoder w/ CharLM (under implementation)
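
As an illustration of what the greedy decoder does, here is a minimal sketch in plain Python: take the most probable label at each frame, collapse consecutive repeats, then drop blanks. The function name ctc_greedy_decode and the blank_index parameter are hypothetical and not taken from this repository. TensorFlow itself offers tf.nn.ctc_greedy_decoder, but this sketch does not assume the repository's decoder is built on it.

```python
def ctc_greedy_decode(frame_probs, blank_index):
    """Best-path CTC decoding for one utterance.

    frame_probs: list of per-frame probability lists, shape [time][num_classes].
    blank_index: index of the CTC blank label (an assumption of this sketch;
                 pass whatever index your model uses).
    """
    # 1. Pick the most probable label at every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]

    # 2. Collapse consecutive repeats, then 3. drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank_index:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Three classes; the last one (index 2) plays the role of the blank.
    probs = [
        [0.1, 0.8, 0.1],    # label 1
        [0.1, 0.7, 0.2],    # label 1 again (collapsed)
        [0.1, 0.1, 0.8],    # blank
        [0.2, 0.7, 0.1],    # label 1 (kept, separated by a blank)
        [0.9, 0.05, 0.05],  # label 0
    ]
    print(ctc_greedy_decode(probs, blank_index=2))
```

The blank between the second and fourth frames is what allows the same label to be emitted twice in a row.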

Options

Frame-stacking [Sak+ 2015]
Multi-GPU training (synchronous)
Splicing
Down sampling (under implementation)
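
The two input options, frame-stacking and splicing, can be sketched as follows. This is a rough illustration in plain Python, not the repository's code. The function names, the parameters num_stack, num_skip and context, and the edge handling (repeating or clamping to the boundary frame) are all assumptions made for this example.

```python
def stack_frames(frames, num_stack, num_skip):
    """Frame-stacking: concatenate num_stack consecutive frames and advance
    num_skip frames per step, which lowers the frame rate.
    The tail is padded by repeating the last frame (assumption)."""
    stacked = []
    for start in range(0, len(frames), num_skip):
        window = list(frames[start:start + num_stack])
        while len(window) < num_stack:
            window.append(frames[-1])
        stacked.append([value for frame in window for value in frame])
    return stacked


def splice_frames(frames, context):
    """Splicing: concatenate each frame with `context` frames on both sides.
    The frame rate is unchanged; indices are clamped at the edges (assumption)."""
    num_frames = len(frames)
    spliced = []
    for t in range(num_frames):
        window = []
        for offset in range(-context, context + 1):
            index = min(max(t + offset, 0), num_frames - 1)
            window.extend(frames[index])
        spliced.append(window)
    return spliced


if __name__ == "__main__":
    # Five one-dimensional "feature frames" for readability.
    frames = [[0], [1], [2], [3], [4]]
    print(stack_frames(frames, num_stack=3, num_skip=2))
    print(splice_frames(frames, context=1))
```

Stacking shortens the sequence the encoder has to process, while splicing keeps its length and only widens each input vector.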

Attention Mechanism

Decoder

Greedy decoder
Beam search decoder (under implementation)

Attention type

Bahdanau's content-based attention
Bahdanau's normed content-based attention (under implementation)
Location-based attention
Hybrid attention
Luong's dot attention
Luong's scaled dot attention (under implementation)
Luong's general attention
Luong's concat attention
Baidu's attention (under implementation)
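
To make the difference between a few of these attention types concrete, the sketch below computes the score between one decoder state and each encoder state using Luong's dot and general forms and Bahdanau's content-based (additive) form, then normalizes the scores into attention weights. It uses plain Python lists instead of TensorFlow tensors. All function names and the weight arguments (W, W_dec, W_enc, v) are hypothetical and stand in for learned parameters; nothing here is taken from the repository's models.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_score(dec_state, enc_state):
    # score = dec_state . enc_state
    return dot(dec_state, enc_state)


def luong_general_score(dec_state, enc_state, W):
    # score = dec_state . (W enc_state)
    return dot(dec_state, matvec(W, enc_state))


def bahdanau_score(dec_state, enc_state, W_dec, W_enc, v):
    # score = v . tanh(W_dec dec_state + W_enc enc_state)
    projected = zip(matvec(W_dec, dec_state), matvec(W_enc, enc_state))
    return dot(v, [math.tanh(a + b) for a, b in projected])


def attend(dec_state, enc_states, score_fn):
    """Return (attention weights, context vector) for one decoder step."""
    weights = softmax([score_fn(dec_state, h) for h in enc_states])
    dim = len(enc_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, enc_states))
               for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    enc_states = [[1.0, 0.0], [0.0, 1.0]]
    dec_state = [1.0, 0.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights

    print(attend(dec_state, enc_states, luong_dot_score))
    print(attend(dec_state, enc_states,
                 lambda q, h: bahdanau_score(q, h, identity, identity, [1.0, 1.0])))
```

Location-based and hybrid attention additionally condition the score on the previous step's attention weights, which this sketch leaves out.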

Options

Sharpening
Temperature regularization in the softmax layer (Output posteriors)
Joint CTC-Attention [Kim+ 2016]
Coverage (under implementation)
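
The first three options above can be summarized numerically. The sketch below is a plain-Python illustration under stated assumptions: sharpening is taken to mean scaling the attention energies by a factor beta > 1 before the softmax, temperature divides the output logits by T before the softmax, and the joint CTC-Attention objective is a weighted sum of the two losses. The function names and the parameters beta, temperature and ctc_weight are hypothetical, not the repository's option names.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sharpen(energies, beta):
    """Sharpening: beta > 1 makes the attention distribution peakier."""
    return softmax([beta * e for e in energies])


def softmax_with_temperature(logits, temperature):
    """Temperature: T > 1 flattens the output posteriors, T < 1 sharpens them."""
    return softmax([z / temperature for z in logits])


def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight):
    """Weighted sum: ctc_weight * L_ctc + (1 - ctc_weight) * L_attention."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


if __name__ == "__main__":
    energies = [2.0, 1.0, 0.0]
    print(softmax(energies))
    print(sharpen(energies, beta=2.0))
    print(softmax_with_temperature(energies, temperature=2.0))
    print(joint_ctc_attention_loss(ctc_loss=10.0, attention_loss=20.0, ctc_weight=0.2))
```

Setting ctc_weight to 0 recovers pure attention training and setting it to 1 recovers pure CTC training.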

Usage

Please refer to the docs for each corpus:

TIMIT
LibriSpeech
CSJ

License

MIT

Contact

hiro.mhbc@gmail.com
