
fresty/tensorflow_end2end_speech_recognition

 
 


TensorFlow Implementation of End-to-End Speech Recognition

Requirements

  • TensorFlow >= 1.2.0
  • tqdm >= 4.14.0
  • python-Levenshtein >= 0.12.0
  • setproctitle >= 1.1.10
  • seaborn >= 0.7.1

Corpus

TIMIT

  • phone-level (39, 48, 61 phones)
  • character-level

CSJ (Corpus of Spontaneous Japanese)

  • phone-level
  • Japanese kana character-level
  • Japanese grapheme-level (including kanji characters)

These corpora will be added in the future.

This repository doesn't include pre-processing code; pre-processing is based on this repo. If you want to run pre-processing, please look at this repo.

Model

Connectionist Temporal Classification (CTC) [Graves+ 2006]

  • LSTM-CTC
  • GRU-CTC
  • Bidirectional LSTM-CTC (BLSTM-CTC)
  • Bidirectional GRU-CTC (BGRU-CTC)
  • Multitask CTC (you can attach another CTC layer to an arbitrary layer.)
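
As a rough illustration of what the CTC models above produce at inference time, the sketch below collapses a frame-wise best path into a label sequence: repeated labels are merged, then blanks are removed. It is a minimal pure-Python sketch, not this repository's decoder. The function name `ctc_greedy_collapse`, the example path, and the choice of `0` as the blank index are all assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_labels, blank):
    """Collapse a frame-wise best path into a label sequence.

    Hypothetical helper: merges runs of identical labels first,
    then drops the blank symbol, as in best-path CTC decoding.
    """
    # groupby yields one key per run of consecutive equal labels.
    return [label for label, _ in groupby(frame_labels) if label != blank]


# Hypothetical best path over 8 frames; index 0 is assumed to be the blank.
path = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_greedy_collapse(path, blank=0))
```

The blank between the two runs of `3` is what lets the same label appear twice in a row in the output.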
Options
General technique
  • weight decay
  • dropout
  • gradient clipping
  • activation clipping
  • multitask learning
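
The two clipping options listed above can be sketched in plain Python. This assumes gradient clipping means rescaling by the global norm and activation clipping means clamping values to a symmetric range; the README does not say which variants are used, so the function names, the thresholds, and both choices are assumptions for illustration only.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient vectors so their joint L2 norm
    does not exceed clip_norm (hypothetical helper)."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clip_norm:
        # Already within the limit: return copies unchanged.
        return [list(grad) for grad in grads], global_norm
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in grads], global_norm


def clip_activation(value, limit):
    """Clamp a single activation to [-limit, limit] (hypothetical helper)."""
    return max(-limit, min(limit, value))


# Two hypothetical gradient vectors with joint norm 5.0, clipped to norm 1.0.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm_before, clipped)
print(clip_activation(60.0, limit=50.0))
```

Global-norm clipping keeps the direction of the update and only shrinks its length, which is why it is often preferred over clipping each value separately.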
Awesome technique

Attention Mechanism

Encoder
  • LSTM encoder
  • BLSTM encoder
  • GRU encoder
  • BGRU encoder
Decoder
Attention type

Under implementation

Options
General technique
Awesome technique
  • temperature in the softmax layer (Compute attention weights)
  • temperature in the softmax layer (Output posteriors)
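
To illustrate the temperature option above, here is a minimal pure-Python sketch of a softmax with temperature. It assumes the convention of dividing the logits by the temperature before normalizing (some papers instead multiply by an inverse temperature); the function name and the example scores are hypothetical and not taken from this repository.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (assumed convention).

    temperature > 1 flattens the distribution,
    temperature < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention energies for three encoder frames.
energies = [2.0, 1.0, 0.1]
print(softmax_with_temperature(energies, temperature=1.0))
print(softmax_with_temperature(energies, temperature=2.0))
```

The same function applies to either use listed above: attention energies when computing attention weights, or output logits when computing posteriors.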

Joint CTC-Attention

Under implementation

Usage

Coming soon

License

MIT

Contact

hiro.mhbc@gmail.com

About

End-to-End speech recognition implementation based on TensorFlow (CTC, Attention)
