eastonYi/end-to-end_asr_pytorch

Speech Transformer (PyTorch)

This implementation is based on Speech Transformer: End-to-End ASR with Transformer. It is a PyTorch implementation of the Speech Transformer network, which directly converts acoustic features to a character sequence using a single neural network. This work was mainly done during an internship at Kuaishou.

Install

  • Python3
  • PyTorch 1.5
  • Kaldi (just for feature extraction)
  • pip install -r requirements.txt

Usage

Quick start

$ cd egs/aishell
# Modify the aishell data path to your own path at the beginning of run.sh
$ bash transofrmer.sh

That's all!

You can change a parameter with $ bash transofrmer.sh --parameter_name parameter_value, e.g., $ bash run.sh --stage 3. The available parameter names are the variables defined in egs/aishell/run.sh before the line . utils/parse_options.sh.
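The override convention can be modelled roughly in Python. This is a simplified sketch of how Kaldi-style option parsing generally behaves, not the actual utils/parse_options.sh. The function name apply_overrides and the default values are hypothetical; only the option names stage and continue_from come from this README.

```python
from typing import Dict, List


def apply_overrides(defaults: Dict[str, str], argv: List[str]) -> Dict[str, str]:
    """Apply leading `--name value` pairs from argv on top of the defaults.

    Only names that already exist in `defaults` may be overridden, which
    mirrors the rule that a variable must be defined before the option
    parser is sourced. Dashes in option names are treated as underscores.
    """
    options = dict(defaults)
    i = 0
    while i < len(argv) and argv[i].startswith("--"):
        name = argv[i][2:].replace("-", "_")
        if name not in options:
            raise ValueError(f"invalid option --{name}")
        if i + 1 >= len(argv):
            raise ValueError(f"option --{name} needs a value")
        options[name] = argv[i + 1]
        i += 2
    return options


# Hypothetical defaults, standing in for the variables declared at the top of run.sh.
defaults = {"stage": "0", "continue_from": ""}

# Equivalent in spirit to: bash run.sh --stage 3
print(apply_overrides(defaults, ["--stage", "3"]))
```

An option whose name is not among the defaults is rejected, which is why the variable list at the top of the script doubles as the list of accepted flags.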

Workflow

  • Data Preparation and Feature Generation (TODO: use the scripts in data_prepare)

  • Network Training

  • Decoding (modify transofrmer.sh)

More detail

egs/aishell/run.sh provides example usage.

# Set PATH and PYTHONPATH
$ cd egs/aishell/; . ./path.sh
# Train
$ train.py -h
# Decode
$ recognize.py -h

How to resume training?

$ bash run.sh --continue_from <model-path>

Results

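| Model | CER | Config |
| --- | --- | --- |
| LSTMP | 9.85 | 4x(1024-512). See kaldi-ktnet1 |
| Listen, Attend and Spell | 13.2 | See Listen-Attend-Spell's egs/aishell/run.sh |
| SpeechTransformer | 10.7 | See egs/aishell/run.sh |

| Model | #Snt | #Wrd | Sub | Del | Ins | CER |
| --- | --- | --- | --- | --- | --- | --- |
| SpeechTransformer | 7176 | 104765 | 9.9 | 0.4 | 0.3 | 10.7 |
| Conv_CTC_Transformer | 7176 | 104765 | | | | |
| Conv_CTC | 7176 | 104765 | | | | |
| CIF | 7176 | 104765 | 9.44 | 0.33 | 0.24 | 10.02 |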
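For readers unfamiliar with the CER columns, the following is a minimal sketch of how a character error rate is conventionally computed from substitution, deletion and insertion counts. It assumes the Sub, Del and Ins columns are percentages of the reference characters (#Wrd), as in sclite-style reports. The function names are hypothetical and this is not the scoring code used by the repository.

```python
from typing import Iterable, Tuple


def count_errors(ref: str, hyp: str) -> Tuple[int, int, int, int]:
    """Return (total, substitutions, deletions, insertions) for one utterance.

    Uses a Levenshtein alignment over characters. When several alignments
    have the same total cost, the sub/del/ins split may vary, but the total
    is always the minimum edit distance.
    """
    # prev[j] describes aligning ref[:i-1] with hyp[:j]; row 0 is all insertions.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]  # empty hypothesis prefix: all deletions
        for j in range(1, len(hyp) + 1):
            t, s, d, n = prev[j - 1]
            diag = (t, s, d, n) if ref[i - 1] == hyp[j - 1] else (t + 1, s + 1, d, n)
            t, s, d, n = prev[j]
            up = (t + 1, s, d + 1, n)  # reference char missing from hypothesis
            t, s, d, n = cur[j - 1]
            left = (t + 1, s, d, n + 1)  # extra char in hypothesis
            cur.append(min(diag, up, left, key=lambda c: c[0]))
        prev = cur
    return prev[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER in percent: total errors over total reference characters."""
    errors = 0
    ref_chars = 0
    for ref, hyp in pairs:
        errors += count_errors(ref, hyp)[0]
        ref_chars += len(ref)
    return 100.0 * errors / ref_chars if ref_chars else 0.0


# Two toy utterances: one substitution in eight reference characters.
print(corpus_cer([("abcd", "abxd"), ("abcd", "abcd")]))
```

Note that the corpus-level rate divides the summed errors by the summed reference length, rather than averaging per-utterance rates.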

Acknowledgement

Reference

  • [1] Yuanyuan Zhao, Jie Li, Xiaorui Wang, and Yan Li. "The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition." ICASSP 2019.
  • [2] L. Dong and B. Xu. "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition." In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017, vol. 2017-Augus, pp. 3822–3826.

About

Implementations of CTC, Speech-Transformer and CIF for end-to-end speech recognition in PyTorch
