Speech Transformer (PyTorch)

A PyTorch implementation of the Speech Transformer network, which directly converts acoustic features to a character sequence using a single neural network. The implementation is based on Speech Transformer: End-to-End ASR with Transformer. Most of this work was done during an internship at Kuaishou.

Install

Python3
PyTorch 1.5
Kaldi (just for feature extraction)
pip install -r requirements.txt

Usage

Quick start

$ cd egs/aishell
# Modify the aishell data path at the beginning of run.sh to point to your data
$ bash transofrmer.sh

That's all!

You can change a parameter with $ bash transofrmer.sh --parameter_name parameter_value, e.g., $ bash run.sh --stage 3. The available parameter names are the variables defined in egs/aishell/run.sh before the line . utils/parse_options.sh.
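
The override behaviour can be illustrated with a small Python sketch. It is not the code in utils/parse_options.sh, only an approximation of the Kaldi-style convention that script is assumed to follow: each `--name value` pair replaces a default that is already defined, and unknown names are rejected. The function name `apply_overrides` and the default values shown are hypothetical.

```python
def apply_overrides(defaults, argv):
    """Return a copy of `defaults` updated from `--name value` pairs in argv.

    Hypothetical helper: mimics the Kaldi-style option convention, not the
    actual contents of utils/parse_options.sh.
    """
    params = dict(defaults)
    i = 0
    while i < len(argv):
        arg = argv[i]
        if not arg.startswith("--"):
            break  # the first positional argument ends option parsing
        name = arg[2:].replace("-", "_")  # --continue-from -> continue_from
        if name not in params:
            raise ValueError(f"invalid option {arg}")
        if i + 1 >= len(argv):
            raise ValueError(f"option {arg} needs a value")
        params[name] = argv[i + 1]
        i += 2
    return params


# Illustrative defaults only; the real ones are defined at the top of run.sh.
defaults = {"stage": "-1", "continue_from": ""}
params = apply_overrides(defaults, ["--stage", "3"])
print(params["stage"])
```

Only variables that already have a default can be overridden, which is why the parameter names have to be looked up in run.sh first.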

Workflow

Data preparation and feature generation (TODO: use the scripts in data_prepare)
Network training
Decoding (edit transofrmer.sh)

More detail

egs/aishell/run.sh provides example usage.

# Set PATH and PYTHONPATH
$ cd egs/aishell/; . ./path.sh
# Train
$ train.py -h
# Decode
$ recognize.py -h

How to resume training?

$ bash run.sh --continue_from <model-path>

Results

Model	CER	Config
LSTMP	9.85	4x(1024-512). See kaldi-ktnet1
Listen, Attend and Spell	13.2	See Listen-Attend-Spell's egs/aishell/run.sh
SpeechTransformer	10.7	See egs/aishell/run.sh

Model	#Snt	#Wrd	Sub	Del	Ins	CER
SpeechTransformer	7176	104765	9.9	0.4	0.3	10.7
Conv_CTC_Transformer	7176	104765
Conv_CTC	7176	104765
CIF	7176	104765	9.44	0.33	0.24	10.02
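
For reading the table: CER is conventionally the number of substituted, deleted, and inserted characters divided by the number of reference characters, which is why the Sub, Del, and Ins columns add up to roughly the CER column (small differences come from rounding). The following is a minimal sketch of that computation using a standard edit-distance alignment. It is not the scoring script used to produce the numbers above, and the function names `cer_counts` and `cer` are hypothetical.

```python
def cer_counts(ref, hyp):
    """Align hyp to ref by edit distance; return (sub, del, ins) counts."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = (total_errors, sub, del, ins) for ref[:i] vs hyp[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # every reference character deleted
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # every hypothesis character inserted
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (t, s, d, n)  # match, no error added
            else:
                best = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = cost[i - 1][j]
            best = min(best, (t + 1, s, d + 1, n))  # deletion
            t, s, d, n = cost[i][j - 1]
            best = min(best, (t + 1, s, d, n + 1))  # insertion
            cost[i][j] = best
    return cost[-1][-1][1:]


def cer(ref, hyp):
    """Character error rate in percent for one reference/hypothesis pair."""
    sub, dele, ins = cer_counts(ref, hyp)
    return 100.0 * (sub + dele + ins) / len(ref)


print(cer_counts("abcde", "abxde"))  # (1, 0, 0)
print(cer("abcde", "abxde"))  # 20.0
```

For a whole test set, the counts would be summed over all utterances before dividing by the total number of reference characters.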

Acknowledgement

The framework and the Speech Transformer baseline are based on Speech Transformer: End-to-End ASR with Transformer.
src/transformer/conv_encoder.py is adapted from https://github.com/by2101/OpenASR.
The core implementation of the CIF algorithm was checked by Linhao Dong (the original author of CIF).

Reference

[1] Yuanyuan Zhao, Jie Li, Xiaorui Wang, and Yan Li. "The SpeechTransformer for Large-scale Mandarin Chinese Speech Recognition." ICASSP 2019.
[2] Linhao Dong and Bo Xu. "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition." ICASSP 2020.
