Neural Speech Recognition
This repository contains experiments with CNN, Deepspeech2, and RNN models on different datasets.
The following results were obtained on the AN4 dataset:
(orange: ResNet, blue: ResNet + augmentation, dark red: Deepspeech, light blue: Deepspeech + augmentation, light red: EncoderDecoder, green: EncoderDecoder + augmentation, gray: EncoderDecoder + augmentation + pseudo labels)
ResNet CNN + CTC
BLEU: 70.180 WER: 16.482 CER: 9.792 ACC: 49.231
Deepspeech 2 + CTC (modified from Sean Naren's deepspeech.pytorch repository)
BLEU: 89.890 WER: 5.094 CER: 2.993 ACC: 76.923 (note that the net is overfitting)
Encoder Decoder RNN
BLEU: 83.770 WER: 7.529 CER: 5.735 ACC: 72.308
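For readers unfamiliar with the metrics above, the following is a minimal sketch of how WER (word error rate) and CER (character error rate) are commonly computed from the Levenshtein edit distance. It is not taken from this repository's evaluation code: the function names `edit_distance`, `wer`, and `cer` are hypothetical, and the sketch assumes the scores are reported as percentages and that each reference transcript is non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions, and deletions each cost 1.
    Works on strings (characters) and on lists of words.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumes a non-empty reference)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumes a non-empty reference)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    print(f"WER: {wer('go to one', 'go two one'):.3f}")
    print(f"CER: {cer('go to one', 'go two one'):.3f}")
```

This sketch scores a single utterance. A corpus-level score is often obtained by summing the edit distances over all utterances and dividing by the total reference length, which may differ from averaging per-utterance scores.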