
thomberg1/NeuralSpeechRecognition


Neural Speech Recognition

This repository contains experiments with CNN, Deepspeech2, and RNN models on different datasets.

The following results were obtained on the AN4 dataset:

[Figures: AN4 result plots]

(Plot colors: orange: ResNet; blue: ResNet + augmentation; dark red: Deepspeech; light blue: Deepspeech + augmentation; light red: EncoderDecoder; green: EncoderDecoder + augmentation; gray: EncoderDecoder + augmentation + pseudo labels)

ResNet CNN + CTC

BLEU: 70.180 WER: 16.482 CER: 9.792 ACC: 49.231
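Both CTC models emit a probability distribution over output symbols for every input frame, and those frames have to be turned into a transcript. The simplest way to do this is greedy (best-path) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop the blank symbol. The sketch below is a minimal illustration under stated assumptions. The function name `ctc_greedy_decode` is hypothetical, it assumes the blank symbol sits at index 0 of the alphabet, and the repository may decode differently (for example with beam search).

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: str,
                      blank: int = 0) -> str:
    """Hypothetical best-path CTC decoder (not taken from the repository).

    frame_probs: T x C per-frame scores (probabilities or logits).
    alphabet:    one character per class index; index `blank` is assumed
                 to be the CTC blank symbol.
    """
    # 1. Pick the highest-scoring class for every frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge runs of identical consecutive labels (e.g. a a _ a -> a _ a).
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal labels keeps them separate.
    return "".join(alphabet[label] for label in collapsed if label != blank)
```

For example, a per-frame argmax sequence of `a a _ a b b _` decodes to `aab`: the repeated `a` frames merge, but the blank between them preserves the second `a`.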

Deepspeech 2 + CTC (modified from Sean Naren's deepspeech.pytorch repository)

BLEU: 89.890 WER: 5.094 CER: 2.993 ACC: 76.923 (note: the network is overfitting)
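WER and CER are usually defined as the Levenshtein (edit) distance between hypothesis and reference, counted over words or characters, divided by the reference length and reported as a percentage. The sketch below follows that common definition. Several details are assumptions, not confirmed by this README: errors are summed over the whole test set rather than averaged per utterance, and spaces are ignored for CER (a convention some implementations use). The names `edit_distance`, `wer`, and `cer` are illustrative.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (free if equal)
            )
        prev = cur
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level word error rate in percent (assumed convention)."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return 100.0 * errors / words


def cer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level character error rate in percent, spaces ignored (assumed)."""
    errors = sum(edit_distance(r.replace(" ", ""), h.replace(" ", ""))
                 for r, h in zip(references, hypotheses))
    chars = sum(len(r.replace(" ", "")) for r in references)
    return 100.0 * errors / chars
```

With this definition, a single wrong word in a three-word reference gives a WER of about 33.3, which is why a WER of 5.094 corresponds to roughly one word error per twenty reference words.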

Encoder Decoder RNN

BLEU: 83.770 WER: 7.529 CER: 5.735 ACC: 72.308
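The BLEU score listed for each model measures n-gram overlap between the predicted and reference transcripts. The sketch below implements the standard corpus-level BLEU (clipped n-gram precisions for n = 1..4, geometric mean, brevity penalty, scaled to 0..100). Which BLEU variant, n-gram order, or smoothing the repository actually uses is not stated in this README, so treat this as an assumption; the function name `corpus_bleu` is hypothetical and no smoothing is applied.

```python
import math
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[str], hypotheses: List[str],
                max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU in percent, one reference per hypothesis."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        ref_len += len(r)
        hyp_len += len(h)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in ngrams(h, n).items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

Because BLEU rewards matching word sequences rather than counting edits, it is not a simple function of WER, which is why the two metrics can rank models slightly differently.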

