Speech Recognition

This project is a first step towards building a simple speech detector using the Speech Commands Dataset released by TensorFlow.

Dataset

The dataset used in this project is the Speech Commands Dataset by TensorFlow. It contains 65,000 one-second-long utterances of 30 short words and a separate folder of background noise audio clips. The background noise clips were split into one-second chunks and placed in a folder named 'silence' so that they could be used as a class during classification.
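The chunking of the background noise clips can be sketched as follows. This is a minimal illustration, not the repository's own code: the function name `split_into_chunks`, the 16 kHz sample rate, and the choice to drop a trailing partial chunk are all assumptions made for the example.

```python
import numpy as np

def split_into_chunks(samples, sample_rate, chunk_seconds=1.0):
    """Split a 1-D audio signal into non-overlapping fixed-length chunks.

    Trailing samples that do not fill a whole chunk are dropped
    (an assumption; padding the last chunk would be another option).
    """
    chunk_len = int(sample_rate * chunk_seconds)
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

# Synthetic stand-in for a background noise clip: 3.5 seconds at an assumed 16 kHz.
rate = 16000
noise = np.random.default_rng(0).normal(size=int(3.5 * rate))
chunks = split_into_chunks(noise, rate)
```

Each element of `chunks` could then be written out as its own file in the 'silence' folder.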

Features

Mel-Frequency Cepstral Coefficients (MFCC), Chroma and Spectral Contrast were extracted from each clip and used as features.
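One possible way to compute these three feature families is with librosa, sketched below. The use of librosa, the helper names `summarize_frames` and `extract_features`, the value `n_mfcc=13`, and averaging each feature over time to get a fixed-length vector are all assumptions for illustration; the original scripts may differ.

```python
import numpy as np

def summarize_frames(frame_features):
    """Collapse a (n_features, n_frames) matrix to one vector by averaging over time."""
    return np.asarray(frame_features).mean(axis=1)

def extract_features(path, n_mfcc=13):
    """Return one fixed-length feature vector for the audio file at `path`."""
    import librosa  # imported lazily so summarize_frames works without librosa

    y, sr = librosa.load(path, sr=None)  # sr=None keeps the file's native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    # Average each frame-level matrix over time, then join into a single vector.
    return np.concatenate([summarize_frames(m) for m in (mfcc, chroma, contrast)])
```

Calling `extract_features` on every clip and stacking the results would give a feature matrix with one row per utterance.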

Preprocessing

The features were normalized, and PCA was then applied to reduce their dimensionality.
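A minimal sketch of this preprocessing step with scikit-learn is shown below. The README does not say which normalization was used or how many components were kept, so `StandardScaler`, `n_components=10`, the helper name `build_reducer`, and the random input data are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_reducer(n_components):
    """Standardize each feature, then project onto the top principal components."""
    return make_pipeline(StandardScaler(), PCA(n_components=n_components))

# Random stand-in data: 200 training and 50 test vectors with 32 features each.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 32))
X_test = rng.normal(size=(50, 32))

reducer = build_reducer(n_components=10)
X_train_reduced = reducer.fit_transform(X_train)  # fit on training data only
X_test_reduced = reducer.transform(X_test)        # reuse the fitted statistics
```

Fitting the scaler and PCA on the training set alone, then applying them to the test set, keeps test statistics from leaking into the model.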

Learning Models

Deep Neural Network

A deep neural network with three hidden layers was implemented in TensorFlow to do the classification. After experimenting with a few combinations, I chose 700, 700 and 100 units for the hidden layers. The output of each hidden layer passes through a ReLU activation. Score - 0.63
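To make the layer layout concrete, here is a framework-free NumPy sketch of the forward pass. It is not the TensorFlow implementation from the notebook: the softmax output layer, the He-style weight initialization, and the placeholder values `n_inputs=10` and `n_classes=12` are assumptions, and only the hidden layer sizes and ReLU activations come from the description above.

```python
import numpy as np

HIDDEN_UNITS = (700, 700, 100)  # hidden layer sizes stated in the README

def init_params(n_inputs, n_classes, rng):
    """Create (weights, bias) pairs for every layer, using He-style scaling (assumed)."""
    sizes = (n_inputs, *HIDDEN_UNITS, n_classes)
    return [
        (rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """Apply ReLU after each hidden layer and softmax at the output (softmax assumed)."""
    *hidden, (w_out, b_out) = params
    for w, b in hidden:
        x = np.maximum(x @ w + b, 0.0)  # ReLU activation
    logits = x @ w_out + b_out
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
params = init_params(n_inputs=10, n_classes=12, rng=rng)  # placeholder dimensions
probs = forward(params, rng.normal(size=(4, 10)))         # batch of 4 random inputs
```

Each row of `probs` is a probability distribution over the classes; training the weights is left out of this sketch.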

Random Forest Ensemble

A Random Forest model was fit with the default parameters. Score - 0.64

K Nearest Neighbor

A k-nearest neighbors (kNN) classifier was fit with the default parameters. Score - 0.65
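The Random Forest and kNN baselines could be reproduced along these lines with scikit-learn. This is a sketch under assumptions: the helper name `score_default_models`, the 75/25 split, the synthetic data, and the use of mean accuracy as the score are illustrative, and `random_state` is set only to make the example repeatable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def score_default_models(X, y, seed=0):
    """Fit both classifiers with default hyperparameters and return validation accuracy."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "random_forest": RandomForestClassifier(random_state=seed),
        "knn": KNeighborsClassifier(),
    }
    # .score() on a scikit-learn classifier returns mean accuracy.
    return {name: model.fit(X_train, y_train).score(X_val, y_val)
            for name, model in models.items()}

# Synthetic three-class data standing in for the PCA-reduced features.
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 10)) + y[:, None]
scores = score_default_models(X, y)
```

The resulting dictionary maps each model name to its accuracy on the held-out split.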

Conclusion

The result is not the best, but it is within the top 600 results obtained by competitors on Kaggle.

I attribute this to the fact that the audio clips with background noise need more processing before their features are extracted. Also, the words classified as 'unknown' are not similar to each other but are similar to words in the other classes.


pranka02/speech_recognition
