02456-Project: Urban Sound Classification using Convolutional Neural Networks

This project addresses sound event classification with Convolutional Neural Networks, analyzing and attempting to improve the architecture proposed by K. J. Piczak. The network consists of two convolutional layers with max-pooling followed by two fully connected layers, and it is trained on log-power mel-spectrograms and their delta features. In addition, the architecture is modified to perform multilabel sound classification, such as the classification of two different simultaneous sound events.
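As an illustration of the input described above, the sketch below stacks a log mel-spectrogram with its delta features into a two-channel array. It is a minimal example under stated assumptions: the 60 x 41 segment size, the function names, and the use of a simple central difference for the deltas are all illustrative and are not taken from this repository, whose preprocessing may compute deltas differently.

```python
import numpy as np


def delta(features: np.ndarray) -> np.ndarray:
    """Approximate delta features as the first-order difference along time (axis 1).

    Assumption: a plain central difference; the repository may use another estimator.
    """
    return np.gradient(features, axis=1)


def make_two_channel_input(log_mel: np.ndarray) -> np.ndarray:
    """Stack a log mel-spectrogram and its deltas into shape (bands, frames, 2)."""
    return np.stack([log_mel, delta(log_mel)], axis=-1)


# Hypothetical segment: 60 mel bands x 41 frames of random values standing in
# for a real log mel-spectrogram.
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(60, 41))
segment = make_two_channel_input(log_mel)
print(segment.shape)  # (60, 41, 2)
```

Each two-channel segment can then be fed to the first convolutional layer in the same way a two-channel image would be.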

The dataset used to evaluate the architecture is UrbanSound8K. This dataset has also been used to create two new synthetic datasets (by overlaying two sound events) to test the performance of the multilabel classifier.
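The overlaying idea can be sketched as follows: two waveforms are summed sample by sample and the target becomes a multi-hot vector over the 10 UrbanSound8K classes. This is only a sketch; the function names, the peak normalization, the zero-padding of the shorter clip, and the example class indices are assumptions, not the procedure used by the scripts in this repository.

```python
import numpy as np


def overlay(a: np.ndarray, b: np.ndarray, gain_b: float = 1.0) -> np.ndarray:
    """Mix two mono waveforms; the shorter one is zero-padded at the end."""
    n = max(len(a), len(b))
    mix = np.zeros(n, dtype=np.float64)
    mix[: len(a)] += a
    mix[: len(b)] += gain_b * b
    # Assumption: rescale only when the sum would exceed the [-1, 1] range.
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix /= peak
    return mix


def multi_hot(class_ids, num_classes: int = 10) -> np.ndarray:
    """Build a multi-hot target with a 1 for every class present in the clip."""
    target = np.zeros(num_classes, dtype=np.float32)
    target[list(class_ids)] = 1.0
    return target


# Two synthetic tones standing in for real sound events.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
event_a = 0.4 * np.sin(2 * np.pi * 440.0 * t)
event_b = 0.4 * np.sin(2 * np.pi * 880.0 * t[:4000])

mixed = overlay(event_a, event_b)
label = multi_hot([3, 8])  # hypothetical class indices for the two events
print(mixed.shape, label)
```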

The implementation uses Keras with a TensorFlow backend. The resulting architecture performs similarly to the original one on single-labelled sounds. In the multilabel case, the classifier correctly classifies both sounds in 14% of the cases, while one of the two sounds is recognized in most cases. We also show that pretraining the multilabel classifier on single-labelled sounds from the original dataset (and subsequently training it on overlaid sounds) helps achieve better accuracy, especially when the sound segment to be classified consists of a single event.
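To make the two multilabel figures concrete, the sketch below counts, for each clip, how many of its two true classes appear among the two highest-scoring predictions, and then reports the fraction of clips with both correct and with at least one correct. The top-2 decision rule, the function names, and the toy scores are assumptions for illustration; the repository may define a correct multilabel prediction differently (for example by thresholding the outputs).

```python
import numpy as np


def top2_hits(scores: np.ndarray, true_pairs) -> np.ndarray:
    """Return, per clip, how many true classes are among the two top-scoring classes."""
    top2 = np.argsort(scores, axis=1)[:, -2:]
    return np.array(
        [len(set(pred.tolist()) & set(true)) for pred, true in zip(top2, true_pairs)]
    )


def summarize(hits: np.ndarray) -> dict:
    """Fraction of clips with both classes found and with at least one found."""
    return {
        "both": float(np.mean(hits == 2)),
        "at_least_one": float(np.mean(hits >= 1)),
    }


# Toy example: 3 clips, 4 classes, made-up scores.
scores = np.array(
    [
        [0.9, 0.8, 0.1, 0.0],  # top-2 = {0, 1}
        [0.1, 0.7, 0.6, 0.2],  # top-2 = {1, 2}
        [0.5, 0.1, 0.2, 0.4],  # top-2 = {0, 3}
    ]
)
true_pairs = [(0, 1), (1, 3), (1, 2)]

hits = top2_hits(scores, true_pairs)
print(hits.tolist())  # [2, 1, 0]
print(summarize(hits))
```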

Organization of the repository:

Main files:

singlelabel_classification.py: main script to run for single-label classification
multilabel_classification.py: main script to run for multilabel classification
keras_models.py: contains the models implemented and used during the project
preprocessor.py: contains the methods for preprocessing the sound clips
extract_features.py: script used for extracting the features from the audio clips

Supplementary files:

boxplot.py: script for creating a boxplot from an Excel file
load_plot_cm.py: script for loading and plotting a confusion matrix
draw_convnet.py: script used for drawing the CNN architecture

Overlaying dataset creation:

The folder called "Overlaying_datasets_creation_scripts" contains the scripts necessary to create two datasets containing overlaying sound segments, starting from UrbanSound8K. More explanation of this can be found in the Python notebook referenced below.

Python Notebook:

The repository also contains a Python notebook, Notebook.ipynb. The notebook explains the major steps needed to reproduce the obtained results, along with some code snippets.


daler3/02456-Project---Background-Audio-Classification
