Detection of COVID19 through voice using Neural Networks.
This work is part of a Master's thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science at Worcester Polytechnic Institute.
We work on audio samples collected from voca.ai and Coswara. Audio samples from both datasets are combined, and an 80-20 stratified train-test split is created. Below is the number of samples in each dataset.
Samples | Label | Voca.ai | Coswara |
---|---|---|---|
Cough Samples | Covid +ve | 1950 | 105 |
Cough Samples | Covid -ve | 39 | 1361 |
Breath Samples | Covid +ve | - | 103 |
Breath Samples | Covid -ve | - | 1366 |
Alphabet Samples | Covid +ve | 29 | - |
Alphabet Samples | Covid -ve | 1751 | - |
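The 80-20 stratified split mentioned above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the code used in this repository; the function name `stratified_split`, the seed, and the toy labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) that preserve the label ratio."""
    # Group sample indices by their class label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction from every class (at least one sample each).
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels: 10 positive and 90 negative samples (illustrative only).
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
```

Splitting per class keeps the share of Covid +ve samples the same in the train and test sets, which matters when one class is much smaller than the other.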
Below are the architectures tried. All the files are under the networks folder.
Networks | AUC |
---|---|
Convolutional Neural Networks (convnet) | 0.56 |
Conv Auto Encoders (cae) | 0.57 |
Variational Auto Encoders (vae) | 0.65 |
Contrastive Learning methods (contrastive) | 0.63 |
Brown et al. (VGGish + SVM) | 0.61 |
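For readers unfamiliar with the metric in the table above, ROC AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way on made-up scores; it is not the evaluation code of this repository, and the function name `roc_auc` and the example values are assumptions.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels and model scores (illustrative only).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to random ranking, and 1.0 to a perfect separation of the two classes.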
- Install all the dependencies listed in requirements.txt:
pip install -r requirements.txt
- Create a config file of your own.
Run `data_processor.py` to generate the data required for training the model. It reads the raw audio samples, splits them into n-second segments, and generates Mel filter bank features (the `fbank` parameter in the config file). Other available audio features are `mfcc` and `gaf`.
python3 covid_19/datagen/data_processor.py --config_file covid_19/configs/<config_filepath>
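The segmentation step described above (cutting each recording into n-second pieces before feature extraction) can be sketched as follows. This is a minimal illustration and not the code in `data_processor.py`; the function name `split_into_segments` is hypothetical, and dropping the trailing partial segment is an assumption (the actual script may pad or handle it differently). The filter bank computation itself is not shown.

```python
def split_into_segments(samples, sample_rate, n_seconds):
    """Split a 1-D sequence of audio samples into n-second segments."""
    segment_length = int(sample_rate * n_seconds)
    if segment_length <= 0:
        raise ValueError("segment length must be positive")

    segments = []
    # Step through the signal one full segment at a time.
    # Assumption: a trailing piece shorter than n seconds is dropped.
    for start in range(0, len(samples) - segment_length + 1, segment_length):
        segments.append(samples[start:start + segment_length])
    return segments


# Toy signal: 10 samples at 4 samples per second, cut into 1-second pieces.
sample_rate = 4
samples = list(range(10))
segments = split_into_segments(samples, sample_rate, n_seconds=1)
```

Each segment would then be passed to the chosen feature extractor (`fbank`, `mfcc`, or `gaf`) to produce the arrays used for training.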
Using `main.py`, one can train any of the architectures listed in the table above:
python3 main.py --config_file covid_19/configs/<config_filepath> --network convnet
python3 main.py --config_file <config_filepath> --test_net True --network convnet --datapath <data filepath>
Remember to generate the Mel filter bank features from the raw audio data first, and pass the generated `.npy` file as the `--datapath` parameter.
- Vocal Tract Length Normalisation
- Extract features using Praat and openSMILE
- Normalise audio samples based on average amplitude
- SincNet
- Graph Neural Networks