This repository contains the code for Project 1 of Intelligent speech recognition. The project detects voice endpoints in audio and consists of two tasks: the first uses a simple linear classifier with short-term signal features, and the second uses a statistical model classifier with frequency-domain features.
- python 3.6+
- scikit-learn == 0.24.2
- tqdm
- Note that scikit-learn runs on the CPU only, so training may take a long time. You can reduce the number of features or samples to make it faster.
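
One way to reduce the number of samples is to train on a random subset of the frames. The helper below is a minimal sketch, not code from this repository: the name `subsample` and its arguments are hypothetical, and it assumes features and labels are stored as parallel Python lists.

```python
import random


def subsample(features, labels, fraction, seed=0):
    """Keep a random `fraction` of (feature, label) pairs, in original order.

    Hypothetical helper for illustration; it is not part of train.py.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if not features:
        return [], []
    n_keep = max(1, int(len(features) * fraction))
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same subset
    kept = sorted(rng.sample(range(len(features)), n_keep))
    return [features[i] for i in kept], [labels[i] for i in kept]
```

Sampling indices rather than the feature list itself keeps each feature paired with its label.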
# Start training with:
python train.py --task 1
# Several flags are supported for conveniently customizing your strategy; you can check them in train.py.
# For example:
python train.py --task 1 --f_size 0.064 --f_shift 0.032 --exp svm_not_normalized --model svm
# By default, the results will be saved in './task1/task1_prediction_on_test'. Please refer to the code for all flags.
# Example:
python inference.py --model {model name} --f_size 0.064 --f_shift 0.032
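
To make the `--f_size` and `--f_shift` values more concrete, here is a minimal sketch of framing a signal and computing two common short-term features, short-time energy and zero-crossing rate. It assumes both flags are given in seconds and that these are the kind of features used in task 1; the repository code may differ, and the function names here are hypothetical.

```python
def frame_signal(samples, sample_rate, f_size=0.064, f_shift=0.032):
    """Split a 1-D sample sequence into overlapping frames.

    Assumption: f_size and f_shift are durations in seconds.
    """
    size = int(round(f_size * sample_rate))
    shift = int(round(f_shift * sample_rate))
    if size <= 0 or shift <= 0:
        raise ValueError("frame size and shift must be at least one sample")
    # Only full frames are kept; a trailing partial frame is dropped.
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, shift)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

With a 16 kHz signal, the values 0.064 and 0.032 would correspond to frames of 1024 samples with a shift of 512 samples, so one second of audio yields 30 full frames.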
- To validate with an existing model, run:
python validate.py --model {model name} --f_size 0.064 --f_shift 0.032
- Features visualization:
- For task 2, just add the --task 2 option to the commands.
# Start training with:
python train.py --task 2
# Task 2 currently supports only GMM. You can customize the number of components with --n_cpnt.
# For example:
python train.py --task 2 --f_size 0.032 --f_shift 0.008 --exp gmm --n_cpnt 10
# By default, the results will be saved in './task1/task1_prediction_on_test'. Please refer to the code for all flags.
# Example:
python inference.py --task 2 --model {model name} --f_size 0.064 --f_shift 0.032
- To validate with an existing model, run:
python validate.py --task 2 --model {model name} --f_size 0.064 --f_shift 0.032
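
For readers unfamiliar with GMM-based classification, the sketch below shows one common decision rule: score each frame under a speech GMM and a non-speech GMM and pick the more likely one. This is an illustration under stated assumptions (one diagonal-covariance GMM per class, hand-set parameters, hypothetical function names), not necessarily how this repository implements task 2. In practice the parameters would come from a fitted model such as scikit-learn's `GaussianMixture`.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # log-sum-exp keeps the sum stable when individual terms are very small
    top = max(component_logs)
    return top + math.log(sum(math.exp(c - top) for c in component_logs))


def is_speech(x, speech_gmm, noise_gmm):
    """Label a frame as speech when the speech model explains it better.

    Each model is a (weights, means, variances) tuple; this layout is an
    assumption made for the example.
    """
    return gmm_log_likelihood(x, *speech_gmm) > gmm_log_likelihood(x, *noise_gmm)


# Toy 1-D models with made-up parameters, for illustration only.
speech_gmm = ([0.5, 0.5], [[2.0], [4.0]], [[1.0], [1.0]])
noise_gmm = ([1.0], [[-2.0]], [[1.0]])
```

Here `is_speech([3.0], speech_gmm, noise_gmm)` evaluates to `True`, because 3.0 lies between the two speech component means and far from the non-speech mean.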