
Multi-modal Speech Emotion Recognition on IEMOCAP dataset


guomin/speech-emotion-recognition

 
 


What's this project about?

The goal of this project is to create a multi-modal Speech Emotion Recognition system on the IEMOCAP dataset.

Project outline

  • Feb 2019 - IEMOCAP dataset acquisition and parsing
  • Mar 2019 - Baseline of linguistic model
  • Apr 2019 - Baseline of acoustic model
  • May 2019 - Integration and optimization of both models
  • Jun 2019 - Integration with open-source ASR (most likely DeepSpeech)

What's IEMOCAP dataset?

IEMOCAP stands for the Interactive Emotional Dyadic Motion Capture dataset. It is one of the most widely used databases for multi-modal speech emotion recognition.

Original class distribution:

The IEMOCAP database suffers from a major class imbalance. To address this, we reduce the number of classes to four and merge the Enthusiastic and Happiness labels into a single class.
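
As an illustration of the merge, a minimal sketch in Python is shown below. The label strings (`neu`, `hap`, `exc`, `sad`, `ang`) and the tuple layout of a parsed utterance are assumptions for the example, not the exact output of the parsing code in this repository.

```python
# Hypothetical label mapping: the exact source label strings depend on how the
# IEMOCAP annotations are parsed; this only illustrates the four-class merge.
LABEL_MAP = {
    "neu": "neutral",
    "hap": "happiness",
    "exc": "happiness",   # Enthusiastic/Excitement merged into Happiness
    "sad": "sadness",
    "ang": "anger",
}

def merge_labels(utterances):
    """Keep only the four target classes and merge excitement into happiness."""
    return [
        (wav_path, transcription, LABEL_MAP[label])
        for wav_path, transcription, label in utterances
        if label in LABEL_MAP
    ]
```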

Final class distribution

Related works overview

References: [1] [2] [3] [4] [5] [6] [7] [8] [9]

Tested Architectures

Acoustic Architectures

| Classifier Architecture | Input type | Accuracy [%] |
| --- | --- | --- |
| Convolutional Neural Network | Spectrogram | 55.3 |
| Bidirectional LSTM with self-attention | LLD features | 53.2 |
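
The CNN-on-spectrogram classifier can be pictured roughly as in the PyTorch sketch below. Layer counts, channel sizes, and the global-pooling head are illustrative assumptions, not the exact architecture used in this repository.

```python
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Small CNN over (1 x freq x time) spectrograms; layer sizes are illustrative."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global pooling -> fixed-size embedding
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                       # x: (batch, 1, freq, time)
        h = self.features(x).flatten(1)         # (batch, 64)
        return self.classifier(h)               # unnormalized class scores
```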

Linguistic Architectures

| Classifier Architecture | Input type | Accuracy [%] |
| --- | --- | --- |
| LSTM | Transcription | 58.9 |
| Bidirectional LSTM | Transcription | 59.4 |
| Bidirectional LSTM with self-attention | Transcription | 63.1 |
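
A minimal sketch of the best-performing linguistic architecture, a bidirectional LSTM with self-attention over transcriptions, is shown below. It assumes word-ID input and single-head additive attention pooling; the embedding source, dimensions, and attention variant used in this repository may differ.

```python
import torch
import torch.nn as nn

class BiLSTMSelfAttention(nn.Module):
    """BiLSTM over word embeddings with additive self-attention pooling."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)    # one score per time step
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                         # token_ids: (batch, seq_len)
        states, _ = self.lstm(self.embedding(token_ids))  # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attention(states), dim=1)
        pooled = (weights * states).sum(dim=1)            # attention-weighted sum
        return self.classifier(pooled)
```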

Ensemble Architectures

Ensemble architectures combine the most accurate acoustic and linguistic models: the convolutional acoustic model and the bidirectional LSTM with self-attention linguistic model (see the sketch after the table below).

| Ensemble type | Accuracy [%] |
| --- | --- |
| Decision-level Ensemble (maximum confidence) | 66.7 |
| Decision-level Ensemble (average) | 68.8 |
| Decision-level Ensemble (weighted average) | 69.0 |
| Feature-level Ensemble | 71.1 |
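
The two ensembling strategies can be summarized with the sketch below: decision-level ensembling combines the two models' class probabilities (here by a weighted average), while feature-level ensembling concatenates their penultimate embeddings and trains a small joint classifier. Dimensions and layer sizes are illustrative assumptions, not the exact implementation in this repository.

```python
import torch
import torch.nn as nn

def decision_level_average(acoustic_logits, linguistic_logits, acoustic_weight=0.5):
    """Weighted average of per-model class probabilities (decision-level ensemble)."""
    p_acoustic = torch.softmax(acoustic_logits, dim=-1)
    p_linguistic = torch.softmax(linguistic_logits, dim=-1)
    return acoustic_weight * p_acoustic + (1.0 - acoustic_weight) * p_linguistic

class FeatureLevelEnsemble(nn.Module):
    """Concatenates both models' penultimate embeddings and classifies jointly."""
    def __init__(self, acoustic_dim, linguistic_dim, num_classes=4):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(acoustic_dim + linguistic_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, acoustic_features, linguistic_features):
        fused = torch.cat([acoustic_features, linguistic_features], dim=-1)
        return self.classifier(fused)
```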

Feature-level Ensemble Architecture

Feature-level Ensemble Confusion Matrix

How to prepare IEMOCAP dataset?

How to run?

Run hyperparameter tuning

python3 -m speech_emotion_recognition.run_hyperparameter_tuning -m acoustic-spectrogram

Run training

python3 -m speech_emotion_recognition.run_training_ensemble -m acoustic-spectrogram

Run ensemble training

python3 -m speech_emotion_recognition.run_training_ensemble -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch

Run evaluation

python3 -m speech_emotion_recognition.run_evaluate -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch -e /path/to/ensemble_model.torch

How to run in Docker? (CPU only)

Run hyperparameter tuning

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_hyperparameter_tuning -m acoustic-spectrogram

Run training

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_training_ensemble -m acoustic-spectrogram

Run ensemble training

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_training_ensemble -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch

Run evaluation

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_evaluate -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch -e /path/to/ensemble_model.torch
