
What's this project about?

The goal of this project is to build a multi-modal Speech Emotion Recognition system on the IEMOCAP dataset.

Project outline

  • Feb 2019 - IEMOCAP dataset acquisition and parsing
  • Mar 2019 - Baseline of the linguistic model
  • Apr 2019 - Baseline of the acoustic model
  • May 2019 - Integration and optimization of both models
  • Jun 2019 - Integration with an open-source ASR (most likely DeepSpeech; see the sketch after this list)
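For reference, transcribing an utterance with DeepSpeech's Python package could look roughly like the minimal sketch below. It assumes a pretrained v0.9-style model and 16 kHz, 16-bit mono audio; the file names are placeholders, not artifacts from this repository.

# Minimal DeepSpeech transcription sketch (assumes the `deepspeech` pip
# package and a pretrained model; paths are placeholders).
import wave
import numpy as np
import deepspeech

model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")   # acoustic model

with wave.open("utterance.wav", "rb") as wav:              # 16 kHz, 16-bit mono
    frames = wav.readframes(wav.getnframes())
audio = np.frombuffer(frames, dtype=np.int16)

print(model.stt(audio))  # plain-text transcript for the linguistic model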

What's IEMOCAP dataset?

IEMOCAP stands for the Interactive Emotional dyadic MOtion CAPture database. It is among the most popular databases used for multi-modal speech emotion recognition.

Original class distribution:

The IEMOCAP database suffers from major class imbalance. To mitigate this, we reduce the number of classes to four, merging the Enthusiastic and Happiness labels into a single class.
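In code, this consolidation amounts to a label mapping applied before training. Below is a minimal sketch; the label spellings follow the prose above and are illustrative, not the exact strings used in IEMOCAP's annotation files.

# Hypothetical label consolidation: keep four classes, fold the
# Enthusiastic label into Happiness, and drop everything else.
LABEL_MAP = {
    "Neutral": "Neutral",
    "Happiness": "Happiness",
    "Enthusiastic": "Happiness",  # merged into Happiness
    "Sadness": "Sadness",
    "Anger": "Anger",
}

def consolidate(samples):
    """Map raw labels to the 4-class setup, discarding unmapped classes."""
    return [(x, LABEL_MAP[y]) for x, y in samples if y in LABEL_MAP]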

Final class distribution:

Related works overview

References: [1] [2] [3] [4] [5] [6] [7] [8] [9]

System Architecture

Results so far

Model                          Accuracy  Unweighted Accuracy  Loss
Acoustic                       0.602     0.601                0.983
Linguistic                     0.642     0.638                0.913
Ensemble (highest confidence)  0.699     0.704                0.827
Ensemble (average)             0.711     0.708                0.948
Ensemble (weighted average)    0.716     0.712                0.944
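The three ensemble rows fuse the two models' class-probability outputs at decision level. A minimal NumPy sketch of the three strategies follows; the fusion weight w is an assumed placeholder, not the tuned value behind the numbers above.

import numpy as np

def ensemble(p_acoustic, p_linguistic, mode="average", w=0.5):
    """Decision-level fusion of two (n_samples, n_classes) probability arrays.

    highest_confidence: per sample, keep the model whose top probability is larger
    average:            mean of the two distributions
    weighted_average:   convex combination with assumed weight w
    """
    if mode == "highest_confidence":
        pick_ling = p_linguistic.max(axis=1) > p_acoustic.max(axis=1)
        fused = np.where(pick_ling[:, None], p_linguistic, p_acoustic)
    elif mode == "average":
        fused = (p_acoustic + p_linguistic) / 2.0
    elif mode == "weighted_average":
        fused = w * p_acoustic + (1.0 - w) * p_linguistic
    else:
        raise ValueError(mode)
    return fused.argmax(axis=1)  # predicted class indices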

Confusion matrix of the best model

loss: 0.944, acc: 0.716, unweighted acc: 0.712, conf_mat: 
[[291.  60.  31.   9.]
 [ 88. 282.  17.   6.]
 [ 46.  19. 191.   2.]
 [ 61.  26.   4. 167.]]

*classes in order: [Neutral, Happiness, Sadness, Anger]
*rows are the correct class, columns are the predictions
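Both headline numbers can be recomputed from this matrix: accuracy is the diagonal sum over the total, while unweighted accuracy is the mean of per-class recalls (diagonal divided by row sums):

import numpy as np

conf_mat = np.array([[291.,  60.,  31.,   9.],
                     [ 88., 282.,  17.,   6.],
                     [ 46.,  19., 191.,   2.],
                     [ 61.,  26.,   4., 167.]])

acc = conf_mat.trace() / conf_mat.sum()                               # -> 0.716
unweighted_acc = (conf_mat.diagonal() / conf_mat.sum(axis=1)).mean()  # -> 0.712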
