This repository contains an implementation of the paper "Automatic Speech Emotion Recognition using RNN with Local Attention".


This code requires the following dependencies:

For realtime audio recording functionality:

The processed IEMOCAP data needs to be in a folder called IEMOCAP_PROCESSED. Download the dataset from:


This section provides detailed information about the code structure and how to train and test your models from scratch.

Code Structure

  • DATASET_FOUR_4 - log-mel features extracted from IEMOCAP for 4 emotions
  • DATASET_ALT_LOGMEL_4 - log-mel features extracted using Google AudioSet's log-mel code for 4 emotions
  • DATASET_LOGMEL_6 - log-mel features extracted for 6 emotions
  • DATASET_LOGMEL+ST_6 - log-mel plus short-term features extracted using the pyAudioAnalysis library for 6 emotions
  • DATASET_VGGISH_4 - VGGish embeddings extracted for 4 emotions
  • DATASET_SPECGRAM - spectrogram dataset extracted for 4 emotions
  • MODELS - directory containing trained emotion models
  • PYAUDIOANALYSIS - audio feature extraction utilities from the pyAudioAnalysis library
  • - code containing audio feature extraction utilities such as log-mel, MFCC, and short-term features.
  • - main code used to train the emotion recognition model. It follows the architecture described in the paper mentioned above, except for the attention part: a time-distributed dense layer, followed by a bidirectional LSTM, mean pooling over time, and a dense layer with softmax activation over the emotion classes.
  • - code used to test the model on realtime audio or recorded audio samples.
  • - alternate log-mel extraction code from Google's AudioSet.
  • - deprecated; code used to test the emotion recognition model using spectrogram features.
  • - deprecated; code used to test the emotion recognition model using extracted VGGish embeddings.
  • - code used for parsing IEMOCAP annotations, loading audio, extracting features, and saving the dataset.
  • test_model.ipynb - Jupyter notebook for interactive testing of the realtime application
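
To illustrate the last two stages of the architecture described above (mean pooling over time, then a dense softmax layer), here is a minimal framework-free sketch. It is not the repository's training code: the function names, the toy weights, and the 2-dimensional frame vectors are all assumptions made for illustration, and the frames stand in for the per-frame outputs of the bidirectional LSTM.

```python
import math


def mean_pool(frames):
    """Average per-frame vectors into one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def dense(vector, weights, bias):
    """Affine layer; `weights` holds one row per output class."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frames, weights, bias):
    """Pool over time, then produce one probability per emotion class."""
    return softmax(dense(mean_pool(frames), weights, bias))


# Toy input (hypothetical): 3 frames of 2-dim recurrent outputs, 4 classes.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0, 0.0]
probs = classify(frames, weights, bias)
```

Because the pooling step averages over the time axis, utterances of any length yield a single fixed-size vector, so the classifier produces one prediction per utterance rather than one per frame.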

How to test the pre-trained model

Open the testing code and, under the #TODO line, import the feature extraction function that matches the type of features you want to extract. Use one of the following:

from audio_features import extract_logmel as extract_feat # 40 dim
from audio_features import extract_features as extract_feat # 62 dim
from audio_features import extract_mfcc as extract_feat # 12 dim
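
Since each extractor yields a different feature dimension, a mismatch between the chosen extractor and the features the model was trained on is an easy mistake to make. The following sketch shows one way to guard against it. The dimensions come from the comments above; the helper name `check_feature_dim` and the assumed frames-by-dimension layout are hypothetical and not part of audio_features.

```python
# Feature dimensions as stated in the import comments above.
FEATURE_DIMS = {
    "extract_logmel": 40,
    "extract_features": 62,
    "extract_mfcc": 12,
}


def check_feature_dim(features, extractor_name):
    """Raise if any frame's length differs from the extractor's dimension.

    `features` is assumed to be a sequence of frames, each a sequence of
    floats (an assumption about the layout, not a documented contract).
    """
    expected = FEATURE_DIMS[extractor_name]
    for index, frame in enumerate(features):
        if len(frame) != expected:
            raise ValueError(
                f"frame {index} has {len(frame)} values, "
                f"expected {expected} for {extractor_name}"
            )
    return expected


# Usage with dummy data: two frames of 12 MFCC-sized values.
dummy = [[0.0] * 12, [0.0] * 12]
dim = check_feature_dim(dummy, "extract_mfcc")
```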

Now run the code, specifying the path to the trained model, whether to use realtime mode, and optionally the path to a WAV file for offline testing:

$ python -w <MODEL_PATH> -r <1/0 or y/n> [-a <WAV_PATH>]
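
The flags above could be parsed roughly as follows. This is a sketch using the standard argparse module and may differ from the parser in the actual testing code: the destination names (`model_path`, `realtime`, `wav_path`) and the example file names are assumptions.

```python
import argparse


def str_to_bool(value):
    """Map the documented 1/0 and y/n forms to a boolean."""
    normalized = value.strip().lower()
    if normalized in ("1", "y"):
        return True
    if normalized in ("0", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected 1/0 or y/n, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(
        description="Test a trained emotion recognition model.")
    parser.add_argument("-w", dest="model_path", required=True,
                        help="path to the trained model")
    parser.add_argument("-r", dest="realtime", type=str_to_bool, required=True,
                        help="realtime mode: 1/0 or y/n")
    parser.add_argument("-a", dest="wav_path", default=None,
                        help="optional WAV file for offline testing")
    return parser


# Offline example with hypothetical file names.
args = build_parser().parse_args(
    ["-w", "MODELS/example.h5", "-r", "n", "-a", "sample.wav"])
```

Converting `-r` with a dedicated type function rejects values outside the documented forms at parse time instead of silently treating them as false.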

How to train from scratch

  1. The first step is to extract features from the processed IEMOCAP dataset. Make sure that the IEMOCAP_PROCESSED directory is in the root directory of the repo.

  2. Open the data preprocessing code and import the correct feature extraction function as before, for example for log-mel:

from audio_features import extract_logmel as extract_feat # 40 dim

  3. Run the data preprocessing code as follows:

$ python -p <path to IEMOCAP_PROCESSED> -o <dataset-output-folder>

The dataset will then be created in the specified output folder.

  4. Now start training with the training code. If necessary, change the training parameters in the build_model, define_callbacks or fit functions of the emoLSTM class, then run:

$ python -p <DATASET_PATH>

The best model will be saved under the MODELS directory. All models trained so far are in the folder ~/speech-emotion-recognition/models on the AWS instance deep.

