
VOICE SENTIMENT ANALYZER

Voice Sentiment Analyzer is a model built to analyze real-time or recorded audio for service-based companies, giving them insights into customer feedback and queries, along with the sentiment of each call, without any manual review. This helps organisations improve their service-delivery systems efficiently and effectively.

CONTENTS

  • ARCHITECTURE
  • RESEARCH
  • USAGE
  • FURTHER ENHANCEMENT

ARCHITECTURE

The image above shows the Voice Analysis Pipeline. Let's walk through each phase of its working:
  1. Data Collection: Using a Flask API, we collect audio as a real-time recording through the microphone, as a file uploaded to the server, or by connecting an institutional or organisational database (a minimal endpoint sketch follows this list).
  2. Preprocessing: Preprocessing is required when audio is uploaded or a database is connected. It includes Speaker Diarization, which uses clustering mechanisms to separate the different speakers' voices into different audio files. Diarization is followed by Voice Activity Detection, which splits speech and silence into separate audio chunks so that the analysis runs faster.
  3. Model Building: The model-building process makes our product unique among its competitors. Our model comprises two models:

    • Speech to Text and Analysis: Incoming speech segments are converted into text using the Google Speech-to-Text API. The sentiment of the text is then obtained from a pretrained model and classified as Positive, Neutral, or Negative.

    • Speech Emotion Analyzer: Using a pretrained emotion analyzer, we classify the emotion of the audio broadly into three categories: Happy, Neutral, and Angry.

  4. Result Generation: The results from the two models are combined into one final result and visualized as various graphs, as shown below (a hedged combination sketch follows the visualization).
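To make the data-collection phase concrete, here is a minimal sketch of a Flask upload endpoint. The route, the field names ("audio", "speakers"), and the upload directory are illustrative assumptions, not the repository's actual API.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    uploaded = request.files["audio"]                  # uploaded recording
    n_speakers = int(request.form.get("speakers", 0))  # 0 = auto-detect later
    uploaded.save("uploads/" + uploaded.filename)      # "uploads/" must exist
    # Diarization, VAD, and the two models would run from here.
    return jsonify({"status": "processing", "speakers": n_speakers})

if __name__ == "__main__":
    app.run(debug=True)
```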

Result Visualization
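The exact rule for merging the two model outputs is not spelled out above, so the following is only a hedged sketch of one plausible combination using the labels listed earlier; the repository may weight the two signals differently.

```python
def combine(text_sentiment: str, voice_emotion: str) -> str:
    """Merge text sentiment (Positive/Neutral/Negative) with voice
    emotion (Happy/Neutral/Angry) into one final label."""
    # Negative signals dominate: an angry tone often carries the real
    # sentiment of a call even when the transcript reads as neutral.
    if voice_emotion == "Angry" or text_sentiment == "Negative":
        return "Negative"
    if voice_emotion == "Happy" or text_sentiment == "Positive":
        return "Positive"
    return "Neutral"

print(combine("Neutral", "Angry"))  # -> Negative
```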

RESEARCH

This section discusses how we arrived at our final choice of models and pipeline, along with the problems we faced and their solutions.
  1. Speaker Diarization: Speaker Diarization is the process of applying clustering techniques to the features of the audio to separate the speakers present, so that we get a detailed per-speaker sentiment in addition to the overall audio sentiment.

    Our first approach was UIS-RNN (Unbounded Interleaved-State Recurrent Neural Network), Google's supervised diarization algorithm, combined with VGG-16 voice feature extraction, but it had lower accuracy and suffered from an overlapping-speech problem.

    Currently we use pyAudioAnalysis, which applies K-Means and SVMs to segregate speaker voices from the audio file, as sketched below; the number of speakers may be defined by the user, or the elbow method is used to determine an appropriate number of clusters.

    As a further enhancement, we will use RPNSD (Region Proposal Network Speaker Diarization), which is more accurate and resolves the overlap problem.
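A minimal sketch of the pyAudioAnalysis step described above; the file name is illustrative, and the exact function name and return shape vary between library versions (older releases call it speakerDiarization).

```python
from pyAudioAnalysis import audioSegmentation as aS

# n_speakers=0 asks the library to estimate the cluster count itself;
# the result is an array of per-window speaker labels.
labels = aS.speaker_diarization("call.wav", n_speakers=0, plot_res=False)
print(labels)
```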

  2. Voice Activity Detection: This technique classifies the different parts of an audio file as either speech or silence. Removing the silence splits the audio into separate speech chunks, which helps the model work efficiently.

    We use the WebRTC Voice Activity Detection algorithm to classify the segments of audio into speech and silence, as in the sketch below.
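For example, with the py-webrtcvad bindings (the file name and the 30 ms frame size here are illustrative choices):

```python
import wave
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness: 0 (least strict) to 3 (most strict)
with wave.open("speaker0.wav", "rb") as wf:  # must be 16-bit mono PCM
    rate = wf.getframerate()      # 8000, 16000, 32000 or 48000 Hz
    samples = int(rate * 0.03)    # 30 ms frames (10/20/30 ms are allowed)
    while True:
        frame = wf.readframes(samples)
        if len(frame) < samples * 2:   # 2 bytes per 16-bit sample
            break
        print("speech" if vad.is_speech(frame, rate) else "silence")
```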

  3. Speech Recognition: This is the process through which we convert speech into text in specific languages; currently we convert to Indian English and US English.

    Our first approach was Mozilla's open-source DeepSpeech recognition model, trained on American English with a WER (Word Error Rate) of 5.83%. However, its memory efficiency is low: the model file is about 1.2 GB, which exhausts the memory available on cloud services.

    Our final approach is the Google Speech-to-Text API, which is trained on and serves a vast group of languages with a WER of about 3.44%. It is highly efficient in terms of both size and performance; a usage sketch follows.
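A minimal sketch using the SpeechRecognition package's recognize_google wrapper, one common way to call Google's speech API from Python; whether the repository uses this wrapper or the Cloud client directly is an assumption, and the file name is illustrative.

```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("speech_chunk.wav") as source:
    audio = r.record(source)        # read the whole chunk into memory

try:
    # "en-IN" = Indian English; pass "en-US" for US English
    text = r.recognize_google(audio, language="en-IN")
except sr.UnknownValueError:        # API could not understand the audio
    text = ""
print(text)
```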

  4. Emotion Recognition: We add this to the model to increase accuracy, because the tone, together with the text, decides the overall sentiment. With the speech recognition part done, we move on to emotion recognition.

    We trained our own model on various publicly available datasets and achieved an accuracy of 88.14%, classifying audio files into 3 categories: Angry, Neutral, and Happy (a hedged inference sketch follows).
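The following is only a hedged inference sketch, assuming a Keras classifier over averaged MFCC features; the model file name, the 40-coefficient input, and the label order are illustrative assumptions, not the repository's actual artifacts.

```python
import librosa
import numpy as np
from tensorflow.keras.models import load_model

LABELS = ["Angry", "Neutral", "Happy"]   # the three classes described above

# Average 40 MFCC coefficients over the clip as a fixed-size feature vector.
y, sr = librosa.load("speech_chunk.wav", sr=22050)
features = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)

model = load_model("emotion_model.h5")        # hypothetical trained model file
probs = model.predict(features.reshape(1, -1))[0]
print(LABELS[int(np.argmax(probs))])
```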

  5. Text Sentiment Analysis: Text sentiment analysis determines whether a text has a Positive, Negative, or Neutral impact.

    Our first approach was VaderSentiment, which uses a dictionary-based approach: put simply, it holds a large lexicon of words and decides the sentiment of a text on that basis.

    We then moved to a more advanced embedding-based approach, in which the meaning of a word changes with the context it is used in, i.e. it depends on the neighbouring words. For this we use embeddings: vectors in an N-dimensional space where the similarity between words is derived from cosine similarity or Euclidean distance. Models based on this approach include Flair and fastText (by Facebook).

    Such systems are often organisation-specific: some organisations consider certain words negative, for example "Fraud" in banking. So we need a combination of both the dictionary-based and embedding-based approaches (a VADER usage example follows).
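As a concrete example of the dictionary-based approach, here is standard vaderSentiment usage with the conventional +/-0.05 compound-score thresholds; the sample sentence is illustrative.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("The agent resolved my issue very quickly!")

# "compound" is a normalized score in [-1, 1]; +/-0.05 are the
# cut-offs recommended by the VADER authors.
if scores["compound"] >= 0.05:
    label = "Positive"
elif scores["compound"] <= -0.05:
    label = "Negative"
else:
    label = "Neutral"
print(label, scores)
```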

USAGE

Recording

1. Press the Record button.



2. It will take you to another webpage, which asks for microphone access in your browser, starts recording, and shows a live visualization graph.



3. After you have finished recording, press the Stop button. It will take some time to buffer your audio and then show a static graph, which is automatically saved to your history.

Uploading a File

  1. Click the Upload button on the main page.



  2. Upload the file from your device using the Browse button, enter the number of speakers present in the audio (optional), and then click Upload.



  3. After a successful upload, a message and an Analyze button will pop up. Click the button.



  4. After clicking the Analyze button, it will take you to the next window, which shows a message that the audio is being processed in the backend.



  5. After successful processing, a message pops up with further instructions: use the visualizations, type the speaker number accordingly, and click the Show button.



FURTHER ENHANCEMENT

  • Fine-tuning of the models
  • Connecting organisations' databases
  • A more generic application
  • More features: query extraction, voice verification
