EmoCourseChat

An emotional neural conversational system with knowledge for online course search.

This project aims to build a chatbot that responds with a given emotion and can also recommend online courses. The emotional aspect is implemented with emotion entity embeddings that are fed to the RNN decoder along with the encoded utterance. Course recommendations are implemented as a cosine similarity search over averaged word2vec embeddings of the words in each course description.
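The recommendation step can be illustrated with a minimal sketch. It is not the project's implementation: the toy 3-dimensional vectors, the course titles, and the function names `average_embedding`, `cosine_similarity`, and `recommend_courses` are all assumptions made for illustration, and real word2vec vectors would be loaded from a trained model instead.

```python
import math

# Toy 3-dimensional "word vectors" (assumption); real word2vec vectors
# would come from a trained model.
TOY_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "programming": [0.8, 0.2, 0.1],
    "painting": [0.0, 0.9, 0.2],
    "watercolor": [0.1, 0.8, 0.3],
    "basics": [0.3, 0.3, 0.3],
}

# Hypothetical course catalogue: title -> description.
COURSES = {
    "Intro to Python": "python programming basics",
    "Watercolor 101": "watercolor painting basics",
}


def average_embedding(text, vectors):
    """Average the vectors of the known words in text; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_courses(query, courses, vectors, top_k=1):
    """Rank course titles by similarity between query and description embeddings."""
    query_vec = average_embedding(query, vectors)
    if query_vec is None:
        return []
    scored = []
    for title, description in courses.items():
        course_vec = average_embedding(description, vectors)
        if course_vec is not None:
            scored.append((cosine_similarity(query_vec, course_vec), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]
```

Calling `recommend_courses("python", COURSES, TOY_VECTORS)` ranks the Python course first, because its averaged description vector points in nearly the same direction as the query vector. Words missing from the vocabulary are simply skipped in this sketch.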

Getting Started

Spacy's large English model

python -m spacy download en_core_web_lg

Fasttext

wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
unzip v0.1.0.zip
rm v0.1.0.zip
cd fastText-0.1.0
make

Data

Emotion

After you download and extract all of the following corpora, run src/data/emotion_data_parsers.py.

Hashtag Emotion Corpus

Download Hashtag Emotion Corpus and extract to data/raw/emotion/Jan9-2012-tweets-clean.txt.

Crowdflower's The Emotion in Text

Download The Emotion in Text and extract to data/raw/emotion/text_emotion.csv.

Affective Text SemEval 2007

Download Affective Text and extract to data/raw/emotion/AffectiveText.Semeval.2007.

Electoral/Political tweets annotated for sentiment, emotion, purpose and style

Download Electoral/Political tweets annotated for sentiment, emotion, purpose and style and extract to data/raw/emotion/ElectoralTweetsData.

WASSA-2017 Shared Task on Emotion Intensity (EmoInt)

Download WASSA-2017 Shared Task on Emotion Intensity (EmoInt) and extract to data/raw/emotion/Wassa-2017.

Collections of love letters, hate mail, and suicide notes

Download Collections of love letters, hate mail, and suicide notes and extract to data/raw/emotion/LoveHateSuicide/love-letters.txt.

Movie reviews, annotated for emotion classification

Clone Movie reviews, annotated for emotion classification to data/raw/emotion/spudisc-emotion-classification-master.

NRC Emotion Lexicon

Download NRC Emotion Lexicon and extract to data/raw/emotion/NRC-Sentiment-Emotion-Lexicons.

Dialogue

Cornell Movie-Dialogs Corpus

Download Cornell Movie-Dialogs Corpus and extract to data/raw/dialogue/cornell movie-dialogs corpus.

Reformat to CSV via src/data/movie_corpus_extraction.py.
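As a rough illustration of what such a reformatting step involves, the sketch below turns the corpus files into utterance/response pairs. It is not the contents of src/data/movie_corpus_extraction.py. It relies on the corpus's ` +++$+++ ` field separator; the function names, the CSV column names, and the toy sample text are assumptions.

```python
import ast
import csv
import io

FIELD_SEPARATOR = " +++$+++ "


def parse_lines(lines_text):
    """Map line ID -> utterance text from the content of movie_lines.txt."""
    id_to_text = {}
    for row in lines_text.splitlines():
        parts = row.split(FIELD_SEPARATOR)
        if len(parts) == 5:  # lineID, characterID, movieID, name, text
            id_to_text[parts[0]] = parts[4].strip()
    return id_to_text


def conversation_pairs(conversations_text, id_to_text):
    """Build (utterance, response) pairs from consecutive lines of each conversation."""
    pairs = []
    for row in conversations_text.splitlines():
        parts = row.split(FIELD_SEPARATOR)
        if len(parts) != 4:  # characterID, characterID, movieID, list of line IDs
            continue
        line_ids = ast.literal_eval(parts[3])
        for first, second in zip(line_ids, line_ids[1:]):
            if first in id_to_text and second in id_to_text:
                pairs.append((id_to_text[first], id_to_text[second]))
    return pairs


def pairs_to_csv(pairs):
    """Serialize pairs to CSV text; the column names are an assumption."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["utterance", "response"])
    writer.writerows(pairs)
    return buffer.getvalue()


# Toy sample in the corpus layout (the text itself is made up).
SAMPLE_LINES = (
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Hi.\n"
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BEN +++$+++ Hello, how are you?\n"
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Fine, thanks.\n"
)
SAMPLE_CONVERSATIONS = "u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']\n"

SAMPLE_PAIRS = conversation_pairs(SAMPLE_CONVERSATIONS, parse_lines(SAMPLE_LINES))
```

A conversation of three lines yields two pairs, since every line except the last serves as an utterance and every line except the first as a response.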

Ubuntu Dialogue Corpus v2.0

Clone Ubuntu Dialogue Corpus v2.0.

Port create_ubuntu_dataset.py to Python 3. Set the positive example probability to 1. Generate the corpus via generate.sh. Reformat to CSV via src/data/ubuntu_corpus_extraction.py.

Microsoft Research Social Media Conversation Corpus

Download Microsoft Research Social Media Conversation Corpus and extract to data/raw/dialogue/MSRSocialMediaConversationCorpus.

This dataset contains only tweet IDs, so create a Twitter application to access the Twitter API. Put your ConsumerToken, ConsumerSecret, AccessToken, and AccessSecret into config.ini in the following format:

[twitter]
ConsumerToken = abc
ConsumerSecret = abc
AccessToken = abc
AccessSecret = abc

Run src/data/microsoft_corpus_tweets_extraction.py to extract tweet texts.
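For reference, a [twitter] section in this format can be read with Python's standard configparser module. The sketch below is one possible way to do it, not the project's own loading code; the helper name `load_twitter_credentials` and the validation behavior are assumptions.

```python
import configparser

REQUIRED_KEYS = ("ConsumerToken", "ConsumerSecret", "AccessToken", "AccessSecret")


def load_twitter_credentials(config_text):
    """Return the four Twitter credentials from INI text as a dict.

    Raises KeyError if the [twitter] section or any required key is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("twitter"):
        raise KeyError("config is missing the [twitter] section")
    section = parser["twitter"]
    # configparser matches option names case-insensitively by default.
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError("missing keys in [twitter]: " + ", ".join(missing))
    return {key: section[key] for key in REQUIRED_KEYS}


SAMPLE_CONFIG = """
[twitter]
ConsumerToken = abc
ConsumerSecret = abc
AccessToken = abc
AccessSecret = abc
"""

CREDENTIALS = load_twitter_credentials(SAMPLE_CONFIG)
```

To read the file from disk instead of a string, replace `parser.read_string(config_text)` with `parser.read("config.ini")`.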

Reddit comments

Download a month of Reddit comments and extract to data/raw/dialogue/reddit_comments_month.

Create utterances via src/data/reddit_comments_extraction.py.
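The idea of turning comments into utterances can be sketched as pairing each comment with the comment it replies to. This is an illustration only, not the contents of src/data/reddit_comments_extraction.py. It assumes the dump is one JSON object per line with `id`, `parent_id`, and `body` fields, where a `t1_` prefix on `parent_id` marks a reply to another comment; the function name and the in-memory approach are also assumptions, and a real monthly dump would need streaming.

```python
import json


def build_utterance_pairs(jsonl_text):
    """Pair each reply with its parent comment as (utterance, response)."""
    comments = {}
    for raw in jsonl_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        comments[record["id"]] = record

    pairs = []
    for record in comments.values():
        parent = record.get("parent_id", "")
        if not parent.startswith("t1_"):
            continue  # top-level comment: it replies to a post, not a comment
        parent_comment = comments.get(parent[3:])
        if parent_comment is None:
            continue  # parent is not present in this file
        if "[deleted]" in (parent_comment["body"], record["body"]):
            continue  # skip removed comments
        pairs.append((parent_comment["body"], record["body"]))
    return pairs


# Toy sample in the assumed layout (the comment text is made up).
SAMPLE_COMMENTS = "\n".join([
    json.dumps({"id": "a1", "parent_id": "t3_post", "body": "Any good ML courses?"}),
    json.dumps({"id": "a2", "parent_id": "t1_a1", "body": "Try an intro statistics course first."}),
    json.dumps({"id": "a3", "parent_id": "t1_a2", "body": "[deleted]"}),
    json.dumps({"id": "a4", "parent_id": "t1_zz", "body": "Orphaned reply."}),
])

SAMPLE_REDDIT_PAIRS = build_utterance_pairs(SAMPLE_COMMENTS)
```

In the toy sample only one pair survives: the top-level comment has no comment parent, the deleted reply is dropped, and the last reply points at a comment that is not in the file.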

Installation

1. Run all cells in notebooks/exploration/1.0-rsh-emotion-data.ipynb to generate a combined dataset with a reduced number of classes.
2. Find the best hyperparameters for fastText via src/models/fasttext_hypertuning.py.
3. Train the emotion classifier on the whole corpus with src/models/fasttext_training.py.
4. Prepare the dialogue data for fastText with src/data/prepare_for_fasttex.py.
5. Classify the emotion of each utterance via src/models/fasttext_inference.py.
6. Create the final emotion dialogue dataset by running src/data/merge_with_labels.py.
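The data preparation and label merging steps above revolve around fastText's text format, in which a training line is a `__label__<name>` token followed by the text, and each predicted label is printed with the same prefix. The sketch below shows that format handling only. It is not the project's scripts: the function names, the lowercasing, and the whitespace normalization are assumptions.

```python
import re

LABEL_PREFIX = "__label__"


def to_fasttext_line(text, label=None):
    """Format one example for fastText.

    With a label, this yields a supervised training line; without one,
    it yields plain text suitable for prediction input.
    """
    # fastText reads one example per line, so collapse newlines and extra spaces.
    cleaned = re.sub(r"\s+", " ", text).strip().lower()
    if label is None:
        return cleaned
    return f"{LABEL_PREFIX}{label} {cleaned}"


def strip_label_prefix(predicted):
    """Turn a predicted '__label__joy' token into 'joy'."""
    predicted = predicted.strip()
    if predicted.startswith(LABEL_PREFIX):
        return predicted[len(LABEL_PREFIX):]
    return predicted


def attach_labels(utterances, predicted_labels):
    """Pair each utterance with its predicted emotion, in order."""
    if len(utterances) != len(predicted_labels):
        raise ValueError("utterances and predictions must have the same length")
    return [(u, strip_label_prefix(p)) for u, p in zip(utterances, predicted_labels)]
```

For example, `to_fasttext_line("I  am so\nhappy!", "joy")` gives `__label__joy i am so happy!`, and `attach_labels(["great news"], ["__label__joy"])` gives `[("great news", "joy")]`.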

Running the tests

Tests are in src/tests.

Authors

Roman Shaptala - Everything - LinkedIn

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

Zhou, Hao, et al. "Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory." arXiv preprint arXiv:1704.01074 (2017). [PDF]
Ghazvininejad, Marjan, et al. "A Knowledge-Grounded Neural Conversation Model." arXiv preprint arXiv:1702.01932 (2017). [PDF]
pytorch-seq2seq [code]
Angular Chatbot with Dialogflow [code]
