Emotional neural conversational system with knowledge for online course search.
The project aims to create a chatbot that responds with a given emotion and can also recommend online courses. The emotion aspect of the system is implemented via emotion embeddings that are fed to the RNN decoder along with the encoded utterance. Course recommendations are implemented as a cosine similarity search over averaged word2vec embeddings of the words in each course description.
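The recommendation step can be illustrated with a minimal sketch. It is not the project's implementation: the tiny `WORD_VECTORS` table stands in for real word2vec embeddings, and the names `average_embedding`, `cosine_similarity`, `recommend`, and `COURSES` are hypothetical, chosen only for this example.

```python
import math

# Toy 3-dimensional vectors standing in for real word2vec embeddings
# (hypothetical values, for illustration only).
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "programming": [0.8, 0.2, 0.1],
    "painting": [0.0, 0.9, 0.3],
    "watercolor": [0.1, 0.8, 0.4],
    "basics": [0.3, 0.3, 0.3],
}


def average_embedding(text):
    """Average the vectors of all known words in text; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, courses, top_k=1):
    """Return the top_k course titles whose descriptions are closest to the query."""
    query_vec = average_embedding(query)
    if query_vec is None:
        return []
    scored = []
    for title, description in courses.items():
        desc_vec = average_embedding(description)
        if desc_vec is not None:
            scored.append((cosine_similarity(query_vec, desc_vec), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Hypothetical course catalogue: title -> description.
COURSES = {
    "Intro to Python": "python programming basics",
    "Watercolor 101": "watercolor painting basics",
}

print(recommend("programming", COURSES))
```

With real embeddings, the description vectors would typically be computed once and cached, so that only the query has to be embedded at request time.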
python -m spacy download en_core_web_lg
wget https://github.com/facebookresearch/fastText/archive/v0.1.0.zip
unzip v0.1.0.zip
rm v0.1.0.zip
cd fastText-0.1.0
make
After you download and extract all of the following corpora, run src/data/emotion_data_parsers.py.
Download Hashtag Emotion Corpus and extract to data/raw/emotion/Jan9-2012-tweets-clean.txt.
Download The Emotion in Text and extract to data/raw/emotion/text_emotion.csv.
Download Affective Text and extract to data/raw/emotion/AffectiveText.Semeval.2007.
Download Electoral/Political tweets annotated for sentiment, emotion, purpose and style and extract to data/raw/emotion/ElectoralTweetsData.
Download WASSA-2017 Shared Task on Emotion Intensity (EmoInt) and extract to data/raw/emotion/Wassa-2017.
Download Collections of love letters, hate mail, and suicide notes and extract to data/raw/emotion/LoveHateSuicide/love-letters.txt.
Clone Movie reviews, annotated for emotion classification, to data/raw/emotion/spudisc-emotion-classification-master.
Get NRC Emotion Lexicon and extract to data/raw/emotion/NRC-Sentiment-Emotion-Lexicons.
Download Cornell Movie-Dialogs Corpus and extract to data/raw/dialogue/cornell movie-dialogs corpus.
Reformat to CSV via src/data/movie_corpus_extraction.py.
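As an illustration of what such a reformatting step involves, here is a minimal sketch, not the contents of src/data/movie_corpus_extraction.py. It assumes the corpus's movie_lines.txt layout, where the fields are separated by ` +++$+++ `. The column names and the helpers `parse_movie_line` and `lines_to_csv` are hypothetical.

```python
import csv
import io

# Field separator used in the corpus's movie_lines.txt file.
FIELD_SEPARATOR = " +++$+++ "
# Column names are an assumption made for this sketch.
COLUMNS = ["line_id", "character_id", "movie_id", "character_name", "text"]


def parse_movie_line(raw_line):
    """Split one raw line into a dict of columns; None if the line is malformed."""
    # maxsplit keeps any separator-like text inside the utterance intact.
    parts = raw_line.rstrip("\n").split(FIELD_SEPARATOR, len(COLUMNS) - 1)
    if len(parts) != len(COLUMNS):
        return None
    return dict(zip(COLUMNS, parts))


def lines_to_csv(raw_lines, out_file):
    """Write all well-formed lines to out_file as CSV; return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS)
    writer.writeheader()
    count = 0
    for raw in raw_lines:
        row = parse_movie_line(raw)
        if row is not None:
            writer.writerow(row)
            count += 1
    return count


sample = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!\n"
buffer = io.StringIO()
written = lines_to_csv([sample, "not a corpus line\n"], buffer)
print(written)
print(buffer.getvalue())
```

When reading the real file, an explicit encoding such as `open(path, encoding="iso-8859-1")` may be needed, since the corpus is not guaranteed to be valid UTF-8.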
Clone Ubuntu Dialogue Corpus v2.0.
Port create_ubuntu_dataset.py to Python 3. Set the positive example probability to 1. Generate the corpus via generate.sh. Reformat to CSV via src/data/ubuntu_corpus_extraction.py.
Download Microsoft Research Social Media Conversation Corpus and extract to data/raw/dialogue/MSRSocialMediaConversationCorpus.
This dataset contains only tweet IDs, so create a Twitter application to access the Twitter API. Put your ConsumerToken, ConsumerSecret, AccessToken and AccessSecret into config.ini in the following format:
[twitter]
ConsumerToken = abc
ConsumerSecret = abc
AccessToken = abc
AccessSecret = abc
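A file in this format can be read with Python's standard `configparser` module. The sketch below is an assumption about how the credentials might be loaded, not the project's code. The helper `load_twitter_credentials` is hypothetical, and it parses a string so that the example is self-contained.

```python
import configparser

# Keys expected in the [twitter] section, as listed above.
REQUIRED_KEYS = ("ConsumerToken", "ConsumerSecret", "AccessToken", "AccessSecret")

EXAMPLE_CONFIG = """
[twitter]
ConsumerToken = abc
ConsumerSecret = abc
AccessToken = abc
AccessSecret = abc
"""


def load_twitter_credentials(config_text):
    """Parse INI text and return the four Twitter credentials as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("twitter"):
        raise KeyError("config is missing the [twitter] section")
    section = parser["twitter"]
    # Option names are matched case-insensitively by configparser's defaults.
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError("missing keys: " + ", ".join(missing))
    return {key: section[key] for key in REQUIRED_KEYS}


credentials = load_twitter_credentials(EXAMPLE_CONFIG)
print(sorted(credentials))
```

To read the actual file instead of a string, `parser.read("config.ini")` can replace the `read_string` call.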
Run src/data/microsoft_corpus_tweets_extraction.py to extract tweet texts.
Download a month of Reddit comments and extract to data/raw/dialogue/reddit_comments_month.
Create utterances via src/data/reddit_comments_extraction.py.
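One way to turn a comment dump into utterance pairs is to match each comment with its parent. The sketch below is not the logic of src/data/reddit_comments_extraction.py. It assumes one JSON object per line with `id`, `parent_id` and `body` fields, and a `t1_` prefix on parent IDs that refer to comments. The function `build_utterance_pairs` is hypothetical.

```python
import json

# Bodies that carry no usable text (assumed placeholder values).
SKIPPED_BODIES = ("", "[deleted]", "[removed]")


def build_utterance_pairs(json_lines):
    """Return (parent_text, reply_text) pairs for replies whose parent is present."""
    comments = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        comment = json.loads(line)
        if comment.get("body", "") in SKIPPED_BODIES:
            continue
        # Index by the "t1_<id>" form so that parent_id values can be looked up directly.
        comments["t1_" + comment["id"]] = comment

    pairs = []
    for comment in comments.values():
        parent = comments.get(comment.get("parent_id"))
        if parent is not None:
            pairs.append((parent["body"], comment["body"]))
    return pairs


# Hypothetical sample records in the assumed format.
SAMPLE_LINES = [
    '{"id": "a1", "parent_id": "t3_post", "body": "How do I learn Python?"}',
    '{"id": "a2", "parent_id": "t1_a1", "body": "Try an online course."}',
    '{"id": "a3", "parent_id": "t1_a2", "body": "[deleted]"}',
]

print(build_utterance_pairs(SAMPLE_LINES))
```

Holding a full month of comments in one dictionary can require a lot of memory, so a real extraction may need to process the dump in chunks or use an on-disk index.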
Run all cells in notebooks/exploration/1.0-rsh-emotion-data.ipynb to generate a combined dataset with a reduced number of classes.
Find the best hyperparameters for fastText via src/models/fasttext_hypertuning.py.
Run emotion classification training on the whole corpus with src/models/fasttext_training.py.
Prepare dialogue data for fastText through src/data/prepare_for_fasttex.py.
Run utterance emotion classification via src/models/fasttext_inference.py.
Create the final emotion dialogue dataset by running src/data/merge_with_labels.py.
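For context, fastText's supervised mode reads one example per line, with each label marked by the `__label__` prefix (the tool's default). The sketch below shows how text might be put into that shape. It is an assumption about what the preparation step involves, not the project's script, and `to_fasttext_line` is a hypothetical helper.

```python
# Default label prefix used by fastText's supervised mode.
LABEL_PREFIX = "__label__"


def to_fasttext_line(text, label=None):
    """Format one example as a single fastText input line.

    With a label, the result is a training line; without one, it is a plain
    line suitable for prediction.
    """
    # Collapse newlines and repeated whitespace so each example stays on one line.
    cleaned = " ".join(text.split())
    if label is None:
        return cleaned
    # Labels must not contain spaces, so spaces are replaced with underscores.
    return LABEL_PREFIX + label.strip().replace(" ", "_") + " " + cleaned


print(to_fasttext_line("I  love\nthis course", "joy"))
print(to_fasttext_line("  what a day  "))
```

Unlabelled lines produced this way correspond to the dialogue utterances that the trained classifier labels afterwards.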
Tests are in src/tests.
- Roman Shaptala - Everything - LinkedIn
This project is licensed under the MIT License - see the LICENSE.md file for details.
- Zhou, Hao, et al. "Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory." arXiv preprint arXiv:1704.01074 (2017). [PDF]
- Ghazvininejad, Marjan, et al. "A Knowledge-Grounded Neural Conversation Model." arXiv preprint arXiv:1702.01932 (2017). [PDF]
- pytorch-seq2seq [code]
- Angular Chatbot with Dialogflow [code]