This is the PyTorch implementation of the work presented in 'Modelling Context with User Embeddings for Sarcasm Detection in Social Media' (https://arxiv.org/pdf/1607.00976.pdf). The neural network takes a tweet (content) and the corresponding user embedding (context) as input, and classifies the tweet as sarcastic or non-sarcastic.
- python 2.7
- PyTorch 0.3.1
- python package gensim
- python package yandex.translate
- python package ipdb
- Get pre-trained word embeddings (e.g. Skip-gram)
  - Download the .bin file from this link
  - Unzip the .bin.gz file and run the iPython notebook get_word2vec_embeddings.ipynb
  - Place the .txt file obtained in DATA/embeddings/ and rename it to words.txt
- Get pre-trained user embeddings. The embeddings we used can be found here. Place the embeddings in DATA/embeddings/ and name the file usr2vec.txt
- Execute the iPython notebook get_data.ipynb. This utility code downloads the tweets corresponding to the tweet ids and then preprocesses the tweet messages.
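Before training, it can be useful to confirm that the two embedding files from the steps above are in place and parse cleanly. The following is a minimal sketch, not code from this repository. It assumes that words.txt and usr2vec.txt are plain-text files with one entry per line (a token or user id followed by whitespace-separated floats), optionally preceded by a word2vec-style `count dim` header line. The function name `load_text_embeddings` is hypothetical, and the layout actually produced by the notebooks may differ.

```python
import io
import os


def load_text_embeddings(path):
    """Load embeddings from a plain-text file into a dict: key -> list of floats.

    Assumed layout (not confirmed by the repository): one entry per line,
    "<key> <v1> <v2> ... <vd>", optionally preceded by a "<count> <dim>" header.
    Returns (vectors, dim), where dim is None if the file has no entries.
    """
    vectors = {}
    dim = None
    with io.open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle):
            parts = line.split()
            # Skip a word2vec-style header such as "3000000 300" on the first line.
            if (line_no == 0 and len(parts) == 2
                    and parts[0].isdigit() and parts[1].isdigit()):
                continue
            if len(parts) < 2:
                continue  # ignore blank or malformed lines
            key = parts[0]
            values = [float(x) for x in parts[1:]]
            if dim is None:
                dim = len(values)
            elif len(values) != dim:
                raise ValueError("line %d: expected %d values, got %d"
                                 % (line_no + 1, dim, len(values)))
            vectors[key] = values
    return vectors, dim


if __name__ == "__main__":
    # File locations taken from the setup steps above.
    for name in ("words.txt", "usr2vec.txt"):
        path = os.path.join("DATA", "embeddings", name)
        if not os.path.isfile(path):
            print("missing: " + path)
            continue
        vectors, dim = load_text_embeddings(path)
        print("%s: %d entries, dimension %s" % (path, len(vectors), dim))
```

Run from the repository root, this reports either a missing file or the number of entries and the vector dimension for each file. A dimension mismatch between lines raises a `ValueError`, which usually points to a truncated download or a renamed file of the wrong type.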
Run python train_CUE_CNN.py
Run python Headlines_RNN.py
The code generates a progress folder that contains a sub-folder for every run. Inside every run folder, the following two files are generated:
- logs.txt, which contains the loss and accuracy on the train/test/validation sets after every epoch
- stats.jpg, which plots
  - train/test/validation loss on a single plot
  - train/test/validation accuracy on a single plot
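To check which runs have finished writing their outputs, the layout described above can be inspected with a few lines of Python. This is a minimal sketch rather than code from this repository: the helper name `summarize_runs` is hypothetical, and it relies only on the folder and file names stated above (progress, logs.txt, stats.jpg). It does not assume anything about the contents of logs.txt.

```python
import os

# Output files that each run folder is expected to contain.
EXPECTED_FILES = ("logs.txt", "stats.jpg")


def summarize_runs(progress_dir="progress"):
    """Return {run_folder_name: [expected files that are missing]}."""
    summary = {}
    if not os.path.isdir(progress_dir):
        return summary
    for run_name in sorted(os.listdir(progress_dir)):
        run_path = os.path.join(progress_dir, run_name)
        if not os.path.isdir(run_path):
            continue  # only sub-folders correspond to runs
        summary[run_name] = [
            name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(run_path, name))
        ]
    return summary


if __name__ == "__main__":
    for run_name, missing in sorted(summarize_runs().items()):
        status = "complete" if not missing else "missing " + ", ".join(missing)
        print("%s: %s" % (run_name, status))
```

A run folder with an empty "missing" list has both output files; a run that was interrupted before its files were written will show which of them is absent.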
The util files, pre-trained user embeddings and raw tweet ids were obtained from the Original CUE-CNN.