Sarcasm-Detection-using-CNN

This is the PyTorch implementation of the work presented in 'Modelling Context with User Embeddings for Sarcasm Detection in Social Media' (https://arxiv.org/pdf/1607.00976.pdf). The neural network takes a tweet (content) and the corresponding user embedding (context) as input, and classifies the tweet as sarcastic or non-sarcastic.
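To make the content + context idea concrete, here is a minimal, dependency-free sketch of a forward pass. It is not the repository's PyTorch model. The filter widths, ReLU activation, max-over-time pooling, direct concatenation with the user vector and two-way softmax are assumptions chosen for illustration. The function names (`conv_max_pool`, `classify`) are hypothetical, and all weights are random rather than trained.

```python
import math
import random


def conv_max_pool(word_vecs, filters, biases):
    """Content features: each filter spans len(filter) consecutive word
    vectors; apply ReLU and keep the maximum response over the tweet."""
    features = []
    for filt, b in zip(filters, biases):
        width = len(filt)
        # Starting at 0.0 makes max() act as ReLU; it is also the
        # result when the tweet is shorter than the filter width.
        best = 0.0
        for start in range(len(word_vecs) - width + 1):
            window = word_vecs[start:start + width]
            act = b + sum(w * x
                          for filt_row, word in zip(filt, window)
                          for w, x in zip(filt_row, word))
            best = max(best, act)
        features.append(best)
    return features


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(word_vecs, user_vec, filters, biases, out_weights, out_biases):
    # Concatenate content features (CNN over the tweet) with context (user embedding)
    h = conv_max_pool(word_vecs, filters, biases) + list(user_vec)
    logits = [b + sum(w * x for w, x in zip(row, h))
              for row, b in zip(out_weights, out_biases)]
    return softmax(logits)  # two classes; the label order is an assumption


def rand():
    return random.uniform(-1.0, 1.0)


# Toy usage with untrained weights: 6 words, 4-d word vectors, 3-d user vector
random.seed(0)
tweet = [[rand() for _ in range(4)] for _ in range(6)]
user = [rand() for _ in range(3)]
filters = [[[rand() for _ in range(4)] for _ in range(width)] for width in (2, 3)]
biases = [0.0, 0.0]
out_weights = [[rand() for _ in range(len(filters) + len(user))] for _ in range(2)]
out_biases = [0.0, 0.0]
probs = classify(tweet, user, filters, biases, out_weights, out_biases)
print(probs)
```

In the real model the weights are learned, and there may be extra layers (for example a hidden layer or dropout) between the concatenation and the output.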

System requirements

- Python 2.7
- PyTorch 0.3.1
- gensim (Python package)
- yandex.translate (Python package)
- ipdb (Python package)

Running the code

1. Pre-requisites

Get pre-trained word embeddings (e.g. Skip-gram):
- Download the .bin.gz file from this link
- Decompress the .bin.gz file and run the IPython notebook get_word2vec_embeddings.ipynb
- Place the resulting .txt file in DATA/embeddings/ and rename it to words.txt
Get pre-trained user embeddings. The embeddings we used can be found here. Place them in DATA/embeddings/ and name the file usr2vec.txt.
Run the IPython notebook get_data.ipynb. This utility downloads the tweets corresponding to the tweet IDs and then preprocesses the tweet text.
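The exact preprocessing done by get_data.ipynb is not described here. As an illustration only, a typical tweet normaliser lowercases the text, replaces URLs and @-mentions with placeholder tokens, and splits the rest into word and punctuation tokens. The placeholder strings `<url>` and `@user`, the regular expressions and the name `preprocess_tweet` are hypothetical; whatever tokens you produce should match the vocabulary of words.txt.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.UNICODE)
MENTION_RE = re.compile(r"@\w+", re.UNICODE)
# Placeholders first, then words (optionally a #hashtag, optionally with an
# apostrophe suffix such as "don't"), then single punctuation marks.
TOKEN_RE = re.compile(r"<url>|@user|#?\w+(?:'\w+)?|[^\w\s]", re.UNICODE)


def preprocess_tweet(text):
    """Hypothetical normalisation: lowercase, mask URLs and mentions, tokenize."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)      # mask links before mentions
    text = MENTION_RE.sub(" @user ", text)  # mask user handles
    return TOKEN_RE.findall(text)


print(preprocess_tweet("Oh GREAT, another Monday! @bob http://t.co/x #not"))
# ['oh', 'great', ',', 'another', 'monday', '!', '@user', '<url>', '#not']
```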
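Both words.txt and usr2vec.txt are plain-text embedding files. Assuming they follow the common word2vec text layout (an optional `count dim` header line, then one `key v1 ... vD` entry per line), a minimal loader could look like the sketch below. The function names and the zero-vector fallback for unknown tokens are hypothetical and not taken from the repository's util files.

```python
import io


def load_text_embeddings(lines):
    """Parse word2vec-style text embeddings into {key: [floats]}.

    Assumed layout: optional 'count dim' header, then 'key v1 ... vD' per line.
    """
    vectors, dim = {}, None
    for lineno, line in enumerate(lines, 1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if lineno == 1 and len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            dim = int(parts[1])  # header: vocabulary size and vector dimension
            continue
        key, values = parts[0], [float(v) for v in parts[1:]]
        if dim is None:
            dim = len(values)  # no header: infer the dimension from the first entry
        if len(values) != dim:
            raise ValueError("line %d: expected %d values, got %d"
                             % (lineno, dim, len(values)))
        vectors[key] = values
    return vectors, dim


def embed_tokens(tokens, vectors, dim):
    # Hypothetical OOV policy: unknown tokens become zero vectors
    return [vectors[t] if t in vectors else [0.0] * dim for t in tokens]


# In practice, iterate over io.open("DATA/embeddings/words.txt", encoding="utf-8")
sample = io.StringIO(u"3 2\nyeah 0.1 0.2\nright 0.3 -0.4\nsure 0.5 0.5\n")
vectors, dim = load_text_embeddings(sample)
print(embed_tokens(["yeah", "totally"], vectors, dim))
# [[0.1, 0.2], [0.0, 0.0]]
```

The same loader would apply to usr2vec.txt if it uses the same layout, with user identifiers as keys instead of words.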

2. Training and Evaluation

a. To run the original code

Run python train_CUE_CNN.py

b. To run the hybrid RNN + CNN model on the new dataset

Run python Headlines_RNN.py

Output, results and visualization

The code generates a progress folder containing a subfolder for every run. The following two files are generated inside each run folder:

logs.txt, which contains the loss and accuracy on the train/test/validation sets after every epoch
stats.jpg, which plots:
- train/test/validation loss on a single plot
- train/test/validation accuracy on a single plot
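The naming of the run folders and the line format of logs.txt are not documented here. The sketch below shows one plausible way to write and read back such per-run logs. The `run_<timestamp>` folder names, the `name value` line format and the helper names (`make_run_dir`, `log_epoch`, `read_log`) are all hypothetical. Each parsed entry holds one epoch's train, validation and test loss and accuracy, i.e. the series that stats.jpg plots.

```python
import os
import tempfile
import time


def make_run_dir(root="progress"):
    """Create a fresh sub-folder for one run (timestamp naming is an assumption)."""
    base = os.path.join(root, time.strftime("run_%Y%m%d_%H%M%S"))
    run_dir, n = base, 1
    while os.path.exists(run_dir):  # avoid clashes between runs started in the same second
        n += 1
        run_dir = "%s_%d" % (base, n)
    os.makedirs(run_dir)
    return run_dir


def log_epoch(run_dir, epoch, metrics):
    """Append one line per epoch; metrics maps split -> (loss, accuracy)."""
    parts = ["epoch %d" % epoch]
    for split in ("train", "val", "test"):
        loss, acc = metrics[split]
        parts.append("%s_loss %.4f %s_acc %.4f" % (split, loss, split, acc))
    with open(os.path.join(run_dir, "logs.txt"), "a") as f:
        f.write(" ".join(parts) + "\n")


def read_log(path):
    """Parse logs.txt back into one {name: value} dict per epoch."""
    history = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            # Fields alternate name/value: "epoch 1 train_loss 0.6900 ..."
            history.append({k: float(v) for k, v in zip(fields[::2], fields[1::2])})
    return history


run_dir = make_run_dir(os.path.join(tempfile.mkdtemp(), "progress"))
log_epoch(run_dir, 1, {"train": (0.69, 0.52), "val": (0.68, 0.55), "test": (0.70, 0.51)})
log_epoch(run_dir, 2, {"train": (0.55, 0.71), "val": (0.60, 0.66), "test": (0.61, 0.64)})
history = read_log(os.path.join(run_dir, "logs.txt"))
print([h["val_acc"] for h in history])  # [0.55, 0.66]
```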

Note:

The utility files, pre-trained user embeddings and raw tweet IDs were obtained from the original CUE-CNN repository.

Folders and files

- DATA/
- aux/
- code/
- Headlines_CNN.py
- Headlines_RNN.py
- README.md
- SemEval_CUE_CNN.py
- SemEval_data_set.py
- bamman_redux_ids.txt
- crossfolds.sh
- get_data.ipynb
- get_word2vec_embeddings.ipynb
- headline_data_set.py
- init.sh
- onefold.sh
- param_sweep.py
- param_sweep_Headlines_RNN.py
- preprocess_bamman.sh
- sarcasm-detection-using-hybrid-NN.pdf
- sarcasm_cnn.sh
- train_CUE_CNN.py
- twitter_data_set.py

Repository: zxh991103/Sarcasm-Detection-using-NN