Sarcasm-Detection

This repository contains all the experiments for sarcasm detection reported in the paper "Sarcasm Detection Using Multi-Head Attention Based Bidirectional LSTM".
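The model named in the paper title applies multi-head attention to the hidden states of a bidirectional LSTM. As a rough illustration only (this is not the code in the MHA-BiLSTM folder), the NumPy sketch below implements standard multi-head scaled dot-product self-attention over a sequence of hidden states. The random projection matrices, the head count and the dimensions are assumptions; the paper's actual configuration is not given in this README.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, num_heads, rng):
    """Multi-head scaled dot-product self-attention over hidden states H.

    H: (seq_len, d_model) array, e.g. concatenated forward/backward BiLSTM states.
    Returns the attended sequence (seq_len, d_model) and the per-head
    attention weights (num_heads, seq_len, seq_len).
    The projection matrices are random here; in a trained model they are learned.
    """
    seq_len, d_model = H.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    W_q, W_k, W_v, W_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
    )

    def split(X):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return X.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(H @ W_q), split(H @ W_k), split(H @ W_v)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    heads = weights @ V                                  # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o, weights
```

A sentence-level vector for classification could then be obtained by, for example, averaging the rows of the attended sequence; that pooling choice is likewise an assumption.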

MHA-BiLSTM

Data Preprocessing

This step processes the raw text and converts each sentence into a sequence of integers, where each integer is the row index of the corresponding word in the word-embedding matrix.
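A minimal sketch of this conversion, assuming lowercased whitespace tokenization, index 0 reserved for padding and index 1 for out-of-vocabulary words. The helper names build_vocab and to_index_sequence are hypothetical and do not come from the repository.

```python
import numpy as np

def build_vocab(sentences, embeddings, dim):
    """Map each known word to a row index of an embedding matrix.

    Index 0 is the padding row and index 1 the out-of-vocabulary row
    (both are assumptions of this sketch). `embeddings` is any dict-like
    mapping word -> vector of length `dim`.
    """
    word_to_index = {"<pad>": 0, "<unk>": 1}
    rows = [np.zeros(dim), np.zeros(dim)]
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in word_to_index and word in embeddings:
                word_to_index[word] = len(rows)
                rows.append(np.asarray(embeddings[word], dtype=float))
    return word_to_index, np.vstack(rows)

def to_index_sequence(sentence, word_to_index, max_len):
    """Convert a sentence to a fixed-length list of embedding-matrix indices."""
    ids = [word_to_index.get(w, 1) for w in sentence.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))  # right-pad with the padding index
```

For example, with embeddings only for "great" and "monday", the sentence "Great , another Monday" padded to length 6 becomes [2, 1, 1, 3, 0, 0].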

Step1: Parameters

dataDistributionType = "Balanced" or "Unbalanced"
SheetName = "balanced" or "Unbalanced"

Step2: Model training

Run model.py to train the model and generate an HTML page that visualizes the attention scores of each sentence.
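The sketch below shows one way attention scores can be turned into an HTML highlight, with each token shaded in proportion to its score. It is an illustration under assumptions (red shading, scaling so the highest-scoring token is fully opaque) and not the rendering code in model.py; attention_to_html is a hypothetical name.

```python
import html

def attention_to_html(tokens, scores):
    """Render tokens as HTML spans shaded by their attention score."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    top = max(scores) if scores and max(scores) > 0 else 1.0
    spans = []
    for token, score in zip(tokens, scores):
        alpha = score / top  # normalise so the most-attended token is darkest
        spans.append(
            f'<span style="background-color: rgba(255, 0, 0, {alpha:.2f})">'
            f"{html.escape(token)}</span>"
        )
    return "<p>" + " ".join(spans) + "</p>"
```

For example, attention_to_html(["so", "fun"], [0.2, 0.8]) returns a paragraph in which "fun" is shaded most strongly; the resulting string can be written to an .html file and opened in a browser.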

Step3: Ensemble Model training

Run ensemble_result.py to obtain the results of the ensemble of the SVM and MHA-BiLSTM models.
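This README does not state how ensemble_result.py combines the two models. As one common option, the sketch below shows weighted soft voting over each model's probability of the sarcastic class; the weighting scheme, the 0.5 threshold and the name soft_vote are assumptions. With scikit-learn, SVM probabilities can come from SVC(probability=True).predict_proba or from wrapping LinearSVC in CalibratedClassifierCV.

```python
import numpy as np

def soft_vote(p_svm, p_lstm, weight=0.5, threshold=0.5):
    """Combine two models' sarcasm probabilities by weighted averaging.

    p_svm, p_lstm: arrays of P(sarcastic) for the same samples.
    weight: contribution of the SVM (the BiLSTM gets 1 - weight).
    Returns (combined probabilities, 0/1 predictions).
    """
    p_svm = np.asarray(p_svm, dtype=float)
    p_lstm = np.asarray(p_lstm, dtype=float)
    if p_svm.shape != p_lstm.shape:
        raise ValueError("both models must score the same samples")
    combined = weight * p_svm + (1.0 - weight) * p_lstm
    return combined, (combined >= threshold).astype(int)
```

For example, soft_vote([0.9, 0.2, 0.4], [0.7, 0.1, 0.8]) predicts [1, 0, 1]: the third sample is labelled sarcastic because the BiLSTM's confidence outweighs the SVM's.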

SVM Model

Getting Started

Lieb Features (L)
Gonzalez Features (G)
Bush Features (B)
Joshi Congruity Features (J)
Variable Window for Word-Embedding Similarity
Character Embedding generator (Need to add citations)

Prerequisites

sklearn, scipy, gensim

Installation

Install the packages listed above. Python 3.6 was used for all experiments.

Feature Generation

Note: All data is stored as SciPy sparse matrices to enable faster processing and reduce RAM usage.
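The README does not say which file format the scripts use for these matrices. Assuming SciPy's own .npz format, a sparse feature matrix can be saved and reloaded as follows; save_features and load_features are hypothetical helper names.

```python
import numpy as np
from scipy import sparse

def save_features(path, matrix):
    """Store a feature matrix (dense or sparse) in compressed CSR form."""
    sparse.save_npz(path, sparse.csr_matrix(matrix))

def load_features(path):
    """Load a feature matrix previously written with save_features."""
    return sparse.load_npz(path)
```

Only the non-zero entries are written, which is what keeps memory and disk usage low for mostly-zero feature sets.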

Each subfolder contains a script that generates one feature set. The feature-generation steps are described in the respective papers; for implementation details we also referred to Aditya Joshi's GitHub repository: https://github.com/adityajo/ComputationalSarcasm

Each folder named after a paper generates that paper's features.

./wembedding: This folder contains the script for generating word-embedding-based similarity features. Pre-trained vectors such as GloVe or Word2Vec can be loaded from .bin or .txt files using gensim.
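A sketch of similarity features in the spirit of the unweighted embedding-similarity features of Joshi et al. (2016), with an optional window that restricts comparisons to nearby words. The exact features and window semantics used by the ./wembedding script are not stated here, so both are assumptions, as is the function name. In practice `vectors` could be a gensim KeyedVectors object loaded with KeyedVectors.load_word2vec_format(path, binary=True) for .bin files; GloVe .txt files lack the word2vec header line and, depending on the gensim version, need no_header=True or a conversion step.

```python
import numpy as np

def cosine(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def embedding_similarity_features(tokens, vectors, window=None):
    """Four similarity features for one sentence.

    For each in-vocabulary word, take the score of its most similar and its
    most dissimilar other word. If `window` is set, only words within
    `window` positions (counted after dropping out-of-vocabulary tokens)
    are compared. Returns
    [max(most_similar), min(most_similar), max(most_dissimilar), min(most_dissimilar)].
    `vectors` must support `word in vectors` and `vectors[word]`.
    """
    words = [w for w in tokens if w in vectors]
    most_sim, most_dissim = [], []
    for i, w in enumerate(words):
        lo = 0 if window is None else max(0, i - window)
        hi = len(words) if window is None else min(len(words), i + window + 1)
        scores = [cosine(vectors[w], vectors[words[j]]) for j in range(lo, hi) if j != i]
        if scores:
            most_sim.append(max(scores))
            most_dissim.append(min(scores))
    if not most_sim:
        return [0.0, 0.0, 0.0, 0.0]  # fewer than two comparable words
    return [max(most_sim), min(most_sim), max(most_dissim), min(most_dissim)]
```

Varying `window` trades off global incongruity across the whole sentence against local contrast between neighbouring words.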

In the script, set the paths to the training and test datasets and the desired output path.

The dataset MUST be a .tsv file in the following format; this is mandatory for the script to work. Each line holds one text and its label, separated by a tab:

text1<TAB>label1
text2<TAB>label2
...
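To check that a file matches this format before running the feature scripts, a reader like the one below can be used. It assumes there is no header row and that texts contain no tab characters; read_tsv is a hypothetical helper, not part of the repository.

```python
import csv

def read_tsv(path):
    """Read a two-column TSV file of (text, label) rows.

    Assumes no header row. Quotes are not treated specially, so texts
    containing quotation marks are read verbatim. Rows that do not split
    into exactly two fields raise an error so format problems surface early.
    """
    texts, labels = [], []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(f"line {line_no}: expected 2 tab-separated fields, got {len(row)}")
            texts.append(row[0])
            labels.append(row[1])
    return texts, labels
```

Labels are returned as strings; convert them with int() if the training code expects numeric labels.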

Once the features have been generated, you can load them and start training.

Training

Note: Make sure all paths are set correctly. There are many of them and it can get confusing, so give output files descriptive names.

There are six training scripts in total, one for each set of experiments:

final_1 : Training script for experiment including only L features.
final_2 : Training script for experiment including only G features.
final_3 : Training script for experiment including only B features.
final_4 : Training script for experiment including only J features.
b_j_features: Training script for experiment including only B+J features.
all_features: Training script for experiment including all four feature sets (L+G+B+J).

These scripts can optionally include the word-embedding similarity features. To use them, uncomment the relevant lines (or load the generated features) and append them to the other features with scipy.sparse.hstack.
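As a sketch of that step, the helper below concatenates feature blocks column-wise with scipy.sparse.hstack after checking that every block describes the same samples. The helper name combine_features and the example blocks (B, J and embedding features) are hypothetical.

```python
import numpy as np
from scipy import sparse

def combine_features(*blocks):
    """Column-wise concatenate per-sample feature blocks into one CSR matrix."""
    n_rows = {b.shape[0] for b in blocks}
    if len(n_rows) != 1:
        raise ValueError(f"feature blocks disagree on number of samples: {sorted(n_rows)}")
    # Dense arrays are converted to sparse so the result stays sparse.
    return sparse.hstack([sparse.csr_matrix(b) for b in blocks], format="csr")
```

For example, combine_features(X_b, X_j, X_wemb) would give one matrix with the columns of all three blocks side by side, ready to pass to the classifier. The row-count check catches the common mistake of mixing train and test feature files.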

The scripts write the output statistics (Precision, Recall and F-score) to the specified path.
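A minimal sketch of the train-and-report step, assuming a linear SVM from scikit-learn and binary labels with 1 meaning sarcastic. The actual classifier settings in the final_* scripts and the layout of their output file are not given here, so LinearSVC(C=1.0), the tab-separated output and the name train_and_report are assumptions.

```python
from sklearn.metrics import precision_recall_fscore_support
from sklearn.svm import LinearSVC

def train_and_report(X_train, y_train, X_test, y_test, out_path):
    """Fit a linear SVM, then write precision, recall and F-score to out_path."""
    clf = LinearSVC(C=1.0)
    clf.fit(X_train, y_train)  # accepts dense arrays or SciPy sparse matrices
    y_pred = clf.predict(X_test)
    # average="binary" scores the positive (sarcastic) class, assumed labelled 1.
    p, r, f, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", pos_label=1
    )
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(f"Precision\t{p:.4f}\nRecall\t{r:.4f}\nF-Score\t{f:.4f}\n")
    return p, r, f
```

Scoring only the positive class matches the usual practice of reporting sarcasm-class metrics; average="macro" would instead weight both classes equally, which matters on the Unbalanced data.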

aditya-srikanth/Sarcasm-Detection

Folders and files

Latest commit

History

Repository files navigation

Sarcasm-Detection

MHA-BiLSTM

Data Preprocessing

Step1: Parameters

Step2: Model training

Step3: Ensemble Model training

SVM Model

Getting Started

Prerequisites

Installation

Feature Generation

Training

About

Resources

Stars

Watchers

Forks

Languages