System for Irony Detection in Product Reviews

This system was used for the paper "An impact analysis of features in a classification approach to irony detection in product reviews"[1].

Installation

1. Download the system

Download the system with the following command:

> git clone https://github.com/kbuschme/irony-detection.git

2. Download the corpus

Download the file SarcasmCorpus.rar, which contains the Sarcasm Corpus by Elena Filatova[2], and place it inside the corpora directory:

> curl -o corpora/SarcasmCorpus.rar http://storm.cis.fordham.edu/~filatova/SarcasmCorpus.rar

Unpack the content of the archive SarcasmCorpus.rar into the directory corpora/SarcasmCorpus:

> unrar e corpora/SarcasmCorpus.rar corpora/SarcasmCorpus/

Unpack the archive Ironic.rar into the directory corpora/SarcasmCorpus/Ironic and the archive Regular.rar into the directory corpora/SarcasmCorpus/Regular:

> unrar e corpora/SarcasmCorpus/Ironic.rar corpora/SarcasmCorpus/Ironic/
> unrar e corpora/SarcasmCorpus/Regular.rar corpora/SarcasmCorpus/Regular/

3. Download additional resources

Download the file opinion-lexicon-English.rar, which contains the Opinion Lexicon by Hu and Liu[3], and place it inside the resources directory:

> curl -o resources/opinion-lexicon-English.rar http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar

Unpack the files negative-words.txt and positive-words.txt from the archive opinion-lexicon-English.rar into the resources directory:

> unrar e resources/opinion-lexicon-English.rar resources/

4. Install Python libraries and language models

A working Python 2 installation (tested with version 2.7.5) and the following Python libraries are needed. These can be installed using pip:

NumPy

> sudo pip install numpy

SciPy

> sudo pip install scipy

scikit-learn

> sudo pip install scikit-learn

Natural Language Toolkit (NLTK)

> sudo pip install PyYAML
> sudo pip install nltk

Additionally NLTK requires the following models:

Max Entropy Pos Tagger (maxent_treebank_pos_tagger) and
Punkt Tokenizer Models (punkt)

which can be downloaded with the setup.py script

> python setup.py

or manually with the following steps:

> python
>>> import nltk
>>> nltk.download("punkt")
>>> nltk.download("maxent_treebank_pos_tagger")
>>> exit()
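
To confirm that the libraries from this step can be imported, a small check along the following lines may help. It is only a sketch and not part of the system: the function name missing_modules and the constant REQUIRED_MODULES are made up for illustration, and the list assumes the usual import names (scikit-learn as sklearn, PyYAML as yaml). It uses only the standard library and is written to run with the same interpreter that will run main.py.

```python
import importlib

# Import names for the libraries installed in this step
# (assumption: scikit-learn is imported as "sklearn", PyYAML as "yaml").
REQUIRED_MODULES = ("numpy", "scipy", "sklearn", "yaml", "nltk")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    print("missing: " + ", ".join(missing) if missing else "all libraries found")
```

An empty result means every listed library was found; any name that is reported can be installed with the matching pip command above.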

Getting Started

To start the system, change the directory to src and run the file main.py, which provides a command-line interface:

> cd src
> python main.py

The output should look as follows:

> python main.py
usage: Irony Detector [-h] {corpus,feature,interactive,ml,sets} ...
Irony Detector: error: too few arguments
>

The following commands are available and described in the Manual section below:

corpus,
feature,
interactive,
ml and
sets

As a first step the sets command should be run. This will create three files inside the corpora/SarcasmCorpus directory. The file shuffled_set.txt is a randomized version of the corpus used for cross-validation. The files training_set.txt and test_set.txt are a training and test set and contain 90% and 10% of the reviews, respectively.

> python main.py sets

Now the machine learning mode of the system can be used to classify reviews. The following example applies 10-fold cross-validation:

> python main.py ml cross-validation

Manual

Help:

Show a help message about the available commands:

> python main.py -h

The long form --help is listed as the same option in the help output:

> python main.py --help

This should look like the following message:

usage: Irony Detector [-h] {corpus,feature,interactive,ml,sets} ...

Detects irony in amazon reviews.

optional arguments:
  -h, --help            show this help message and exit

Commands:
  The following commands can be invoked.

  {corpus,feature,interactive,ml,sets}
                        Valid commands.
    corpus              Show details about the entire corpus.
    feature             Shows how often each feature is found for ironic and
                        regular reviews in the training_and_validation_set.
    interactive         The interactive mode classifies a given sentence using
                        a saved model.
    ml                  Use the machine learning approach to classify reviews.
    sets                Divide the corpus into training, validation and test
                        set.
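
The usage line and the Commands group in this message have the shape that Python's argparse module produces for sub-commands. As an illustration only, a parser with a similar layout could be sketched as follows. This is not the code in main.py: the function name build_parser and the attribute names command and mode are assumptions, and the sub-arguments are taken from the examples in this manual.

```python
import argparse


def build_parser():
    """Build a parser whose layout resembles the help message shown above."""
    parser = argparse.ArgumentParser(
        prog="Irony Detector",
        description="Detects irony in amazon reviews.",
    )
    subparsers = parser.add_subparsers(
        dest="command",  # hypothetical attribute name
        title="Commands",
        description="The following commands can be invoked.",
        help="Valid commands.",
    )
    # Sub-arguments as used in the examples of this manual; the attribute
    # name "mode" is an assumption made for this sketch.
    modes = {
        "corpus": ["reviews", "stats"],
        "feature": ["show", "export"],
        "ml": ["cross-validation", "test"],
    }
    for name in ("corpus", "feature", "interactive", "ml", "sets"):
        sub = subparsers.add_parser(name)
        if name in modes:
            sub.add_argument("mode", choices=modes[name])
    return parser
```

With this sketch, build_parser().parse_args(["ml", "cross-validation"]) yields a namespace whose command is "ml" and whose mode is "cross-validation".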

Corpus mode:

The corpus mode shows general information about a corpus.

Show all reviews inside the corpus:

> python main.py corpus reviews

Show some statistics about the corpus:

> python main.py corpus stats

Feature mode:

The feature mode displays statistics about the specific features or exports all features in the Attribute-Relation File Format (ARFF).

Show how often the specific features occur in all reviews:

> python main.py feature show

Export the extracted features to an ARFF file:

> python main.py feature export
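
For readers unfamiliar with ARFF, the following sketch shows the general shape of such a file: a relation name, one attribute declaration per feature, and comma-separated data rows. It is not the system's exporter. The function name to_arff, the relation name and the feature name exclamation_marks are hypothetical, and values are written without quoting, which is only safe for simple numeric and nominal values.

```python
def to_arff(relation, attributes, rows):
    """Render attribute declarations and data rows as ARFF text.

    attributes is a list of (name, type) pairs, where type is an ARFF type
    such as "NUMERIC" or a nominal value set such as "{ironic,regular}".
    """
    lines = ["@RELATION " + relation, ""]
    for name, kind in attributes:
        lines.append("@ATTRIBUTE {0} {1}".format(name, kind))
    lines.extend(["", "@DATA"])
    for row in rows:
        # No quoting or escaping is done here (assumption: simple values only).
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical feature and made-up values, for illustration only.
example = to_arff(
    "reviews",
    [("exclamation_marks", "NUMERIC"), ("label", "{ironic,regular}")],
    [(3, "ironic"), (0, "regular")],
)
```

The resulting text starts with the @RELATION line, lists the two @ATTRIBUTE declarations, and ends with the two data rows after @DATA.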

Machine learning mode:

The machine learning mode uses the following classifiers to classify the reviews:

Naive Bayes,
Decision Tree,
Random Forest,
Logistic Regression and
Support Vector Machine

Use 10-fold cross-validation:

> python main.py ml cross-validation

Train the classifiers on a training set and classify a test set:

> python main.py ml test
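
The idea behind 10-fold cross-validation is that the shuffled reviews are cut into ten parts and each part serves once as the test fold while the other nine are used for training. The sketch below illustrates only this index bookkeeping in plain Python. It is an assumption about the general technique, not a description of how the system implements it, and the name k_fold_indices is made up.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    The items are assumed to be shuffled already; folds are contiguous
    slices whose sizes differ by at most one.
    """
    # Spread the remainder over the first folds.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_items))
        yield train, test
        start = stop
```

Each index appears in exactly one test fold, so every review is classified once by a model that did not see it during training.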

Set mode:

The sets mode generates a shuffled set for cross-validation and also divides all reviews into a training set and a test set at a 90 to 10 ratio.

> python main.py sets

This command creates the following three files inside the directory corpora/SarcasmCorpus:

corpora/SarcasmCorpus/shuffled_set.txt,
corpora/SarcasmCorpus/training_set.txt and
corpora/SarcasmCorpus/test_set.txt.
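
The relationship between the three sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's code: the function name shuffle_and_split and the fixed seed are made up, and whether the training and test sets are cut from the same shuffled order is not specified in this README.

```python
import random


def shuffle_and_split(reviews, train_fraction=0.9, seed=0):
    """Return (shuffled, training, test) lists built from reviews.

    The seed only makes this sketch reproducible; it is an assumption.
    """
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled, shuffled[:cut], shuffled[cut:]


# Placeholder identifiers standing in for review files.
shuffled, training, test = shuffle_and_split(
    ["review_{0}".format(i) for i in range(100)]
)
```

With 100 placeholder reviews, the training list holds 90 items and the test list holds the remaining 10.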

References

[1] Konstantin Buschmeier, Philipp Cimiano, and Roman Klinger. 2014. An impact analysis of features in a classification approach to irony detection in product reviews. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 42–49, Baltimore, Maryland, June. Association for Computational Linguistics.

[2] Elena Filatova. 2012. Irony and sarcasm: Corpus Generation and Analysis Using Crowdsourcing. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 392–398, Istanbul, Turkey, May. European Language Resources Association (ELRA).

[3] Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '04, pages 168–177, New York, NY, USA. ACM.

Copyright (C) Konstantin Buschmeier.
