Preprocessed tweet files

Go to our GitHub project page to download the necessary files.

Tweet Sentiment Analysis

The competition task was to predict whether a tweet originally contained a positive :) or a negative :( smiley, using only the remaining text. Our team surveyed the solutions proposed in the relevant literature, as well as past projects and articles that tackled similar text sentiment analysis problems. The full specification of our experiments, together with the results and conclusions, can be found in our report.
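
To make the task setup concrete, the sketch below shows one way a labeled example could be derived from a raw tweet: the smiley decides the label and is then removed from the text. The function name `split_smiley` and the label values 1 and -1 are illustrative assumptions, not part of this project's code; the datasets used here are already split into positive and negative tweet files.

```python
POSITIVE, NEGATIVE = ":)", ":("


def split_smiley(tweet):
    """Return (remaining_text, label), or None when the label is ambiguous.

    label is 1 for a positive smiley and -1 for a negative one (assumed encoding).
    """
    has_pos = POSITIVE in tweet
    has_neg = NEGATIVE in tweet
    if has_pos == has_neg:
        # Either no smiley at all or both kinds: no clear label.
        return None
    smiley, label = (POSITIVE, 1) if has_pos else (NEGATIVE, -1)
    # Remove the smiley and normalize the whitespace left behind.
    text = " ".join(tweet.replace(smiley, " ").split())
    return text, label


print(split_smiley("great game tonight :)"))
```

A classifier for this task only ever sees the first element of the returned pair and has to recover the second.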

The complete project specification is available on the course's GitHub page.

Dependencies

The following dependencies are required to run the project:

Libraries

Anaconda3 - Download and install Anaconda with Python3
Scikit-Learn - Download scikit-learn library with conda
```
conda install scikit-learn
```
Gensim - Install Gensim library
```
conda install gensim
```

NLTK - Download all the packages of NLTK

python
>>> import nltk
>>> nltk.download()

TensorFlow - Install the TensorFlow library (version 1.4.1 was used)
```
$ pip install tensorflow==1.4.1
```
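
After installing the libraries, a quick check like the following can confirm that they are visible to the Python interpreter in use. This is a minimal sketch, not part of the project; the helper name `missing_modules` is hypothetical, and the list uses the import names of the packages above (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names of the libraries listed in this section.
REQUIRED = ["sklearn", "gensim", "nltk", "tensorflow"]


def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```

The check only verifies that the packages can be located; it does not verify versions, so the TensorFlow version should still be checked separately.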

Files

Train tweets

Download twitter-datasets.zip containing positive and negative tweet files which are required during the model training phase. After unzipping, place the files obtained in the ./data/datasets directory.

Test tweets

Download test_data.txt, which contains the tweets used to test the trained model and to produce the predictions submitted to Kaggle. This file needs to be placed in the ./data/datasets directory.

Stanford Pretrained GloVe Word Embeddings

Download the GloVe Pretrained Word Embeddings, which are used for training the advanced sentiment analysis models. After unzipping, place the file glove.twitter.27B.200d.txt in the ./data/glove directory.
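
GloVe text files store one word per line, followed by its vector components separated by spaces. The sketch below shows one way such a file could be read using only the standard library. The function names `parse_glove_lines` and `load_glove` and the optional `vocab` filter are assumptions made for illustration; the project's own loading code may work differently.

```python
def parse_glove_lines(lines, dim=200, vocab=None):
    """Parse GloVe-formatted lines into a dict mapping word -> list of floats.

    dim is the expected vector length (200 for glove.twitter.27B.200d.txt).
    vocab, if given, restricts the result to words in that set.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the expected layout
        word = parts[0]
        if vocab is not None and word not in vocab:
            continue
        embeddings[word] = [float(value) for value in parts[1:]]
    return embeddings


def load_glove(path, dim=200, vocab=None):
    """Read a GloVe text file from disk, streaming it line by line."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove_lines(handle, dim=dim, vocab=vocab)
```

A call such as `load_glove("./data/glove/glove.twitter.27B.200d.txt", vocab=words_in_tweets)` would keep only the vectors for words that occur in the tweets (`words_in_tweets` is a hypothetical set), which reduces memory use considerably compared with loading the whole file.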

Hardware requirements

at least 16 GB of RAM
a graphics card (optional, speeds up training of the CNNs)

Tested on Ubuntu 16.04 with an Nvidia Tesla K80 GPU (12 GB GDDR5).

Kaggle competition

Public Leaderboard connected to this competition.

Our team's name is Bill Trader.

Team members:

Dino Mujkić (dinomujki)
Hrvoje Bušić (hrvojebusic)
Sebastijan Stevanović (sebastijan94)

Reproducing our best result

The already preprocessed tweet files test_full.csv, train_neg_full.csv.zip and train_pos_full.csv.zip are in the ./data/parsed directory.

To run preprocessing again, you must have the Train tweets and Test tweets files in the ./data/datasets directory. Then go to the /src folder and run run_preprocessing.py with the argument train or test to generate the files required for running the CNN.

$ python run_preprocessing.py train
$ python run_preprocessing.py test

To reproduce our best Kaggle score, go to the /src folder and run run_cnn.py with the argument eval:

$ python run_cnn.py eval

A checkpoint for reproducing our best score is stored in the data/models/1513824111 directory, so the training step is skipped. To run the training process from scratch, pass the argument train when running run_cnn.py.

Evaluation requires the following files: glove.twitter.27B.200d.txt in the ./data/glove directory, and the preprocessed tweet files test_full.csv, train_neg_full.csv and train_pos_full.csv in the ./data/parsed directory.
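
The file requirements above can be verified before starting a run. The following is a minimal sketch rather than part of the project: the helper name `missing_files` is hypothetical, and it assumes `root` points at the repository root (use `..` when calling it from the /src folder).

```python
import os

# Paths relative to the repository root, as listed in this README.
REQUIRED_FILES = [
    "data/glove/glove.twitter.27B.200d.txt",
    "data/parsed/test_full.csv",
    "data/parsed/train_neg_full.csv",
    "data/parsed/train_pos_full.csv",
]


def missing_files(root=".", required=REQUIRED_FILES):
    """Return the required paths that do not exist as files under root."""
    return [path for path in required
            if not os.path.isfile(os.path.join(root, path))]


for path in missing_files():
    print("Missing file:", path)
```

If the two training files are reported missing while their .zip counterparts are present in ./data/parsed, unzipping those archives should resolve it.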

This project is available under MIT license.
