Tweet Classifier

Motivation

Train a classifier to recommend relevant tweets.

Approach

A character-level neural network with a bidirectional GRU (bi-GRU) architecture, based on the Tweet2Vec implementation.
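A character-level model reads each tweet as a sequence of character ids rather than word tokens. The following is a minimal sketch of that encoding step, not the code used in this repository: the helper names `build_char_vocab` and `encode`, the use of id 0 for both padding and unknown characters, and the `max_len` value are all assumptions made for illustration.

```python
def build_char_vocab(tweets):
    """Map every character seen in the corpus to a positive integer id.

    Id 0 is reserved (in this sketch) for padding and unknown characters.
    """
    chars = sorted({ch for tweet in tweets for ch in tweet})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(tweet, vocab, max_len=140):
    """Turn a tweet into a fixed-length list of character ids."""
    # Truncate long tweets, map unseen characters to 0.
    ids = [vocab.get(ch, 0) for ch in tweet[:max_len]]
    # Right-pad short tweets with 0 up to max_len.
    return ids + [0] * (max_len - len(ids))


tweets = ["#nlp rocks", "deep nets"]
vocab = build_char_vocab(tweets)
encoded = [encode(t, vocab, max_len=12) for t in tweets]
```

Each row of `encoded` has the same length, so the rows can be stacked into a matrix and fed to the recurrent layers in batches.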

Run

Short test: THEANO_FLAGS='floatX=float32' python train.py
Train: THEANO_FLAGS='floatX=float32' python run.py

Requirements

pymongo

Datasets

Tweets with Computer Science conference hashtags
ArchiveTeam JSON Download of Twitter Stream 2017-02

Evaluation results

It is important to provide input samples that are equally balanced across all classes; otherwise the results are skewed towards the most frequent classes.
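One simple way to obtain balanced input is to downsample every class to the size of the rarest one. The sketch below illustrates that idea; it is an assumption about how balancing could be done, not the procedure used for the experiments reported here, and the function name `balance_classes` and the `(text, label)` sample format are hypothetical.

```python
import random
from collections import defaultdict


def balance_classes(samples, seed=0):
    """Downsample every class to the size of the rarest class.

    `samples` is a list of (text, label) pairs.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    # Size of the smallest class decides how many samples each class keeps.
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


samples = [("t%d" % i, "cs") for i in range(5)] + \
          [("r%d" % i, "random") for i in range(2)]
balanced = balance_classes(samples)
```

Downsampling discards data from the larger classes; it is used here only because it keeps the example short.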

CS topics

Dataset: 5 classes * 2,000 tweets each = 10,000 tweets in total

Random guess: 0.2 (uniform probability distribution)

Model: 0.234375
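With balanced classes, the random-guess baseline is one over the number of classes. The short sketch below reproduces that arithmetic for the 5-class setup and compares it with the reported model score; the function name `uniform_baseline` is hypothetical.

```python
def uniform_baseline(num_classes):
    """Expected score of guessing uniformly at random over balanced classes."""
    return 1.0 / num_classes


baseline = uniform_baseline(5)   # 5 CS topic classes
model_score = 0.234375           # score reported for the CS topics experiment
margin = model_score - baseline  # how far the model is above chance
```

Here `margin` comes out at roughly 0.034, so the model is only slightly better than chance on this task.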

ML vs NLP ?

Dataset: 2 classes * 5,000 tweets each = 10,000 tweets in total

Random guess: 0.5 (uniform probability distribution)

Model:

CS vs random tweets

Dataset: 2 classes * 30,000 tweets each = 60,000 tweets in total

Random guess: 0.5 (uniform probability distribution)

characters: 128

Run 1

split: 0.8 x 0.1 x 0.1

learning rate: 0.3

Epoch 12: Training Cost 0.00590342712402, Validation Precision 0.776462743791, Regularization Cost 3.82598352432, Max Precision 0.859375

Test: 0.6875

Run 2

split: 0.6 x 0.2 x 0.2

learning rate: 0.3

Epoch 11: Training Cost 0.00385822719998, Validation Precision 0.742083333333, Regularization Cost 3.74113941193, Max Precision 0.8125

Test: 0.8125

Run 3

split: 0.6 x 0.2 x 0.2

learning rate: 0.1
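The split settings listed for the runs above (0.8 x 0.1 x 0.1 and 0.6 x 0.2 x 0.2) can be read as fractions of the dataset. The sketch below assumes the order is train x validation x test, which the results do not state explicitly; `split_dataset` is a hypothetical helper, not a function from this repository.

```python
import random


def split_dataset(samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle `samples` and cut them into train, validation and test lists.

    `fractions` is assumed to be (train, validation, test) and must sum to 1.
    """
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(fractions[0] * len(shuffled)))
    n_val = int(round(fractions[1] * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# 60,000 samples, as in the "CS vs random tweets" dataset.
train, val, test = split_dataset(range(60000), fractions=(0.8, 0.1, 0.1))
train2, val2, test2 = split_dataset(range(60000), fractions=(0.6, 0.2, 0.2))
```

Under this reading, the first setting yields 48,000 / 6,000 / 6,000 tweets and the second yields 36,000 / 12,000 / 12,000.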

References

Tweet2Vec
