Description

This is a Tweet sentiment analyser that uses:

- [A word2vec model][1] that was trained on 400 million Tweets.
- (Very) simple linguistic features.
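The README does not say how per-word vectors are combined into one representation per tweet, or which linguistic features are computed. The sketch below assumes one common choice, averaging the vectors of the in-vocabulary tokens. It uses a toy dictionary in place of the real word2vec model, and the three features in `hypothetical_linguistic_features` are invented examples, not the project's own.

```python
from typing import Dict, List

Vector = List[float]


def average_word_vectors(tokens: List[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the vectors for tokens found in `embeddings` (assumed averaging scheme).

    Returns a zero vector when no token is in the vocabulary.
    """
    found = [embeddings[t] for t in tokens if t in embeddings]
    if not found:
        return [0.0] * dim
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]


def hypothetical_linguistic_features(text: str) -> Vector:
    """Illustrative features only; the project's actual features may differ."""
    tokens = text.split()
    n_exclaim = text.count("!")
    n_question = text.count("?")
    # Count fully upper-case words longer than one character (e.g. "LOVE", not "I").
    n_caps = sum(1 for t in tokens if len(t) > 1 and t.isupper())
    return [float(n_exclaim), float(n_question), float(n_caps)]


# Toy 2-dimensional "embeddings" standing in for the real word2vec model.
toy_embeddings = {"love": [1.0, 0.0], "this": [0.0, 1.0]}
tweet = "I LOVE this !!"
w2v_part = average_word_vectors(tweet.lower().split(), toy_embeddings, dim=2)
ling_part = hypothetical_linguistic_features(tweet)
```

Averaging keeps the tweet vector the same size as a word vector regardless of tweet length, at the cost of discarding word order.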

The system first trains a neural network (NN) on the word2vec vectors. It then combines this NN's predictions with the linguistic features, trains a second NN on the combined input, and finally prints some metrics for the final predictions.
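This two-stage design is a form of stacking. The sketch below shows only the data flow: it deliberately swaps both neural nets for a tiny nearest-centroid classifier (`CentroidClassifier`, hypothetical) so it runs with the standard library alone, and every name in it is illustrative rather than taken from the project's source. One design caveat: when stage-1 scores for the training set come from the same data stage 1 was fit on, the second model may learn from overconfident scores; a held-out split or cross-validated predictions is the usual remedy.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


class CentroidClassifier:
    """Stand-in for a neural net: scores each class by negative distance to its centroid."""

    def fit(self, X: Sequence[Vector], y: Sequence[str]) -> "CentroidClassifier":
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.classes_ = sorted(sums)
        self.centroids_ = {c: [v / counts[c] for v in sums[c]] for c in self.classes_}
        return self

    def scores(self, x: Vector) -> Vector:
        # One score per class, in self.classes_ order; higher means closer.
        return [-math.dist(x, self.centroids_[c]) for c in self.classes_]

    def predict(self, x: Vector) -> str:
        s = self.scores(x)
        return self.classes_[s.index(max(s))]


def two_stage_fit_predict(w2v_train: List[Vector], ling_train: List[Vector], y_train: List[str],
                          w2v_test: List[Vector], ling_test: List[Vector]) -> List[str]:
    # Stage 1: a model trained on the word2vec features only.
    stage1 = CentroidClassifier().fit(w2v_train, y_train)
    # Stage 2 input: stage-1 scores concatenated with the linguistic features.
    stacked_train = [stage1.scores(w) + l for w, l in zip(w2v_train, ling_train)]
    stacked_test = [stage1.scores(w) + l for w, l in zip(w2v_test, ling_test)]
    stage2 = CentroidClassifier().fit(stacked_train, y_train)
    return [stage2.predict(x) for x in stacked_test]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: 2-d "word2vec" features plus one linguistic feature per tweet.
w2v_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ling_train = [[1.0], [1.0], [0.0], [0.0]]
y_train = ["positive", "positive", "negative", "negative"]
w2v_test = [[0.8, 0.2], [0.2, 0.8]]
ling_test = [[1.0], [0.0]]
y_test = ["positive", "negative"]

preds = two_stage_fit_predict(w2v_train, ling_train, y_train, w2v_test, ling_test)
print(f"accuracy: {accuracy(y_test, preds):.2f}")
```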

[1]: Multimedia Lab @ ACL W-NUT NER Shared Task: Named Entity Recognition for Twitter Microposts using Distributed Word Representations

Dependencies

The usual Python scientific stack (numpy, sklearn, etc.) is needed. In addition, the project uses nolearn, which in turn requires lasagne and theano. For a detailed list, check out the requirements.txt file.
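Before a long run, it can help to confirm that the packages named above are importable. The following is a convenience sketch, not part of the project; requirements.txt remains the authoritative list, and installing from it is typically done with `pip install -r requirements.txt`.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the packages the README mentions; note that sklearn is
# imported as "sklearn" even though it is installed as "scikit-learn".
REQUIRED = ("numpy", "sklearn", "nolearn", "lasagne", "theano")


def check_dependencies(modules: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found on the Python path."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```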

How to Use

The script expects the word2vec model mentioned above to be in a directory called models at the repository root. It also expects three text files in the data folder:

- negative-all: one tweet with negative sentiment per line.
- positive-all: one tweet with positive sentiment per line.
- neutral-all: one tweet with neutral sentiment per line.
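A minimal sketch of reading the three files into labelled samples, assuming UTF-8 text with one tweet per line as described above. The function name `load_tweets` and the label strings are hypothetical, not taken from the project's code.

```python
from pathlib import Path
from typing import List, Tuple

# File names from the README; the label strings are an illustrative choice.
DATA_FILES = {
    "negative-all": "negative",
    "positive-all": "positive",
    "neutral-all": "neutral",
}


def load_tweets(data_dir: str = "data") -> List[Tuple[str, str]]:
    """Read one tweet per line from each file and pair it with its sentiment label.

    Blank lines are skipped; files are assumed to be UTF-8 encoded.
    """
    samples: List[Tuple[str, str]] = []
    for filename, label in DATA_FILES.items():
        path = Path(data_dir) / filename
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                text = line.strip()
                if text:
                    samples.append((text, label))
    return samples
```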

Once the data are in place and the dependencies are installed, run main.py to see how the system performs. With the current NN architectures, a full run takes about 25 minutes on my machine.

Possible Enhancements

- The main enhancement is definitely the linguistic features: the current ones barely improve performance, if at all. POS tags and other NLP features should help. Features that discriminate between the positive and neutral cases would be especially valuable, since most of the confusion is between those two classes (see the IPython notebook for details).
- The preprocessing can also be improved. For example, one could apply the same preprocessing that was used to build the word2vec model, so that the word2vec features match the model's vocabulary better. I'm also sure the preprocessing for the linguistic feature extraction can be improved in some way.
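The README does not describe the word2vec model's own preprocessing, so the rules below are hypothetical placeholders illustrating the kind of Tweet normalization that is common: replacing URLs and @-mentions with placeholder tokens, lower-casing, and shortening elongated words. To get the benefit described above, these rules would need to be replaced by whatever the model's authors actually used.

```python
import re
from typing import List

# Hypothetical normalization rules; the word2vec model's real preprocessing may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Three or more repeats of the same lower-case letter (applied after lower-casing).
REPEAT_RE = re.compile(r"([a-z])\1{2,}")


def preprocess(tweet: str) -> List[str]:
    """Normalize a tweet and split it into whitespace-separated tokens."""
    text = URL_RE.sub("<url>", tweet)
    text = MENTION_RE.sub("<user>", text)
    text = text.lower()
    # Collapse elongated words such as "soooo" to "soo"; digits are left alone.
    text = REPEAT_RE.sub(r"\1\1", text)
    return text.split()
```

Restricting the repeat rule to letters avoids mangling numbers such as "1000".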

smartinsightsfromdata/twitter-sentiment
