Skip to content

asmiley/tweet-emotion-classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

tweet-emotion-classifier

This classifier with full unicode support takes two sets of training data in the form of tab delimited files containing tweets mined using the Twitter API.

The SVM_perf implementation of the Support Vector Machine (SVM) is then used to create a binary classifier.

For example, the first training set can be composed of tweets mined using happy emoticons as a query. The second would then be composed of tweets mined using sad emoticons as a query. Running the authormator.sh shell script would clean the tweets, stripping them of superfluous data and characters.

The training data is then used to create a dictionary of features; an index of unique words in the dataset. The tweets are randomized and then divided into training, development, and testing sets that are then output as series of ‘feature:value’ pairs based on the features in each tweet found in the feature dictionary. Each of these lines was preceded by a class identifier, -1 for ‘sad’ and 1 for ’happy’ tweets.
This vectorised data was then used to train an SVM classifier using svm_pref_learn with a C value of 20 for the hashtag based classifier and 220 for the emoticon based classifier.

The resulting classifier, if it performs well, can then be used to classify further tweets.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published