spamdummy/tweet-classifier

Tweet Classifier

This project classifies tweets as science-related or not. The ScienceClassifier is a trained instance of the generic TextClassifier, which uses a Multinomial Naive Bayes model. All code so far is in Python.
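
To make the Multinomial Naive Bayes idea concrete, here is a minimal sketch of such a classifier. It is not the project's TextClassifier: the class name `TinyNaiveBayes`, the `tokenize` helper, and the toy training sentences are all hypothetical, and add-one smoothing is an assumption about a reasonable default rather than something the project is known to use. Only the `train(text, category)` and `classify(text)` call shapes are borrowed from the examples in this README.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes(object):
    """Hypothetical illustration only; not the project's TextClassifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocabulary = set()                  # every word seen during training

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = float(sum(self.doc_counts.values()))
        best_category, best_score = None, float("-inf")
        for category in self.doc_counts:
            # Log prior: share of training texts that belong to this category.
            score = math.log(self.doc_counts[category] / total_docs)
            counts = self.word_counts[category]
            denominator = sum(counts.values()) + len(self.vocabulary)
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                score += math.log((counts[word] + 1.0) / denominator)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


nb = TinyNaiveBayes()
nb.train("the team won the match with a late goal", "sport")
nb.train("the striker scored twice in the final game", "sport")
nb.train("toast the bread and melt the cheese", "food")
nb.train("a sandwich with grilled cheese and tomato soup", "food")

label = nb.classify("grilled cheese sandwich")  # 'food' for this toy data
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together, which matters once the inputs are whole web pages rather than single sentences.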

## Dependencies

## Examples

### Science Classifier

#To use the science text classifier
from Classify.ScienceClassifier import getClassifier

c = getClassifier()

#prints 'science'
print c.classify(
"""Newton's laws of motion are three physical laws that together
laid the foundation for classical mechanics."""
)

#prints 'other'
print c.classify(
"""A disc jockey (abbreviated D.J., DJ or deejay) is a person 
who mixes recorded music for an audience; in a club event 
or rave, this is an audience of dancers."""
)

Example text is from Wikipedia: Newton's Laws, DJ.

### Generic Text Classifier

from Classify.TextClassifier import TextClassifier
from Classify.GetPage import getURLText
training_data = {
	"sport":[
		"http://en.wikipedia.org/wiki/Sport",
		"http://en.wikipedia.org/wiki/List_of_sports",
		
	],
	"food":[
		"http://en.wikipedia.org/wiki/food",
		"http://en.wikipedia.org/wiki/Sandwich",
	]
}

#new TextClassifier instance
t = TextClassifier() 

#train the classifier using the training data
for category,urls in training_data.items():
	for url in urls:
		t.train(getURLText(url),category)

#now we can put the classifier into action. Here we are classifying some tweets!
print t.classify("Sorry grilled cheese sandwiches, we've moved on to grilled chocolate sandwiches.")
print t.classify("Arkansas defense suffocates Texas in 31-7 win.")

## The Details

Calling GetPage.getURLText grabs a webpage's text. The cPickle module is used for caching. At the moment cached webpages are stored in /var/tmp, which may be a problem for Windows users.
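
The caching described above could be sketched roughly as follows. This is not the code in GetPage: the `cached` helper, its parameters, and the `fake_fetch` stand-in are hypothetical names chosen for illustration. The sketch uses `pickle` (in Python 3 the `cPickle` accelerator is folded into the standard `pickle` module) and `tempfile.gettempdir()`, which returns a platform-appropriate temporary directory and so sidesteps the hard-coded /var/tmp path on Windows.

```python
import hashlib
import os
import pickle  # Python 3; under Python 2 the project would import cPickle instead
import tempfile


def cached(key, compute, cache_dir=None):
    """Return the pickled value stored for key, calling compute() only on a miss.

    Hypothetical helper for illustration; not part of the project's API.
    """
    if cache_dir is None:
        cache_dir = tempfile.gettempdir()  # portable replacement for /var/tmp
    # Hash the key (for example a URL) to get a safe, fixed-length file name.
    name = hashlib.sha1(key.encode("utf-8")).hexdigest() + ".pickle"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    value = compute()
    with open(path, "wb") as handle:
        pickle.dump(value, handle)
    return value


calls = []


def fake_fetch():
    """Stand-in for a real page download so the example needs no network."""
    calls.append(1)
    return "page text"


demo_dir = tempfile.mkdtemp()  # isolated directory so the demo starts with an empty cache
first = cached("http://example.com/page", fake_fetch, cache_dir=demo_dir)
second = cached("http://example.com/page", fake_fetch, cache_dir=demo_dir)
```

The second call is served from disk, so `fake_fetch` runs only once. Unpickling executes code embedded in the file, so a cache like this should only ever read files the program wrote itself.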

## Attributions
