Tweet Classifier

This project classifies tweets as science-related or not. The ScienceClassifier is a pre-trained instance of the generic TextClassifier, which uses a Multinomial Naive Bayes model. All code so far is in Python.
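For readers unfamiliar with Multinomial Naive Bayes, here is a minimal sketch of the idea: count words per category during training, then pick the category with the highest log prior plus smoothed log likelihood. The class `TinyMultinomialNB`, its tokenizer, and the add-one smoothing are assumptions made for illustration; they are not taken from the project's TextClassifier, which may tokenize, filter stop words, and smooth differently.

```python
import math
import re
from collections import Counter, defaultdict


class TinyMultinomialNB(object):
    """Hypothetical illustration only; not the project's TextClassifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, category):
        words = self.tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, float("-inf")
        for category in self.doc_counts:
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior: share of training texts that belong to this category.
            score = math.log(self.doc_counts[category] / float(total_docs))
            # Log likelihood of each word, with add-one (Laplace) smoothing
            # so unseen words do not zero out the whole score.
            for word in words:
                score += math.log((counts[word] + 1.0) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


nb = TinyMultinomialNB()
nb.train("the team won the match with a late goal", "sport")
nb.train("toast the bread and melt the cheese", "food")
label = nb.classify("grilled cheese on bread")
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters once the training texts are whole web pages rather than single sentences.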

##Dependencies

Beautiful Soup 4

##Examples

###Science Classifier

    # To use the science text classifier
    from Classify.ScienceClassifier import getClassifier

    c = getClassifier()

    # prints 'science'
    print(c.classify(
    """Newton's laws of motion are three physical laws that together
    laid the foundation for classical mechanics."""
    ))

    # prints 'other'
    print(c.classify(
    """A disc jockey (abbreviated D.J., DJ or deejay) is a person
    who mixes recorded music for an audience; in a club event
    or rave, this is an audience of dancers."""
    ))

Example text from Wikipedia: Newton's Laws, DJ.

###Generic Text Classifier

    from Classify.TextClassifier import TextClassifier
    from Classify.GetPage import getURLText

    # category -> list of pages whose text is used as training data
    training_data = {
        "sport": [
            "http://en.wikipedia.org/wiki/Sport",
            "http://en.wikipedia.org/wiki/List_of_sports",
        ],
        "food": [
            "http://en.wikipedia.org/wiki/food",
            "http://en.wikipedia.org/wiki/Sandwich",
        ],
    }

    # new TextClassifier instance
    t = TextClassifier()

    # train the classifier using the training data
    for category, urls in training_data.items():
        for url in urls:
            t.train(getURLText(url), category)

    # now we can put the classifier into action. Here we are classifying some tweets!
    print(t.classify("Sorry grilled cheese sandwiches, we've moved on to grilled chocolate sandwiches."))
    print(t.classify("Arkansas defense suffocates Texas in 31-7 win."))

##The Details

Calling GetPage.getURLText grabs a webpage's text. The cPickle module is used for caching. At the moment cached webpages are stored in /var/tmp, a path that does not normally exist on Windows, so caching may be an issue for Windows users.
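One possible way around the hard-coded /var/tmp path is to ask Python for the platform's temporary directory. The sketch below shows pickle-based caching built on `tempfile.gettempdir()`. The helper names `cache_path` and `cached_fetch`, the hashed file names, and the `fetch` callback are hypothetical and are not part of GetPage; the sketch uses the standard `pickle` module, which offers the same `dump`/`load` calls as `cPickle`.

```python
import hashlib
import os
import pickle
import tempfile


def cache_path(url, cache_dir=None):
    """Map a URL to a cache file, defaulting to the platform's temp directory."""
    cache_dir = cache_dir or tempfile.gettempdir()
    # Hash the URL so the file name contains no characters that are
    # invalid on some file systems (':' and '/' for example).
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".pickle"
    return os.path.join(cache_dir, name)


def cached_fetch(url, fetch, cache_dir=None):
    """Return fetch(url), reusing a pickled copy when one already exists."""
    path = cache_path(url, cache_dir)
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return pickle.load(handle)
    text = fetch(url)
    with open(path, "wb") as handle:
        pickle.dump(text, handle)
    return text


# Demonstration with a stand-in fetcher, so no network access is needed.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "page text for " + url


demo_dir = tempfile.mkdtemp()
first = cached_fetch("http://example.com/", fake_fetch, demo_dir)
second = cached_fetch("http://example.com/", fake_fetch, demo_dir)
```

The second call reads the pickled copy instead of calling the fetcher again, so `fake_fetch` runs only once.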

##Attributions

Python
Beautiful Soup
StopWords
Wikipedia
Twitter
Tweets: a tweet on food and a tweet on sport
