Where-Are-You-Tweeting-From-

Uses Naive Bayes text classification and Mutual Information feature selection to predict which major city region a Twitter user is tweeting from.

I created an automated script that gathered tweets with known longitudes/latitudes from Twitter's Streaming API over the course of 4 weeks to build a random sample set.

I filtered them down to 9 major city regions (Los Angeles, San Francisco, Boston, New York, Chicago, Seattle, Atlanta, Houston, and Miami) and mapped the results on Google Maps using the Fusion Tables API.
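One common way to do this kind of filtering is to measure the great-circle distance from each tweet to a city center and keep the tweet only if it falls within some radius. The sketch below illustrates that idea and is not necessarily how this project does it: the coordinates are approximate, and `REGION_RADIUS_KM` and `assign_region` are hypothetical names and values chosen for illustration.

```python
import math

# Approximate city-center coordinates (latitude, longitude); illustrative values.
CITY_CENTERS = {
    "Los Angeles": (34.05, -118.24),
    "San Francisco": (37.77, -122.42),
    "Boston": (42.36, -71.06),
    "New York": (40.71, -74.01),
    "Chicago": (41.88, -87.63),
    "Seattle": (47.61, -122.33),
    "Atlanta": (33.75, -84.39),
    "Houston": (29.76, -95.37),
    "Miami": (25.76, -80.19),
}

EARTH_RADIUS_KM = 6371.0
REGION_RADIUS_KM = 80.0  # assumed cutoff; the real region size is not documented here


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_region(lat, lon, radius_km=REGION_RADIUS_KM):
    """Return the nearest city region, or None if the point is outside every region."""
    best_city, best_dist = None, float("inf")
    for city, (clat, clon) in CITY_CENTERS.items():
        dist = haversine_km(lat, lon, clat, clon)
        if dist < best_dist:
            best_city, best_dist = city, dist
    return best_city if best_dist <= radius_km else None
```

With these assumptions, a tweet geotagged in Manhattan maps to "New York", while one from Denver is dropped because it is outside every region.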

I used documentation from http://nlp.stanford.edu/IR-book/html/htmledition/mutual-information-1.html to implement Mutual Information to find each region's most significant "features" or words.
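A minimal sketch of the Mutual Information score described on that page, computed from the four document counts for one word and one city. The function names and the `(city, tokens)` input shape are assumptions made for illustration, not the project's actual code.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """Mutual information (in bits) between a word and a city.

    n11: tweets that contain the word and are from the city
    n10: tweets that contain the word and are not from the city
    n01: tweets that lack the word and are from the city
    n00: tweets that lack the word and are not from the city
    """
    n = n11 + n10 + n01 + n00
    n_word, n_no_word = n11 + n10, n01 + n00
    n_city, n_not_city = n11 + n01, n10 + n00

    def term(n_joint, n_row, n_col):
        # A zero joint count contributes nothing (limit of x * log x as x -> 0).
        if n_joint == 0:
            return 0.0
        return (n_joint / n) * math.log2(n * n_joint / (n_row * n_col))

    return (term(n11, n_word, n_city) + term(n01, n_no_word, n_city)
            + term(n10, n_word, n_not_city) + term(n00, n_no_word, n_not_city))


def top_features(tweets, city, k=10):
    """Rank words by mutual information with `city`.

    tweets: list of (city_label, iterable_of_tokens) pairs (hypothetical shape).
    """
    docs = [(label, set(tokens)) for label, tokens in tweets]
    vocab = set().union(*(tokens for _, tokens in docs)) if docs else set()
    scores = {}
    for word in vocab:
        n11 = sum(1 for label, toks in docs if word in toks and label == city)
        n10 = sum(1 for label, toks in docs if word in toks and label != city)
        n01 = sum(1 for label, toks in docs if word not in toks and label == city)
        n00 = len(docs) - n11 - n10 - n01
        scores[word] = mutual_information(n11, n10, n01, n00)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, `mutual_information(49, 27652, 141, 774106)` comes out to roughly 0.00011. Note that the score is symmetric: a word whose absence is typical of a city can rank as highly as a word whose presence is.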

I used documentation from http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html to implement a Naive Bayes text classifier that can predict where a tweet with an unknown longitude/latitude is from, based on an analysis of past tweets. The classifier tokenizes a user's input/tweet and assigns a weighted value (the probability of a word, given a city) to every word in the input. The classifier also takes into account the sample set of each region, and finally ranks the cities, with the top-ranked city as the final prediction.
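The steps above can be sketched as a small multinomial Naive Bayes classifier with add-one smoothing, in the style of the cited page. This is an illustrative sketch under assumptions, not the project's actual code: the `NaiveBayes` class, the `tokenize` regex, and the use of each city's share of training tweets as the prior are all choices made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self):
        self.doc_counts = Counter()               # tweets seen per city
        self.word_counts = defaultdict(Counter)   # word frequencies per city
        self.vocab = set()

    def train(self, labeled_tweets):
        """labeled_tweets: iterable of (city, text) pairs."""
        for city, text in labeled_tweets:
            self.doc_counts[city] += 1
            for token in tokenize(text):
                self.word_counts[city][token] += 1
                self.vocab.add(token)

    def rank(self, text):
        """Return (city, log score) pairs, best first."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Words never seen in training are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for city, n_docs in self.doc_counts.items():
            # Prior: the share of training tweets that came from this city.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[city].values())
            for token in tokens:
                # P(word | city) with add-one (Laplace) smoothing.
                count = self.word_counts[city][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[city] = score
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Scores are summed in log space to avoid underflow when multiplying many small probabilities. The first entry of `rank(...)` is the predicted city.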

I used the results from applying Mutual Information to better communicate to the user how a prediction was made. When a user hovers over a city in the rankings, they can see which words from their tweet, if any, are significant for each city region.
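A minimal sketch of that lookup, assuming each city's significant words are already available as a collection. The `explain` function and the sample feature lists are hypothetical.

```python
def explain(tweet_tokens, city_features):
    """Map each city to the tweet's words that are significant for that city.

    city_features: dict of city -> iterable of significant words (hypothetical shape).
    """
    tokens = set(tweet_tokens)
    return {city: sorted(tokens & set(features))
            for city, features in city_features.items()}


# Hypothetical feature lists, for illustration only.
features = {"Boston": ["fenway", "sox"], "Miami": ["beach", "heat"]}
matches = explain(["going", "to", "fenway", "beach"], features)
```

Here `matches` lists "fenway" under Boston and "beach" under Miami. A city with no matching words gets an empty list.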

The link to the final project online is where-are-you-tweeting-from.herokuapp.com

