We are comparing cities in Texas and Washington to see whether we can identify cultural differences using Twitter data.
The data set for this project consists of tweets posted at irregular intervals by Twitter users in Washington and Texas between October 16 and October 25, 2016. In total, we collected 80,964 tweets that were posted during this period and tagged with geographic coordinates within the bounds of Washington or Texas.
# Hypothesis
Our hypothesis is as follows: Twitter conversation in Seattle, WA is more similar to Twitter conversation in Austin, Dallas, and Houston, TX than it is to the conversation originating (collectively) from the remainder of Washington state.
We used topic modeling to identify the most common "conversational themes" in four urban areas (Seattle, Dallas, Austin, and Houston) and two less urban areas (the remainder of Washington state and the remainder of Texas). We then compared Twitter content across these regions using cosine similarity.
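The cosine-similarity comparison can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each region's tweets are pooled into a single bag-of-words term-count vector (the text above does not say which representation was compared), and the tokenizer, the helper names `tokenize`, `region_vector`, and `cosine_similarity`, and the toy tweets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a tweet and split it into word tokens (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def region_vector(tweets: list[str]) -> Counter:
    """Pool every tweet from one region into a single term-count vector (an assumed representation)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors: dot(a, b) / (|a| * |b|)."""
    # Counter returns 0 for missing terms without inserting them, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty region shares nothing with any other region
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical toy tweets, for illustration only (not from the project's data set).
    regions = {
        "Seattle": ["Coffee and rain again", "Seahawks game tonight!"],
        "Austin": ["Tacos and live music", "Coffee before the game tonight"],
        "Rest of WA": ["Rain on the farm", "Harvest season is here"],
    }
    vectors = {name: region_vector(tweets) for name, tweets in regions.items()}
    names = list(vectors)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(vectors[first], vectors[second])
            print(f"{first} vs {second}: {score:.3f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, so a region with many more tweets is not automatically scored as more or less similar. In practice, steps such as stop-word removal or TF-IDF weighting would likely change the scores considerably; those choices are left out of this sketch.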
# Results
The topic models produced mixed results. The cosine similarity values, however, suggest that regions are more closely linked to geographically nearby regions than to regions with a similar urban or non-urban makeup, at least in terms of their Twitter users' topics of interest. While Twitter content is only one index of cultural similarity or dissimilarity, this index revealed marked semantic differences by region.
This project was completed as a class case study for SYS6018 Data Mining, as a way to gain practical experience in text mining.
Tyler Worthington, Isabelle Yang, Mark Rooney, and Genevieve Burgoyne.