CaseStudy2

We compare cities in Texas and Washington to see whether cultural differences between them can be identified using Twitter data.

Data

The data set used for this project includes tweets issued sporadically by Twitter users residing in the states of Washington and Texas from October 16 to October 25, 2016. In total, we collected over 80,964 tweets that were posted over this time period and tagged with geographic coordinates within the bounds of Washington and Texas.

Hypothesis

Our hypothesis is as follows: the Twitter conversation in Seattle, WA is more similar to the Twitter conversation in Austin, TX, Dallas, TX, and Houston, TX than to the conversation that originates (collectively) from the remainder of Washington state.

Approach

We used topic modeling to identify the most common "conversational themes" in four urban areas (Seattle, Dallas, Austin, and Houston) and two less urban areas (the remainder of Washington state and the remainder of Texas). We then compared the Twitter content of these regions using cosine similarity.
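As an illustration of the cosine similarity comparison, here is a minimal Python sketch. It is not the project's actual code: the tokenizer, the function names `term_counts` and `cosine_similarity`, and the example tweets are all assumptions made for this example. Each region's tweets are reduced to a vector of word counts, and two regions are compared by the cosine of the angle between their vectors.

```python
import math
import re
from collections import Counter


def term_counts(tweets):
    """Count lowercase word tokens across all tweets from one region."""
    counts = Counter()
    for text in tweets:
        # Simple tokenizer (an assumption): letters, digits, #, @ and apostrophes.
        counts.update(re.findall(r"[a-z0-9#@']+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty region has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up example tweets, not drawn from the project's data set.
regions = {
    "Seattle": ["rain and coffee downtown", "coffee before the game"],
    "Austin": ["live music downtown tonight", "tacos and live music"],
}
vectors = {name: term_counts(tweets) for name, tweets in regions.items()}
score = cosine_similarity(vectors["Seattle"], vectors["Austin"])
print(f"Seattle vs Austin: {score:.3f}")
```

Because the counts are never negative, the score falls between 0 (no shared terms) and 1 (identical term proportions). In practice the vectors would more likely be built from weighted terms or topic distributions than from raw counts.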

Results

The topic models produced mixed results. The cosine similarity values, however, suggest that regions are more closely linked to nearby regions than to regions with a similar urban or non-urban makeup, at least in terms of their Twitter users' topics of interest. Twitter content is only one index of cultural similarity or dissimilarity, but this index revealed significant semantic differences by region.

Reason for Project

This project was completed as a class case study for SYS6018 Data Mining, as a way to gain practical experience in text mining.

Collaborators

Tyler Worthington, Isabelle Yang, Mark Rooney, and Genevieve Burgoyne.
