Skip to content

Elixeus/Syrian-Refugee-Twitter-Data-Analysis

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Analysis of Twitter data on Syrian Refugee crisis

TawsifKhan
October 26, 2015

Here I will use Python and R to analyze tweets regarding the Syrian Refugee crisis. I would like to see which countries are most active in terms of such tweets, and also what are the most popular words corresponding to each country.

I am currently using the following filter:

twitterStream.filter(track=["syria" or "refugee" or "syrian" or "migrant crisis"])

Most of the tweets are not geo-tagged, but some come with a location tag and a time-zone. I have tried to map those to a country. I am still working on it to make it reliable.

Since I have been into some Hadoop lately, I used MapReduce concept to count the words. So in the terminal the following does the job for me (for now):

less twitDB.csv | ../src/mapper.py | sort | ../src/python

I chose R for the analysis.

summary(df)
##       Word           Location        Count        
##  syria  :  90   Canada   :1130   Min.   :  1.000  
##  https  :  78   London   : 413   1st Qu.:  1.000  
##  http   :  51   Amsterdam: 320   Median :  1.000  
##  russian:  36   Athens   : 279   Mean   :  2.127  
##  russia :  32   Baghdad  : 149   3rd Qu.:  1.000  
##  isis   :  31   (Other)  :3152   Max.   :264.000  
##  (Other):5986   NA's     : 861

As you can see the location is still a mixture of cities and countries. And the high count of Canada makes me think that is a reason of me being in Canada. But moving on...

So we get the top locations and words.

top_ten_locations
##     Location   x
## 64    London 654
## 9  Amsterdam 496
## 11    Athens 478
## 14   Baghdad 330
## 19    Berlin 258
## 74     Paris 228
## 76     Quito 210
## 33   Chennai 196
## 53  Istanbul 181

Use ggplot to plot some bars.

library(ggplot2)
p1 <- ggplot(data=top_ten_locations,aes(x=Location,y=x))
p1 <- p1 + geom_bar(stat="identity",size=1) + ylab("Frequency")
p1 <- p1 + theme_minimal()
p1

More to follow.

About

Analysis of Twitter data on Syrian Refugee crisis

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%