Twitter-NLP-SNA

This is the code used in my MPhil project at the University of Cambridge, analysing political behaviour and communication on Twitter using Social Network Analysis and Natural Language Processing.

Notes:

Data collection from Twitter requires a set of API keys, which can be obtained from the Twitter API website. The code presented here imports them from a document called 'keys'; to make the code executable, create a document of the same name containing your own set of keys.
All code is executed in IPython, so a line such as '>>> list_name' prints the entire list named 'list_name' without requiring a 'print(list_name)' statement.
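
As a rough illustration of what a 'keys' document could look like, here is a minimal sketch. It assumes the four credentials commonly used with the Twitter API (consumer key and secret, access token and secret) and reads them from environment variables. The variable names, the 'TWITTER_' prefix and both helper functions are assumptions for illustration; they are not taken from the original scripts.

```python
# keys.py -- hypothetical layout; every name here is an assumption,
# not the naming used in the original data collection scripts.
import os

KEY_NAMES = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def load_keys(environ=None):
    """Read the credentials from environment variables prefixed with TWITTER_."""
    environ = os.environ if environ is None else environ
    return {name: environ.get("TWITTER_" + name, "") for name in KEY_NAMES}


def missing_keys(keys):
    """Return the names of credentials that are empty or absent."""
    return [name for name in KEY_NAMES if not keys.get(name)]
```

Calling 'missing_keys(load_keys())' before any data collection gives an early warning if a credential has not been set, and keeps the private keys out of the repository.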

This collection of code does the following:

12 Python documents were used for data collection, performing the following steps:

parse in a pre-made csv file of elites and their Twitter accounts, as of March 2020 (elites include UK MPs, MEPs, and Political Party accounts)
connect to the Twitter API (using a private set of keys, which will need to be re-created to replicate this code), collect all followers_IDs of each of the elites, and save them in separate files titled 'followers_{elite}.csv'
build a network of elites and their followers, split the network up into LEFT and RIGHT (remove overlapping/central nodes); store side for main analysis
randomly sample 100,000 user_ids from LEFT and 100,000 user_ids from RIGHT network
collect 200 most recent tweets from each of the users in LEFT and RIGHT networks, saving into MongoDB database
filter the users in each sample by activity
apply POS tagging to find nouns, proper nouns etc. in Tweets; calculate noun proportions for main analysis
calculate network centrality values for all nodes; store values for main analysis
clean words in tweets (lowercase, drop 's etc.), find most frequently used ones and visualise
run additional linguistic analyses: noun proportions without pronouns, length of tweets on LEFT vs. RIGHT, and number of Proper Noun pairs on LEFT vs. RIGHT
repeat word analysis after excluding all pronouns, 'coronavirus' words and emoticons/emoji from both Common and Proper Noun tags
visualise words used most frequently in the profile descriptions of 100 most central users
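
Some of the steps above can be sketched with the Python standard library alone. The functions below are a simplified illustration, not the code in this repository. Treating "overlapping" nodes as users who follow elites on both sides is an assumed interpretation. The tag labels 'NOUN' and 'PROPN' are placeholders and may differ from the labels the POS tagger actually produces. All function names are hypothetical.

```python
import random
import re
from collections import Counter


def split_left_right(left_followers, right_followers):
    """Keep only one-sided followers by dropping users present in both sets."""
    overlap = left_followers & right_followers
    return left_followers - overlap, right_followers - overlap


def sample_ids(user_ids, k, seed=0):
    """Draw up to k user IDs without replacement, reproducibly."""
    rng = random.Random(seed)
    pool = sorted(user_ids)  # sorting makes the result depend only on the seed
    return rng.sample(pool, min(k, len(pool)))


def clean_words(text):
    """Lowercase the text, drop possessive 's, and keep alphabetic tokens."""
    text = re.sub(r"['’]s\b", "", text.lower())
    return re.findall(r"[a-z]+", text)


def most_frequent_words(tweets, n=10):
    """Count cleaned words across tweets and return the n most common."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_words(tweet))
    return counts.most_common(n)


def noun_proportion(tags, noun_tags=("NOUN", "PROPN")):
    """Share of tokens whose POS tag is in noun_tags (placeholder labels)."""
    if not tags:
        return 0.0
    return sum(tag in noun_tags for tag in tags) / len(tags)
```

In the project itself the samples are 100,000 user_ids per side, so a call such as 'sample_ids(left_only, 100_000)' would correspond to the sampling step; fixing the seed makes the draw repeatable.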

Then, analysis was performed on the resulting data in R - the code for this is available in the analysis_R_code folder (available in the original repository that this was forked from).

Packages required:

see package imports at the top of every file

Associated article/publication:

Pre-print available: https://psyarxiv.com/v6qx5;
Associated files are available on the OSF: https://osf.io/dr7bk/?view_only=27f913c49c7b48019484f784b5db4135;
Journal publication pending

Repository files:

LICENSE
README.md
data_collection_1.py
data_collection_2.py
data_collection_3_SNA_left_and_right.py
data_collection_4_filtering_LEFT-RIGHT.py
data_collection_5_Network_Viz.py
data_collection_6_random_sampler.py
data_collection_7_tweets_to_mongo.py
data_collection_8_mongo_filtering_and_aggregation.py
data_collection_9_TweetNLP.py
data_collection_10_Network_Centrality.py
data_collection_11_Draw_Networks.py
data_collection_12_WordCounts.py
data_collection_13_TwetNLP_without_Pronouns.py

PolPsychCam/Twitter-NLP-SNA