This is the code used in my MPhil project at the University of Cambridge, analysing political behaviour and communication on Twitter using Social Network Analysis and Natural Language Processing.
Notes:
- Data collection from Twitter requires a set of API keys, which can be obtained from the Twitter developer portal. The code presented here imports a local module called 'keys' containing these credentials; to make the code executable, supply your own keys in a file of the same name.
- All code is executed in IPython, so a line such as '>>> list_name' prints the entire list named 'list_name' without requiring a 'print(list_name)' statement.
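As a rough illustration, the 'keys' module might look like the following (the variable names here are assumptions, not necessarily those used in the project; replace the placeholder strings with your own credentials):

```python
# Hypothetical contents of keys.py -- variable names are illustrative.
# Fill in your own Twitter API credentials before running the collection code.
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```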
12 Python scripts were used for data collection, as follows:
- parse a pre-made CSV file of elites and their Twitter accounts, as of March 2020 (elites include UK MPs, MEPs, and political party accounts)
- connect to the Twitter API (using a private set of keys, which will need to be re-created to replicate this code), collect all follower IDs of each of the elites, saving them in separate files titled 'followers_{elite}.csv'
- build a network of elites and their followers, split the network up into LEFT and RIGHT (remove overlapping/central nodes); store side for main analysis
- randomly sample 100,000 user_ids from LEFT and 100,000 user_ids from RIGHT network
- collect 200 most recent tweets from each of the users in LEFT and RIGHT networks, saving into MongoDB database
- filter the users in each sample by activity
- apply POS tagging to find nouns, proper nouns etc. in Tweets; calculate noun proportions for main analysis
- calculate network centrality values for all nodes; store values for main analysis
- clean words in tweets (lowercase, drop 's etc.), find most frequently used ones and visualise
- run additional linguistic analyses: noun proportions without pronouns; length of tweets on LEFT vs. RIGHT; number of proper-noun pairs on LEFT vs. RIGHT
- repeat word analysis after excluding all pronouns, 'coronavirus' words and emoticons/emoji from both Common and Proper Noun tags
- visualise words used most frequently in the profile descriptions of 100 most central users
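The LEFT/RIGHT split described above can be sketched with plain Python sets (the follower IDs below are illustrative; in the project they come from the collected follower files):

```python
# Minimal sketch of splitting followers into LEFT and RIGHT networks,
# removing overlapping/central nodes that follow elites on both sides.
# The IDs are toy values for illustration only.
left_followers = {1, 2, 3, 4, 5}    # followers of left-leaning elites
right_followers = {4, 5, 6, 7, 8}   # followers of right-leaning elites

overlap = left_followers & right_followers   # overlapping/central nodes
LEFT = left_followers - overlap
RIGHT = right_followers - overlap

print(sorted(LEFT))   # [1, 2, 3]
print(sorted(RIGHT))  # [6, 7, 8]
```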
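The random-sampling step can be done with the standard library; a minimal sketch (the project samples 100,000 IDs per side; a smaller number is used here for illustration):

```python
import random

# Sample user IDs from one side of the network without replacement.
# The ID range and sample size are illustrative, not the project's values.
random.seed(42)  # seed only for reproducibility of this example
left_ids = list(range(1_000_000))
sample = random.sample(left_ids, k=1000)  # sampling without replacement

assert len(sample) == len(set(sample)) == 1000  # no duplicate IDs
```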
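The noun-proportion calculation might look like the following sketch, assuming Penn Treebank POS tags (e.g. as produced by NLTK's tagger); the function name and tag set here are assumptions, not necessarily the project's:

```python
# Given POS-tagged tokens as (word, tag) pairs, compute the share of tokens
# tagged as common nouns (NN, NNS) or proper nouns (NNP, NNPS).
def noun_proportion(tagged_tokens):
    noun_tags = {"NN", "NNS", "NNP", "NNPS"}
    if not tagged_tokens:
        return 0.0
    nouns = sum(1 for _, tag in tagged_tokens if tag in noun_tags)
    return nouns / len(tagged_tokens)

tagged = [("Boris", "NNP"), ("visits", "VBZ"), ("the", "DT"), ("hospital", "NN")]
print(noun_proportion(tagged))  # 0.5
```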
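For the centrality step, a degree-centrality calculation over an edge list can be sketched without any graph library (the project code may instead use a package such as networkx; the edge list below is illustrative):

```python
from collections import Counter

# Degree centrality: a node's degree divided by (n - 1), where n is the
# number of nodes. The edges are toy values for illustration only.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

n = len(degree)
centrality = {node: d / (n - 1) for node, d in degree.items()}
print(centrality["A"])  # 1.0 -- "A" is connected to every other node
```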
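The word-cleaning and frequency step can be sketched with the standard library; the cleaning rules below (lowercasing, dropping possessive 's, keeping letters only) are a simplified assumption about the project's pipeline:

```python
import re
from collections import Counter

# Clean tweet text and count word frequencies: lowercase, drop possessive 's,
# strip non-alphabetic characters.
def clean_words(tweets):
    words = []
    for tweet in tweets:
        for word in tweet.lower().split():
            word = re.sub(r"'s$", "", word)     # drop possessive 's
            word = re.sub(r"[^a-z]", "", word)  # keep letters only
            if word:
                words.append(word)
    return Counter(words)

counts = clean_words(["Labour's plan", "the plan works"])
print(counts.most_common(1))  # [('plan', 2)]
```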
Analysis was then performed on the resulting data in R; the code for this is available in the analysis_R_code folder of the original repository that this was forked from.
- Required packages are listed in the imports at the top of every file
- Pre-print available: https://psyarxiv.com/v6qx5;
- Associated files are available on the OSF: https://osf.io/dr7bk/?view_only=27f913c49c7b48019484f784b5db4135;
- Journal publication pending