Twitter panel data collection

A tool for fetching Twitter profile, follower, and friend information for SoMA.

It currently requires a Redis database to write its intermediate results to.

TODO: turn this into a process that requires less manual intervention so that panelist data can be continuously updated

Installation

virtualenv env
. env/bin/activate
pip install -r requirements.txt

# install redis
apt-get install redis-server
# or
brew install redis

Setup

It needs a config.ini file (see config.ini.example) that supplies the Twitter API credentials and tells it which Redis server and database to connect to:

[twitter]
consumer_key = client key value
consumer_secret = client secret value
access_token = token
access_token_secret = token secret

[redis]
host = localhost
port = 6379
db = 0
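
As a rough illustration of how a file with this layout can be read, here is a minimal sketch using Python's standard configparser module. The function name load_settings is hypothetical and is not taken from config.py; the values are the placeholders shown above.

```python
import configparser

# Same layout as the config.ini shown above; the values are placeholders.
EXAMPLE_CONFIG = """
[twitter]
consumer_key = client key value
consumer_secret = client secret value
access_token = token
access_token_secret = token secret

[redis]
host = localhost
port = 6379
db = 0
"""


def load_settings(text):
    """Parse config text into (twitter_credentials, redis_settings) dicts.

    Hypothetical helper for illustration; config.py may work differently.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    twitter = dict(parser["twitter"])
    redis_settings = {
        "host": parser.get("redis", "host"),
        # port and db are numeric, so convert them from strings
        "port": parser.getint("redis", "port"),
        "db": parser.getint("redis", "db"),
    }
    return twitter, redis_settings


twitter, redis_settings = load_settings(EXAMPLE_CONFIG)
print(redis_settings)  # {'host': 'localhost', 'port': 6379, 'db': 0}
```

With the redis-py package installed, a dict shaped like redis_settings could be passed to redis.Redis(**redis_settings), since that client accepts host, port, and db keyword arguments.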

Running

collect.py takes a newline-separated list of Twitter names and collects the profile, follower IDs, and friend IDs for each one. It will take a while unless your Twitter account is less rate-limited than usual.

python collect.py list-of-twitter-names.txt
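
The input file is just one Twitter name per line. The sketch below shows one way such a list could be cleaned before use. It is an assumption for illustration, not a description of what collect.py does: the helper read_screen_names is hypothetical, and stripping a leading "@" and removing duplicates are choices made here.

```python
def read_screen_names(lines):
    """Return unique, non-empty screen names in first-seen order.

    Hypothetical helper: drops blank lines, strips a leading "@", and
    treats names case-insensitively (Twitter screen names are not
    case-sensitive).
    """
    seen = set()
    names = []
    for line in lines:
        name = line.strip().lstrip("@")
        if name and name.lower() not in seen:
            seen.add(name.lower())
            names.append(name)
    return names


# In practice the lines would come from open("list-of-twitter-names.txt").
sample = ["alice\n", "\n", "@Bob\n", "alice\n"]
print(read_screen_names(sample))  # ['alice', 'Bob']
```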

Outputting results

output.py writes out lines of JSON representing the panelist info for SoMA, and also writes a list of the Twitter names for which it couldn't collect some of the data. It needs a mapping of Twitter screen names to YouGov panelist IDs (a previous panel_default.json will suffice, or a CSV of [screen_name, id] rows).

python output.py panel.json mapping_for_yougov_id.[csv,json] missing-names.txt
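
To make the CSV mapping and the two outputs concrete, here is a minimal sketch under stated assumptions. The helpers load_mapping and split_results and the record fields ("id", "screen_name", "profile") are hypothetical; the real JSON layout produced by output.py is not documented here.

```python
import csv
import io
import json


def load_mapping(csv_file):
    """Read [screen_name, id] rows into a dict keyed by lowercased screen name."""
    mapping = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        screen_name, panelist_id = row[0].strip(), row[1].strip()
        mapping[screen_name.lower()] = panelist_id
    return mapping


def split_results(collected, mapping):
    """Return (json_lines, missing_names).

    A name counts as missing when no data was collected for it or when
    it has no panelist ID in the mapping (an assumption for this sketch).
    """
    json_lines, missing = [], []
    for screen_name, profile in collected.items():
        panelist_id = mapping.get(screen_name.lower())
        if profile is None or panelist_id is None:
            missing.append(screen_name)
            continue
        # Hypothetical record layout, one JSON object per output line.
        record = {"id": panelist_id, "screen_name": screen_name, "profile": profile}
        json_lines.append(json.dumps(record, sort_keys=True))
    return json_lines, missing


mapping = load_mapping(io.StringIO("alice,1001\nbob,1002\n"))
collected = {
    "alice": {"followers_count": 10},
    "bob": None,                       # collection failed
    "carol": {"followers_count": 3},   # no panelist ID in the mapping
}
json_lines, missing = split_results(collected, mapping)
print("\n".join(json_lines))
print(missing)  # ['bob', 'carol']
```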

This uses the panoptic proxy provided by beta.pulse.yougov.com. Access to the US panoptic is trickier at the moment.

To reach the US panoptic, open an SSH connection to the alab-pulse1 machine and set up a SOCKS proxy on port 8050:

# connect to yougov vpn and then set up this proxy
ssh -ND 8050 username@alab-pulse1
# in another terminal
python us_output.py us_panel.json mapping_for_yougov_id.csv missing-us-names.txt
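
Before running us_output.py it can be handy to confirm that the SSH tunnel is actually up. The following is a small sketch, not part of this repository, that only checks whether something is accepting TCP connections on the local proxy port; the helper name proxy_is_listening is hypothetical, and the default port 8050 matches the ssh -ND command above.

```python
import socket


def proxy_is_listening(host="127.0.0.1", port=8050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that some process is listening on the port; it does
    not verify that the listener is the SOCKS proxy or that the VPN works.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if proxy_is_listening():
    print("Something is listening on port 8050; the tunnel looks up.")
else:
    print("Nothing is listening on port 8050; start the ssh -ND tunnel first.")
```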
