A social media data collection framework.
Note: All instructions below, unless otherwise stated, assume a UNIX environment. Depending on your privileges, you may need to sudo things or be logged in as an administrator on Windows.
And that's it! ...Okay not really. The next step is to set up proper authorization for Parrot (OAuth, secret keys, cookies, and all that good stuff).
There should be a file in the parrot/ directory called tokens.py. This file should contain all the OAuth tokens necessary for running Parrot, as well as a random COOKIE_SECRET for added security.
The COOKIE_SECRET is autogenerated during setup, but if for whatever reason you want a new one, you can easily generate it with the following Python script:
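import base64
import uuid

# Two random UUIDs give 32 random bytes; base64-encode them and print the result.
print(base64.b64encode(uuid.uuid4().bytes + uuid.uuid4().bytes).decode("ascii"))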
To get the remaining OAuth tokens for the various social media sites, you will need to set up apps and developer accounts with them. This is left as an exercise for the reader.
Warning: If you do not have this file, Parrot will not run. You may choose to omit certain tokens if you do not need them, but you must at least include the COOKIE_SECRET. Also (it should go without saying but just in case...) this file should NEVER be made public.
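For orientation, a tokens.py might look roughly like the sketch below. Only COOKIE_SECRET is named in this README; the other variable names are hypothetical placeholders, so check the Parrot source for the names it actually expects.

```python
# tokens.py: keep this file out of version control.

# Required: Parrot will not run without a cookie secret.
# Paste the value produced during setup (or by the script above).
COOKIE_SECRET = "paste-your-generated-secret-here"

# Optional: hypothetical names for illustration only, not taken from Parrot.
EXAMPLE_SITE_CONSUMER_KEY = "your-consumer-key"
EXAMPLE_SITE_CONSUMER_SECRET = "your-consumer-secret"
```

Adding tokens.py to your ignore file (for example .gitignore) is one simple way to avoid publishing it by accident.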
By default, Parrot is password-protected by a login page. This is to prevent random people from using up the data collection quotas.
Parrot defines a set of known users as a Python dictionary called USER_DICT in parrot_settings.py. The key is the username, and the value is a salted SHA-256 hash of the corresponding password. There are a few users defined already, but they probably won't be of much use to you, so delete them all.
You can create a new user by simply adding them to the dictionary. To get the password hash, run strap run pass_gen and enter a password at the prompt. The program will print the salted hash, which you can then put into USER_DICT.
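To make the idea of a salted SHA-256 entry concrete, here is a minimal sketch of how such a dictionary could be built and checked. It is an illustration only: the salt value, the salt-then-password ordering, and the helper names salted_sha256 and check_login are assumptions, not Parrot's actual scheme. For real entries, always use the output of pass_gen.

```python
import hashlib
import hmac

# Hypothetical salt; Parrot's real salt and hashing scheme may differ.
EXAMPLE_SALT = "example-salt"


def salted_sha256(password, salt=EXAMPLE_SALT):
    """Return the hex SHA-256 digest of salt + password."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


# Username -> salted hash, mirroring the shape described for USER_DICT.
USER_DICT = {
    "alice": salted_sha256("correct horse battery staple"),
}


def check_login(username, password):
    """Return True only if the user exists and the password hash matches."""
    expected = USER_DICT.get(username)
    if expected is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, salted_sha256(password))
```

Storing only the hash means the plain-text password never has to appear in parrot_settings.py.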
- Start the MongoDB daemon with mongod
- Execute strap in the project root directory to start the server
- CLiPS: Pattern