- Get press releases from Bernie and Trump's websites and store them as `.json` files.
- Get the last 3200 Tweets from Bernie and Trump's Twitter accounts.
- Get up to 3200 Tweets from a small sample of people following both Bernie and Trump.
- Perform basic NLP feature extraction, such as constructing unigrams and bigrams, and weighting term frequencies.
- Train a variety of classifiers (less scary than it sounds) on the Trump-Bernie Twitter data.
- Use these classifiers to predict: 1) whether a Tweet is Trump's or Bernie's; 2) whether a Tweet comes from a Trumpist or a Bernie-ite; 3) whether a press release came from Trump's or Bernie's campaign.
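
The feature-extraction step above (unigrams, bigrams, and weighted term frequencies) can be sketched in plain Python. This is a minimal illustration, assuming Python 3, whitespace tokenization, and a simple TF-IDF weighting; the function names and the toy sentences are hypothetical, and the lessons themselves may rely on nltk or scikit-learn instead.

```python
import collections
import math


def extract_ngrams(text, n):
    """Return the word n-grams in text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf_weights(documents):
    """Return one dict per document mapping each unigram/bigram to a weight."""
    term_counts_list = [
        collections.Counter(extract_ngrams(doc, 1) + extract_ngrams(doc, 2))
        for doc in documents
    ]
    # Document frequency: the number of documents that contain each term.
    doc_frequency = collections.Counter()
    for term_counts in term_counts_list:
        doc_frequency.update(term_counts.keys())
    num_docs = len(documents)
    weights_list = []
    for term_counts in term_counts_list:
        total_terms = sum(term_counts.values())
        # Term frequency times log inverse document frequency.
        weights_list.append({
            term: (count / total_terms)
            * math.log(num_docs / doc_frequency[term])
            for term, count in term_counts.items()
        })
    return weights_list


# Toy documents, invented for illustration only.
toy_weights_list = tf_idf_weights(["raise the wage", "build the wall"])
```

Terms that appear in every document (here, "the") get a weight of zero, while terms unique to one document get the highest weights. That is the intuition behind weighting term frequencies before training a classifier.
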

Lesson 1: Step 1
Lesson 2: Steps 2 and 3
Lesson 3: Step 4
Lesson 4: Steps 5 and 6

- json
- time
- random
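
As a rough sketch of how these three standard-library modules might be used together: `json` to store collected records as `.json` files, and `time` with `random` to pause between requests. The pairing of `time` and `random` for pauses is an assumption, and the function names and the sample record are hypothetical.

```python
import json
import random
import time


def save_records(records_list, path):
    """Write a list of dicts to path as JSON."""
    with open(path, "w") as json_file:
        json.dump(records_list, json_file, indent=2)


def load_records(path):
    """Read a list of dicts back from a JSON file."""
    with open(path) as json_file:
        return json.load(json_file)


def polite_pause(min_seconds, max_seconds):
    """Sleep for a random duration in the given range and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

A crawler loop could call `polite_pause(1, 3)` between page requests and `save_records` once at the end.
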
Install 3rd-party packages with pip
For instance: `pip install python-twitter`
- python-twitter
- beautifulsoup4
- selenium
- numpy
- scipy
- scikit-learn
- nltk
- PhantomJS is crucial for our crawler to work but cannot be installed via pip. To install it you must use `brew`, which can be installed from the command line by typing `/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"`. You may have to respond to some prompts during the installation, which should take a couple of minutes.
- Once brew is installed, you can install PhantomJS by typing the command `brew install phantomjs`.
- Put a space after the `#` that starts a comment: `# This is a comment`
- Don't make lines longer than ~80 characters
- Constants in all-caps: `MY_CONSTANT`
- Use underscores to separate words in variable names: `my_variable`
- Avoid meaningless variable names. Avoid numbers in variable names. Wrong: `thing1`, `thing2`. Right: `cat_list`, `fluffy_cat_list`.
- When ambiguous, put the variable type in the name: `my_list` or `my_set`. This is particularly important for collections. Is it a `dict` or a `list`?
- Document code with triple quotes (multiline comments): `"""My documentation"""`
- Write functions when you find yourself repeating code
- When importing modules, don't import specific functions. Import the whole module, and use the module name and function together. Right: `import time; time.sleep(1)`. Wrong: `from time import sleep; sleep(1)`.
- When you find yourself repeatedly checking whether items are in a `list`, use a `set` instead; membership tests on a set are much faster.
- Write a snippet of documentation at the top of your file to help you remember what the file does.
- Write inputs and outputs to functions in a comment in the function body.
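
The conventions above can be seen together in one short file. This is only an illustrative sketch: the function, the constant, and the sample words are hypothetical and not taken from the lessons.

```python
"""Count how many words from a list appear in a known vocabulary."""

import time

MAX_WORD_LENGTH = 20  # Constants in all-caps.


def count_known_words(word_list, vocabulary_set):
    """
    Input: word_list (list of str), vocabulary_set (set of str).
    Output: number of words in word_list found in vocabulary_set.
    """
    known_count = 0
    for word in word_list:
        # Membership tests on a set take constant time on average.
        if len(word) <= MAX_WORD_LENGTH and word in vocabulary_set:
            known_count += 1
    return known_count


start_time = time.time()  # Whole-module import: time.time(), not time().
known_count = count_known_words(["cat", "dog", "cat"], {"cat", "bird"})
elapsed_seconds = time.time() - start_time
```

Putting the collection type in the names (`word_list`, `vocabulary_set`) makes it clear at the call site which argument supports fast membership tests.
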