Digestant

Modules for effectively digesting data from Twitter and Reddit using ML, NLP, and statistics.

See more in the introduction slides, project survey, and demo.

Dev Environment

  • Python 3.x

Setup

  • It is recommended to create a new virtual environment to manage the project's dependencies.
  • Install the Python packages listed in requirements.txt: $ pip install -r requirements.txt.
  • Download the NLTK data: $ python -m nltk.downloader all.
  • Download the spaCy en_core_web_md model: $ python -m spacy download en_core_web_md.
  • Download the Stanford NER model:
    1. Download the stanford-ner-xxxx-xx-xx zip file from the official website.
    2. Unzip it and place the folder in the project root, renaming it to stanford-ner/. (A quick sanity check for these downloads is sketched below.)
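
To confirm the downloads succeeded before moving on, a check along the following lines can help. This is a minimal sketch, not part of the repository; the classifier and jar file names under stanford-ner/ are the defaults shipped with the Stanford NER release and may differ in your version.

```python
# Sanity-check the NLP dependencies (illustrative sketch, not part of this repo).
import nltk
import spacy
from nltk.tag import StanfordNERTagger

# spaCy: spacy.load raises OSError if en_core_web_md was not downloaded.
nlp = spacy.load("en_core_web_md")
print("spaCy OK:", [token.text for token in nlp("Digestant digests Twitter data.")])

# NLTK: word_tokenize depends on the data bundle fetched by the downloader.
print("NLTK OK:", nltk.word_tokenize("Hello from NLTK."))

# Stanford NER: paths assume the unzipped folder was renamed to stanford-ner/
# and use the default file names from the release; requires Java on the PATH.
tagger = StanfordNERTagger(
    "stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz",
    "stanford-ner/stanford-ner.jar",
)
print("Stanford NER OK:", tagger.tag("Barack Obama was born in Hawaii".split()))
```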

Usage

  1. Create Twitter and Reddit accounts and follow the accounts you are interested in.
  2. Copy config-sample.json to config.json in the same directory and fill in the API keys. (Go to your Twitter/Reddit developer console, create an application, and obtain the keys.)
  3. Run crawlers/twitter_crawler.py to crawl Twitter data; by default it saves the results to dataset/twitter/. (Steps 2 and 3 are illustrated in the sketch after this list.)
  4. Customize the data entities by modifying domains.json and types.json. (See the demo.)
  5. Execute demo/demo_howard.ipynb or another notebook to see the daily digest.
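
For orientation, steps 2 and 3 boil down to something like the sketch below. This is an illustration rather than the repository's actual crawler (that lives in crawlers/twitter_crawler.py); it assumes the Tweepy library, and the key names it reads from config.json (consumer_key, consumer_secret, access_token, access_token_secret) are guesses that should be matched against whatever config-sample.json actually uses.

```python
# Illustrative crawl flow; the real script is crawlers/twitter_crawler.py.
# Assumes Tweepy; the config key names below are guesses, not the repo's schema.
import json
import os

import tweepy

# Step 2: read API credentials from config.json.
with open("config.json") as f:
    cfg = json.load(f)

auth = tweepy.OAuthHandler(cfg["consumer_key"], cfg["consumer_secret"])
auth.set_access_token(cfg["access_token"], cfg["access_token_secret"])
api = tweepy.API(auth)

# Step 3: fetch recent tweets from followed accounts and save them
# to dataset/twitter/, mirroring the crawler's default output path.
os.makedirs("dataset/twitter", exist_ok=True)
tweets = [status._json for status in api.home_timeline(count=50)]
with open("dataset/twitter/tweets.json", "w") as f:
    json.dump(tweets, f, indent=2)
print(f"Saved {len(tweets)} tweets to dataset/twitter/tweets.json")
```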
