Getting Started

# Description of files
- miner.py was our first attempt to mine articles posted in subreddits.
- scaper.js, our scraper script, was our second attempt and collects the article links and metadata. It is injected into the browser and used in conjunction with Reddit Enhancement Suite. The collected data is in the json folder.
- jsonparser.py takes those JSON files of article URLs, scrapes the URLs using the Readability API, and populates our DB.
- The classification folder includes the final classifier testing script (as well as an earlier V1 script), which runs the classifier cross-validations.
- The classification folder also includes NewRedditSampleAll.csv, our complete and final scraped data from 9 subreddits.
- The prediction folder includes the pickled best classifier and vectorizer. The prediction code itself is now integrated into server.py.
- server.py runs our web application (at localhost:5000).
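
As an illustration of how a pickled vectorizer and classifier pair might be loaded and applied at prediction time, here is a minimal sketch. It is not the code in server.py: the helper names `load_pickle` and `predict_subreddit`, the file names in the trailing comment, and the assumption that the objects expose scikit-learn-style `transform` and `predict` methods are all hypothetical.

```python
import pickle


def load_pickle(path):
    """Load one pickled object from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict_subreddit(text, vectorizer, classifier):
    """Vectorize one article and return the predicted label.

    Assumes scikit-learn-style objects (an assumption, not confirmed by
    the repository): vectorizer.transform(list_of_texts) returns features,
    and classifier.predict(features) returns one label per input.
    """
    features = vectorizer.transform([text])
    return classifier.predict(features)[0]


# Hypothetical usage; adjust the paths to the actual files in the
# prediction folder:
#   vectorizer = load_pickle("prediction/vectorizer.pkl")
#   classifier = load_pickle("prediction/classifier.pkl")
#   print(predict_subreddit("Some article text", vectorizer, classifier))
```

Only unpickle files you trust, since `pickle.load` can execute arbitrary code contained in the file.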

Getting Started

Getting the Dependencies

export CFLAGS=-Qunused-arguments
export CPPFLAGS=-Qunused-arguments
pip install -r requirements.txt

Running the subreddit finder

python subreddit_selector.py

Running the miner

Either your IP must be added to the RDS security group, or you must change helpers/db.py to point to your own PostgreSQL database.
python miner.py
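
If you point helpers/db.py at your own PostgreSQL instance, one option is to read the connection settings from environment variables rather than hard-coding them. The sketch below only builds a connection URI. The variable names (`PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`) follow the libpq conventions, but the function `build_postgres_uri` and its defaults are hypothetical and not taken from helpers/db.py.

```python
import os
from urllib.parse import quote


def build_postgres_uri(env=None):
    """Build a postgresql:// URI from libpq-style environment variables.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test. All defaults here are assumptions.
    """
    env = os.environ if env is None else env
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    # Percent-encode values so characters like '@' or '/' cannot
    # break the URI structure.
    user = quote(env.get("PGUSER", "postgres"), safe="")
    password = quote(env.get("PGPASSWORD", ""), safe="")
    database = quote(env.get("PGDATABASE", "postgres"), safe="")
    credentials = f"{user}:{password}" if password else user
    return f"postgresql://{credentials}@{host}:{port}/{database}"
```

The resulting string can then be handed to whichever PostgreSQL client the project uses, provided that client accepts connection URIs.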

kklimuk/reddit-predictor
