AskReddit Analytics

Predicting the performance of /r/AskReddit submissions.

EECS 349 project by:

Getting started

With Python installed, you can pull a new batch of data by executing query.py from the terminal:

./query.py

query.py takes a subreddit, a sort view, and a number of posts. It fetches information about the matching posts and writes the results to a timestamped CSV file.

By default, query.py uses the following parameters:

Subreddit: "AskReddit"
Sort view: "hot"
Number of posts: 25

CSV files are saved to the /output directory.
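
As a rough illustration of the "timestamped CSV" step, here is a minimal sketch. It is not the actual code in query.py: the function names, the file naming scheme, and the use of Python 3 are all assumptions. Only the default values and the output directory come from this README.

```python
import csv
import os
import time

# Defaults taken from this README; the constant names are hypothetical.
DEFAULT_SUBREDDIT = "AskReddit"
DEFAULT_SORT = "hot"
DEFAULT_LIMIT = 25


def output_path(subreddit=DEFAULT_SUBREDDIT, sort=DEFAULT_SORT,
                out_dir="output", now=None):
    """Build a timestamped CSV path (the naming scheme is an assumption)."""
    # time.gmtime(None) uses the current time; pass `now` for a fixed stamp.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    return os.path.join(out_dir, "{}_{}_{}.csv".format(subreddit, sort, stamp))


def write_rows(path, rows, fieldnames):
    """Write a list of dicts to `path`, creating the directory if needed."""
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Putting the subreddit and sort view in the file name keeps batches from different queries apart in the same directory.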

Attributes

query.py returns the following attributes:

title (string)
title_length (numerical)
serious (binary)
nsfw (binary)
post_utcTime (numerical)
post_localTime (numerical)
time_to_first_comment (numerical)
author_gold (binary)
author_account_age (numerical)
author_link_karma (numerical)
author_comment_karma (numerical)
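
To make a few of these attributes concrete, the sketch below derives them from raw post fields. The function and its arguments are hypothetical, and three details are assumptions rather than documented behavior: title_length counts characters, serious means the title carries a "[Serious]" tag, and time_to_first_comment is in seconds.

```python
def derive_attributes(title, post_utc, first_comment_utc=None):
    """Derive a few of the listed attributes from raw post fields.

    Assumptions (not confirmed by this README): title_length counts
    characters, serious is detected from a "[Serious]" title tag, and
    time_to_first_comment is measured in seconds.
    """
    time_to_first = None
    if first_comment_utc is not None:
        time_to_first = first_comment_utc - post_utc
    return {
        "title": title,
        "title_length": len(title),
        # Binary attributes are stored as 0 or 1.
        "serious": int("[serious]" in title.lower()),
        "post_utcTime": post_utc,
        "time_to_first_comment": time_to_first,
    }
```

A post with no comments yet gets time_to_first_comment set to None, which a later step would need to handle before modeling.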

TODO

Make sure it works for sort views other than "hot"
Implement duplicate detection and removal for posts (first decide how duplicates should be handled at all)
Set up as a cron job on a Raspberry Pi
Run keyword occurrences
Figure out strategies for sentiment and topic category analysis
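
One possible starting point for the duplicate and keyword items above is sketched here. It is not implemented in the repository. Treating rows with the same title and post_utcTime as duplicates and keeping the first one is only one of several possible policies, and the function names are hypothetical.

```python
import re
from collections import Counter


def drop_duplicates(rows, key_fields=("title", "post_utcTime")):
    """Keep the first row for each key; later repeats are dropped."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def keyword_counts(rows):
    """Count lowercase word occurrences across all titles."""
    counts = Counter()
    for row in rows:
        counts.update(re.findall(r"[a-z']+", row["title"].lower()))
    return counts
```

Removing duplicates before counting keeps a post that appears in several batches from inflating its keywords.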


aiqiliu/AskReddit-analytics
