Political Content Classification

Classifying US-centric political posts on Reddit.

Motivation

Annoyed by the US-centric news and political posts on Reddit.

How the dataset was generated:

Mine data using Pushshift, the Reddit API and BigQuery, then merge the results.
Label each post based on the subreddit it came from.
Extract keywords using TextRank and build a frequency table.
Train models on the relative frequencies of the extracted keywords.
A simple logistic regression on relative word frequencies gives ~94% classification accuracy.
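
The keyword and frequency-table steps above can be sketched roughly as follows. This is a simplified, standard-library-only approximation of TextRank (PageRank over a word co-occurrence graph), not the code used in this repository. The stopword list, window size, `top_k` and all function names are assumptions made for illustration, and "relative frequency" is read here as each keyword's share of all extracted keywords, which may differ from the definition used in the notebook.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption; a real list would be much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "for", "as", "by", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and tokens under 3 letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]


def textrank_keywords(text, window=3, top_k=5, damping=0.85, iterations=30):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    words = tokenize(text)
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        # Link each word to the words that follow it inside the window.
        for other in words[i + 1:i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def relative_frequencies(posts):
    """Count extracted keywords across posts and normalise the counts to sum to 1."""
    counts = Counter(kw for post in posts for kw in textrank_keywords(post))
    total = sum(counts.values())
    return {kw: n / total for kw, n in counts.items()} if total else {}


posts = ["Senate votes on the election bill", "Election results delay senate vote"]
print(relative_frequencies(posts))
```

Keywords that show up in many posts of one label get a larger share of the table, which is what a downstream model can pick up on.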

How to use

A trained logistic regression model is included.
Example:

from classifier import Classifier

text = "Senate passes new spending bill"  # sample input; any post title or text
Classifier.predict(text)
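
The internals of the included model are not shown here. As a rough illustration of how a logistic regression over relative word frequencies can turn a piece of text into a prediction, here is a minimal standard-library sketch. `TinyClassifier`, its methods, the toy vocabulary and the toy training texts are all hypothetical and are not the repository's `Classifier` or its data.

```python
import math
import re
from collections import Counter


def features(text, vocabulary):
    """Relative frequency of each vocabulary word within a single text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyClassifier:
    """Hypothetical stand-in for illustration; not the repository's Classifier."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)
        self.weights = [0.0] * len(self.vocabulary)
        self.bias = 0.0

    def predict_proba(self, text):
        """Probability that the text belongs to class 1 (political)."""
        x = features(text, self.vocabulary)
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def fit(self, texts, labels, lr=1.0, epochs=500):
        """Plain stochastic gradient descent on the log loss."""
        for _ in range(epochs):
            for text, y in zip(texts, labels):
                x = features(text, self.vocabulary)
                error = self.predict_proba(text) - y
                self.weights = [w - lr * error * xi for w, xi in zip(self.weights, x)]
                self.bias -= lr * error
        return self

    def predict(self, text):
        """1 for political, 0 otherwise, using a 0.5 threshold."""
        return int(self.predict_proba(text) >= 0.5)


# Toy data, made up for this sketch only.
texts = ["senate election vote", "congress election bill", "cute cat picture", "new game trailer"]
labels = [1, 1, 0, 0]
vocabulary = ["senate", "election", "congress", "cat", "game"]

clf = TinyClassifier(vocabulary).fit(texts, labels)
print(clf.predict("election in the senate"), clf.predict("my cat plays a game"))
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would replace the hand-written training loop; the loop is spelled out here only to keep the sketch dependency-free.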

The final dataset can be found here.

TODO

Make a browser plugin.

AnjayGoel/political-content-classification
