rbtoner/Fanguard

FanGuard: The Smart Tumblr Spoiler Filter

Backup of modules and macros for my Insight Data Science SV 2016A project.

Contents of repository:

AlgorithmTrain: Algorithm training functions and associated algorithms

  • AnaFunc.py: High-level and drawing functions
  • dfmaker.py: Functions for constructing dataframes, modifying cleaned data, etc.
  • modelmaker.py: Vocabulary and model creation, CV and training lower-level functions
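As a rough illustration of what "vocabulary creation" can look like for a text classifier, here is a minimal bag-of-words sketch. It is not taken from modelmaker.py: the function names `build_vocabulary` and `vectorize`, the tokenizer regex, and the `min_count` threshold are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of lowercase letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(documents, min_count=2):
    """Map each term seen at least `min_count` times to a column index.

    Terms are sorted alphabetically so the mapping is deterministic.
    """
    counts = Counter()
    for text in documents:
        counts.update(tokenize(text))
    kept = sorted(term for term, n in counts.items() if n >= min_count)
    return {term: index for index, term in enumerate(kept)}


def vectorize(text, vocabulary):
    """Return a term-count vector for `text` over `vocabulary`."""
    vector = [0] * len(vocabulary)
    for token in tokenize(text):
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector
```

Raising `min_count` drops rare terms, which keeps the vocabulary small at the cost of ignoring unusual spoiler words.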

DataCollection: Data collection and cleaning (from Tumblr API)

  • postGather.py: Functions to gather posts from the Tumblr API (employs pytumblr)
  • cleaners.py: Cleaning functions
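The gather-then-clean flow described above could be sketched roughly as follows. This is not the code in postGather.py or cleaners.py. `fetch_page` is a hypothetical callable standing in for an API call, and the post keys (`id`, `timestamp`, `body`) are assumptions about the response shape. With pytumblr, one would presumably wrap a client call such as `client.tagged(...)` with a `before` argument, but that wiring is an assumption and is not shown.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(raw):
    """Strip HTML tags, decode entities, collapse whitespace, lowercase."""
    without_tags = TAG_RE.sub(" ", raw)
    decoded = html.unescape(without_tags)
    return WHITESPACE_RE.sub(" ", decoded).strip().lower()


def gather_posts(fetch_page, max_posts=100):
    """Collect up to `max_posts` cleaned posts by paging backwards in time.

    `fetch_page(before)` is a hypothetical callable: it returns a list of
    post dicts older than the `before` timestamp (or the newest posts when
    `before` is None). Each dict is assumed to have 'id', 'timestamp' and
    'body' keys.
    """
    posts, seen_ids, before = [], set(), None
    while len(posts) < max_posts:
        page = fetch_page(before)
        fresh = [p for p in page if p["id"] not in seen_ids]
        if not fresh:
            break  # empty page or only repeats: nothing more to gather
        for post in fresh:
            seen_ids.add(post["id"])
            posts.append({
                "id": post["id"],
                "timestamp": post["timestamp"],
                "text": clean_text(post.get("body", "")),
            })
        # Ask for posts older than the oldest one seen on this page.
        before = min(p["timestamp"] for p in page)
    return posts[:max_posts]
```

Passing the fetch function in as an argument keeps the paging logic testable without network access or API keys.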

macros: Collection of IPython notebooks for analysis

  • AuthorTest.ipynb: Testing the effects of repeat authorship on the model.
  • DataGather.ipynb: Gathering posts from 9 separate content sources, cleaning them, and storing them in a MySQL database.
  • Model_SpoilerFilter_CV.ipynb: Cross-validation of models and optimization of parameters.
  • Model_Train_PreFilter.ipynb: Training of the Pre-Filter vocabulary and an accuracy check.
  • Model_Train_SpoilerFilter.ipynb: Training of the Spoiler Filter and an accuracy check (ROC), with diagnostic plots.
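For readers unfamiliar with the ROC check mentioned above, the sketch below computes ROC points and the area under the curve from binary labels and classifier scores. It is a generic illustration in plain Python, not the notebooks' code (which may well use a library routine instead). The function names `roc_points` and `auc` are assumptions made for this example.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    `labels` holds 1 for spoiler and 0 for non-spoiler; `scores` holds the
    classifier's score for each post (higher means more spoiler-like).
    The threshold sweeps from the highest score down; tied scores are
    processed together so they yield a single point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

An area of 1.0 means every spoiler is scored above every non-spoiler, while 0.5 is what random scoring would give.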
