Language modeling and classification for document classes defined by 'Memes' (AdviceAnimals)
by Jay Hack & Sam Beder, Fall 2013
TODO: fill this out
Given a sentence
All source files are contained in the base directory.
In data:
• data/memes contains a list of JSON objects representing memes
• data/
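As a rough illustration of working with such a file, the sketch below parses a JSON list of meme objects and groups their text by meme class. The field names `meme_type` and `text`, the function name `group_by_class`, and the sample records are all assumptions made for this example; the actual schema of data/memes is not described here.

```python
import json
from collections import defaultdict

# Hypothetical sample in the assumed shape: a JSON list of meme objects.
SAMPLE = '''
[
  {"meme_type": "class_a", "text": "first example caption"},
  {"meme_type": "class_b", "text": "second example caption"},
  {"meme_type": "class_a", "text": "third example caption"}
]
'''

def group_by_class(raw_json, label_key="meme_type", text_key="text"):
    """Parse a JSON list of objects and map each label to its list of texts."""
    grouped = defaultdict(list)
    for meme in json.loads(raw_json):
        grouped[meme[label_key]].append(meme[text_key])
    return dict(grouped)

memes_by_class = group_by_class(SAMPLE)
```

If the real objects use different keys, only the `label_key` and `text_key` arguments would need to change.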
(All timings below were measured on a MacBook Pro.)
• Pandas load: takes < 5 seconds
• Tokenizer: takes < 20 seconds
• Bag-of-words (BOW) representation: takes < 10 seconds
• BOW -> vocabulary matrix: takes < 15 seconds
• sklearn Logistic Regression fit(X, y): takes about 4 minutes!
• Not too great right now... need better data
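The stages timed above can be sketched roughly as follows, using only the standard library so the example stays self-contained. The regex tokenizer is a stand-in for the nltk tokenizer, the toy sentences and labels are invented, and the function names (`tokenize`, `build_vocab`, `to_matrix`) are hypothetical rather than taken from the Automeme source.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase and split into word tokens (stand-in for an nltk tokenizer)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def build_vocab(bows):
    """Assign each distinct token a column index, in sorted order."""
    tokens = sorted({tok for bow in bows for tok in bow})
    return {tok: i for i, tok in enumerate(tokens)}

def to_matrix(bows, vocab):
    """Turn bag-of-words counters into a dense document-by-vocabulary count matrix."""
    matrix = []
    for bow in bows:
        row = [0] * len(vocab)
        for tok, count in bow.items():
            row[vocab[tok]] = count
        matrix.append(row)
    return matrix

# Toy data, invented for illustration only.
sentences = ["the cat sat on the cat", "the dog sat"]
labels = ["class_a", "class_b"]

bows = [Counter(tokenize(s)) for s in sentences]  # BOW representation
vocab = build_vocab(bows)                         # token -> column index
X = to_matrix(bows, vocab)                        # BOW -> vocabulary matrix
y = labels                                        # one class label per row of X
```

X and y here have the shape that sklearn's LogisticRegression fit(X, y) accepts. At a realistic vocabulary size, a sparse matrix would likely be a better choice than nested lists.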
Automeme uses the following libraries:
• nltk
•