Automeme

Language modeling and classification for document classes defined by 'Memes' (AdviceAnimals)

by Jay Hack & Sam Beder, Fall 2013

1: Description

TODO: fill this out

Given a sentence

All source files are contained in the base directory.

In data:

• data/memes contains a list of JSON objects representing memes

• data/

Timings (all measured on a MacBook Pro):

• Pandas load: takes < 5 seconds

• Tokenizer: takes < 20 seconds

• BOW representation: takes < 10 seconds

• BOW -> vocab mat: < 15 seconds

• sklearn Logistic Regression fit(X, y): about 4 mins!

• Not too great right now... need better data
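The tokenizer, BOW, and vocab-matrix steps timed above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in this repo: the toy `MEMES` data and the helper names (`tokenize`, `bag_of_words`, `build_vocab`, `to_matrix`) are all hypothetical, and the real pipeline presumably uses nltk and pandas for these steps.

```python
import re
from collections import Counter

# Hypothetical toy data: (meme class, caption text) pairs, not from the project's dataset.
MEMES = [
    ("Success Kid", "Found a parking spot right by the door"),
    ("Bad Luck Brian", "Finds a parking spot, gets towed"),
    ("Success Kid", "Passed the exam without studying"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(tokens):
    """Map each token to its count within one document."""
    return Counter(tokens)


def build_vocab(bows):
    """Assign a column index to every distinct token, in sorted order."""
    return {tok: i for i, tok in enumerate(sorted(set().union(*bows)))}


def to_matrix(bows, vocab):
    """Turn the bags of words into a dense document-by-vocabulary count matrix."""
    rows = []
    for bow in bows:
        row = [0] * len(vocab)
        for tok, count in bow.items():
            row[vocab[tok]] = count
        rows.append(row)
    return rows


bows = [bag_of_words(tokenize(text)) for _, text in MEMES]
vocab = build_vocab(bows)
X = to_matrix(bows, vocab)          # one row per meme, one column per vocabulary word
y = [label for label, _ in MEMES]   # the meme class is the label to predict
```

The resulting `X` and `y` have the shape that a classifier's `fit(X, y)` expects, which is where the sklearn Logistic Regression step would come in. A dense list-of-lists matrix is only practical for toy data; with a real vocabulary a sparse representation would likely be needed.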

Automeme uses the following libraries:

• nltk

• pandas

• scikit-learn (sklearn)
