Junkmail is an email classifier that estimates how likely you are to reply to an email. Currently it provides a playground for testing the features to be extracted from emails.
- Python >= 2.6
- nltk >= 2.0
- sqlalchemy > 0.6.0
On Mac you can install the dependencies using MacPorts, or in general via the easy_install tool.
- Download your emails with the downloadmail.py script:
    python -m downloadmail test@gmail.com
  The script prompts you for your Gmail password. By default it downloads
  all emails dating back to the beginning of 2010 and stores them in the
  mail.sql file.
- Run the analyzer on the file:
    python -m analyze mail.sql
  The script shows the features that matter most in determining which emails you have replied to so far.
For now, you can customize the analyzer by specifying which features to
extract in analyze.features; it comes with a few sample feature extractors
that you can build on. The analyzer uses the nltk naive Bayes classifier.
Feel free to try other classifiers as well.
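As a rough illustration of what a feature extractor could look like, here is a minimal sketch. It is not the code in analyze.features: the function names, the email dict keys ('sender', 'subject', 'body') and the chosen features are all assumptions made for the example. The only nltk call used is NaiveBayesClassifier.train, which takes a list of (feature dict, label) pairs.

```python
import re


def extract_features(email):
    """Map one email to a feature dict of the kind nltk classifiers accept.

    `email` is a hypothetical dict with 'sender', 'subject' and 'body' keys;
    the real schema stored in mail.sql may differ.
    """
    sender = email.get("sender", "")
    subject = email.get("subject", "")
    body = email.get("body", "")
    word_count = len(re.findall(r"\w+", body))
    return {
        # Domain part of the sender address, lowercased.
        "sender_domain": sender.rsplit("@", 1)[-1].lower(),
        # Whether the email is itself part of an ongoing thread.
        "subject_is_reply": subject.lower().startswith("re:"),
        # Emails that ask something may be more likely to get a reply.
        "body_has_question": "?" in body,
        # Coarse length bucket (50 words per bucket, capped at 5).
        "body_length_bucket": min(word_count // 50, 5),
    }


def train_classifier(labeled_emails):
    """Train nltk's naive Bayes classifier on (email, replied) pairs."""
    import nltk  # imported here so extract_features works without nltk

    train_set = [(extract_features(email), replied)
                 for email, replied in labeled_emails]
    return nltk.NaiveBayesClassifier.train(train_set)
```

With nltk installed, train_classifier(pairs).show_most_informative_features(5) would list the features that best separate replied from unreplied emails. Since nltk classifiers share the same train interface over feature dicts, swapping in another classifier should mostly mean changing the last line of train_classifier.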