The dataset used in these experiments is the latest version provided by William Cohen, found at the following link.
The training dataset we use as a starting point is the manually labeled subset from Martin Wunderlich, which can be found [here](http://www.martinwunderlich.com/enron/results/TrainingEmails_Manually_categorized.zip).
The relevant papers for our project can be found in the Papers folder. Among the most relevant resources are:
- http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
- http://www.andreykurenkov.com/writing/organizing-my-emails-with-a-neural-net/