MIT workshop on "How to Process, Analyze and Visualize Data"
Day 0: Class organization and programming environment setup.
Day 1: An end-to-end example that takes you from a dataset found online to several plots of campaign contributions.
Day 2: Lots of visualization examples, and practice going from data to chart.
Day 3: Statistics basics, including t-tests, linear regression, and statistical significance. We'll use campaign finance data and per-county health rankings.
Day 4: Text processing on a large text corpus (the Enron email dataset) using tf-idf and cosine similarity.
Day 5: Scaling up to process large datasets using Hadoop/MapReduce on a larger copy of the Enron dataset.
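
To give a flavor of the Day 4 material, here is a minimal sketch of tf-idf weighting and cosine similarity in plain Python. It is not taken from the workshop materials: the helper names (tokenize, tf_idf_vectors, cosine_similarity), the whitespace tokenizer, the particular tf-idf variant (relative term frequency times log(N / document frequency)), and the three toy "emails" are all assumptions made for illustration, not samples from the Enron dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real email text
    # would need punctuation and header handling.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in this document,
        # idf = log(N / df); a term found in every document gets weight 0.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents, invented for this sketch.
emails = [
    "meeting about the gas pipeline schedule",
    "pipeline schedule meeting moved to friday",
    "lunch order for the team on friday",
]

vectors = tf_idf_vectors(emails)
sim_01 = cosine_similarity(vectors[0], vectors[1])
sim_02 = cosine_similarity(vectors[0], vectors[2])
print(sim_01 > sim_02)
```

The first two toy emails share three terms (meeting, pipeline, schedule) while the first and third share only one (the), so the first pair should score higher. On a corpus the size of the Enron dataset, a library implementation with proper tokenization would likely be a better fit than this sketch.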