Lesson 1: Data Extraction Fundamentals
Assessing the Quality of Data; Intro to Tabular Formats; Parsing CSV; Parsing XLS with XLRD; Intro to JSON; Using Web APIs
Lesson 2: Data in More Complex Formats
Intro to XML; XML Design Principles; Parsing XML; Web Scraping; Parsing HTML
Lesson 3: Data Quality
What Is Data Cleaning?; Sources of Dirty Data; Measuring Data Quality; A Blueprint for Cleaning; Auditing Validity; Auditing Accuracy; Auditing Completeness; Auditing Consistency; Auditing Uniformity
Lesson 4: Working with MongoDB
Data Modeling in MongoDB; Introduction to PyMongo; Field Queries; Projection Queries; Getting Data into MongoDB; Using mongoimport; Operators like $gt, $lt, $exists, $regex; Querying Arrays and Using the $in and $all Operators; Changing Entries: update, $set, $unset
Lesson 5: Analyzing Data
Examples of the Aggregation Framework; The Aggregation Pipeline; Aggregation Operators: $match, $project, $unwind, $group; Multiple Stages Using a Given Operator
Lesson 6: Case Study - OpenStreetMap Data
Using Iterative Parsing for Large Data Files; OpenStreetMap XML Overview; Exercises Around OpenStreetMap Data