Welcome to Intro to Data Engineering!
See syllabus.md for the current syllabus.
Organized by weeks and days (subject to change; see detailed schedule below):
- Welcome to Data Engineering
- The Cloud
- The Cloud
- Deployment
- Big Data Architecture
- Review Day - Project Data Due
- Parallel Processing
- MapReduce (Divide-and-conquer for Distributed Systems)
- The MapReduce Algorithm & Hadoop
- MapReduce Design Patterns
- Spark Overview
- Review Day - Project Proposals Due
- SQL (The Lingua Franca of Data)
- Spark (What to add to your LinkedIn profile)
- Streaming (Everyone has to have real-time)
- Final Project Presentations
Day | Readings | Notes | Assignment |
---|---|---|---|
Monday | Data Engineering Overview | 1. Intro to Data Engineering 2. Intro to the Cloud | Connecting to the Cloud with Python |
Tuesday | How the Internet Works | How the Web Works | Generating Reports |
Thursday | Virtualization | Virtualization & Docker | Your Very Own Web Server |
Friday | *NIX | Linux | Linux Intro |
Day | Readings | Notes | Assignment |
---|---|---|---|
Monday | Introduction to Clouds | The Cloud & AWS | Move your Linux machine to the Cloud |
Tuesday | Provisioning | EC2 & cron | Automate More |
Thursday | I ♥ Logs | Apache Kafka | Drinking from the Firehose |
Friday | Projects | Project Proposal | Proposal |
Day | Readings | Notes | Assignment |
---|---|---|---|
Monday | Functional Programming | Fun with Toolz | |
Tuesday | Threading and Webscraping | Threading and Webscraping | |
Thursday | Intro to Multiprocessing | Multiprocessing Demonstration | Multiprocessing |
Friday | Scaling Out | Distributed Computing | Embarrassingly Parallel |
Day | Readings | Notes | Assignment |
---|---|---|---|
Monday | HDFS and MapReduce | MapReduce | Scaling Out |
Tuesday | MapReduce Design Patterns | Hadoop Ecosystem | Meet MrJob |
Thursday | Introduction to Spark | Apache Spark | Spark on EMR |
Friday | Designing Big Data Systems | Review | Final Project Proposal |
Day | Readings | Notes | Assignment |
---|---|---|---|
Monday | SQL Basics | Databases and SQL | Squashing Birds |
Tuesday | Relational Design | Relational Database Modeling | Data Modeling Practice |
Thursday | Drivers and Workers | SQL: Advanced Querying | Feeding the Elephant |
Friday | Tuning SQL | Data Systems Architecture | Advanced Querying |
Day | Readings | Notes | Assignment |
---|---|---|---|
Monday | Spark DataFrames | Spark | DataFrames |
Tuesday | Programming with RDDs | Spark SQL | Spark SQL |