borbert/Data_Engineering_Nanodegree

Data Engineering Nanodegree -- Udacity

This repository contains all of the work completed in the Udacity Data Engineering Nanodegree. This README provides an overview and acts as a table of contents for the individual projects and submissions.

It is divided into two sections:

  • Data engineering for the fictional Sparkify app
  • Capstone project

SparkifyDB

A startup called Sparkify wants to analyze the data it has been collecting on songs and user activity in its new music streaming app. The following data engineering projects support the ongoing evolution of its business processes and app.

Projects

  1. Data Modeling in Postgres. This data engineering project creates the infrastructure to support the Sparkify data analytics team.

  2. Data Modeling in Cassandra (Jupyter notebook). This Jupyter notebook was created to help the data analytics team collect and analyze song play data. The song play data is modeled in the notebook and loaded into a Cassandra database.

  3. Data Warehousing in AWS. This project supports the Sparkify initiative to move their data analytics and data to the cloud.

  4. Data Lake in AWS. This project supports the further evolution of the Sparkify user base and the team's data analytics requirements by transitioning their cloud data warehouse into a data lake.

  5. Data Pipelines with Airflow. This project further automates Sparkify's ETL pipelines and the monitoring of its data warehouses.

Capstone Project

The capstone project was a chance to combine everything learned about data engineering in this program into one project.
