kdebaerdemaeker/pyspark_training

Data Engineering with PySpark

  • Context:

    Classroom lectures given at KBC, a financial institution in Belgium, from 2019-05-13 to 2019-05-15.

  • Objectives:

    • Introduce good data engineering practices.
    • Illustrate modular and easily testable data transformation pipelines using PySpark.
  • Audience:

    Employees of KBC involved in writing (or possibly porting) transformation pipelines. General knowledge level: junior to medior (mid-level).

    Participants were asked by KBC personnel to complete two online Python courses before attending.

  • Approach:

    The lecturer first lays the foundations of Python development and gradually builds up to PySpark data pipelines. Students are expected to participate actively: they write code themselves and reason about the topics, so that they retain the knowledge better.

    Course notes will be made available after each day's sessions. Materials needed for the exercises will be provided directly through GitHub.
