Yunlily/Insight-Project

Insight-Project

This project was completed during the Insight Engineering Fellow Program.

Description

This project is inspired by a Spark Summit East talk by Vida Ha. The main tool is Spark, which is used for batch processing and for running join queries on DataFrames. The engineering challenge is to reduce the time and space cost of two join queries:

  1. Join a person information table with a credit score table to figure out each client's credit score.
  2. Join a daily transaction table with a card table to add card information to today's transaction information.
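
To make the first join concrete, here is a minimal pure-Python sketch of an inner hash join, the general idea behind joining a smaller table against a larger one. It is not the project's Spark code: the function name `hash_join`, the column names (`person_id`, `name`, `credit_score`), and the sample rows are all hypothetical, since this README does not give the real schemas.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`, hashing the smaller input."""
    # Build the hash table on the smaller side to limit memory use.
    if len(left_rows) <= len(right_rows):
        build, probe, build_is_left = left_rows, right_rows, True
    else:
        build, probe, build_is_left = right_rows, left_rows, False

    index = defaultdict(list)
    for row in build:
        index[row[key]].append(row)

    joined = []
    # Stream the larger side once and look up matches by key.
    for row in probe:
        for match in index.get(row[key], []):
            left, right = (match, row) if build_is_left else (row, match)
            joined.append({**left, **right})
    return joined


# Hypothetical sample rows; the real schemas are not given in this README.
persons = [
    {"person_id": 1, "name": "Ann"},
    {"person_id": 2, "name": "Bob"},
    {"person_id": 3, "name": "Cy"},
]
scores = [
    {"person_id": 1, "credit_score": 710},
    {"person_id": 3, "credit_score": 640},
]

result = hash_join(persons, scores, "person_id")
```

In this sketch only the smaller input is held in a lookup table, while the larger input is scanned once. That is why the relative sizes of the two tables matter for both time and memory.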

Dataset

The data is synthetically generated. Table sizes:

Person info table: 3,000,000

Card info table: 3,000,000,000

Daily transaction table: 100,000,000
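
As a rough illustration of how generated tables like these could be produced at a smaller scale, the sketch below builds a card table and a transaction table with the same 30:1 size ratio as the figures above (3,000,000,000 to 100,000,000). The functions `generate_cards` and `generate_transactions`, the field names, and the value ranges are assumptions made for this example, not the project's actual generator.

```python
import random


def generate_cards(n_cards, seed=0):
    """Return n_cards hypothetical card records with sequential ids."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    return [
        {"card_id": i, "card_type": rng.choice(["credit", "debit"])}
        for i in range(n_cards)
    ]


def generate_transactions(n_tx, n_cards, seed=1):
    """Return n_tx hypothetical transactions, each referencing a valid card_id."""
    rng = random.Random(seed)
    return [
        {
            "tx_id": i,
            "card_id": rng.randrange(n_cards),  # always in [0, n_cards)
            "amount": round(rng.uniform(1, 500), 2),
        }
        for i in range(n_tx)
    ]


# Scaled-down tables that keep the 30:1 card-to-transaction ratio.
cards = generate_cards(300)
transactions = generate_transactions(10, n_cards=len(cards))
```

Because every generated `card_id` in the transaction table exists in the card table, an inner join of the two returns one row per transaction.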

Tech Stack:

[Image: tech stack diagram]

Optimization Result:

https://bit.ly/2NeGlur

Slides:

https://bit.ly/2NXTjgI
