Insight-Project

This project was built during the Insight Engineering Fellow Program.

Description

This project is inspired by a Spark Summit East talk by Vida Ha. The main tool is Spark, which runs the batch processing and the join queries on DataFrames. The engineering challenge is to optimize the time and space cost of two join queries:

Join a person information table with a credit score table to figure out each client's credit score.
Join a daily transaction table with a card table to add card information to today's transaction information.
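To make the second join concrete, here is a minimal sketch of a hash join in plain Python rather than Spark. It is only an illustration of the general idea of joining on a key: the README does not say which optimizations were applied, and the column layout (`txn_id`, `card_id`, `amount`, `card_type`) and the function name `hash_join` are assumptions made up for this example. The hash table is built on the transaction side because, per the Dataset section, it is the smaller of the two tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row layouts, chosen only for this illustration.
Transaction = Tuple[int, int, float]  # (txn_id, card_id, amount)
Card = Tuple[int, str]                # (card_id, card_type)
Joined = Tuple[int, int, float, str]  # (txn_id, card_id, amount, card_type)


def hash_join(
    transactions: Iterable[Transaction], cards: Iterable[Card]
) -> Iterator[Joined]:
    """Inner-join transactions with cards on card_id."""
    # Build phase: index the smaller side (transactions) by the join key.
    # One card can appear in many transactions, so each key maps to a list.
    by_card: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for txn_id, card_id, amount in transactions:
        by_card[card_id].append((txn_id, amount))

    # Probe phase: stream the larger side (cards) once and emit matches.
    # .get() avoids inserting empty lists for cards with no transactions.
    for card_id, card_type in cards:
        for txn_id, amount in by_card.get(card_id, ()):
            yield (txn_id, card_id, amount, card_type)


if __name__ == "__main__":
    sample_transactions = [(1, 10, 5.0), (2, 11, 7.5), (3, 10, 1.0)]
    sample_cards = [(10, "visa"), (11, "amex"), (12, "visa")]
    for row in hash_join(sample_transactions, sample_cards):
        print(row)
```

Only the smaller table is held in memory, and the larger table is read once. In Spark the same trade-off appears when choosing a join strategy, although whether the smaller side fits in memory at the sizes listed below depends on the cluster.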

Dataset

The data is generated. Data size:

Person info Table: 3,000,000.

Card info Table: 3,000,000,000.

Daily Transaction Table: 100,000,000.

Tech Stack:

Optimization Result:

https://bit.ly/2NeGlur

Slides:

https://bit.ly/2NXTjgI
