hahajain/flightAnalysis

Flight Data Analysis using PySpark and Zeppelin

Queries

Easy

  1. What are the longest and shortest distance flights?
  2. Which dates had the maximum number of flights?
  3. Which carriers experienced the most delays?
  4. Which carrier operated the most flights in 2016?

Medium

  1. Which route has the most cancelled flights?
  2. Which carrier has the highest cancelled flights/total flights ratio?

Tough Questions

  1. Which routes have the largest total delay across the entire year?
  2. Can we predict whether a flight will be cancelled? (Build a predictive model using pyspark.ml.)
