This folder contains all the code for the project.
-
Data Preprocessing: This folder has two files.
a) Data Preprocessing for Visualization.ipynb: Contains the preprocessing code required for the visualizations.
b) Merge&Sample: Contains the code for downsampling and merging the yearly data files (2016, 2017, 2018).
-
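For readers unfamiliar with the downsample-and-merge step listed under Data Preprocessing, here is a minimal sketch of the idea using only the Python standard library. It is not the code in Merge&Sample: the function name `sample_and_merge`, the `fraction` and `seed` parameters, the added `year` column, and the in-memory demo rows are all assumptions made for illustration.

```python
import csv
import io
import random


def sample_and_merge(sources, fraction, seed=0):
    """Downsample each yearly CSV and merge the kept rows into one list.

    sources:  dict mapping a year label to an open text file
              (or any iterable of CSV lines with a header row).
    fraction: probability of keeping each row, between 0 and 1.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    merged = []
    for year, handle in sources.items():
        for row in csv.DictReader(handle):
            if rng.random() < fraction:
                # Hypothetical extra column recording the source file.
                row["year"] = year
                merged.append(row)
    return merged


# Tiny in-memory stand-ins for the 2016, 2017, and 2018 files (made-up rows).
demo = {
    "2016": io.StringIO("id,value\n1,a\n2,b\n"),
    "2017": io.StringIO("id,value\n3,c\n4,d\n"),
    "2018": io.StringIO("id,value\n5,e\n6,f\n"),
}
rows = sample_and_merge(demo, fraction=1.0)  # fraction=1.0 keeps every row
print(len(rows), "rows kept")
```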
Data Visualization:
a) Final preliminary-data-visualization.ipynb: Contains the code and outputs for the data visualization.
-
Database Analysis:
a) Hive: This folder has the analytical queries run on the data with Hive. The query outputs are also included.
b) Spark: This folder has the analytical queries run on the data with Spark. The query outputs are also included.
c) Pig: This folder has the analytical queries run on the data with Pig. The query outputs are also included.
-
Recommender System:
a) RecommendationSystemPySpark.py: Contains the data preprocessing specific to the recommender system and the implementation of the recommender system using ALS (alternating least squares) with PySpark and Spark MLlib.