
POBrienStonehillMD/Pyspark-Examples


Course Description

Covers the impact of big data on business and the insights big data can provide, through hands-on experience with the tools and systems used by big data scientists and engineers. Topics include software basics in Hadoop and Spark, with discussion of related software. By following along with the provided code, students will see how to manage analytics and predictive modeling for large data sets. By the end of the course, students will be able to perform basic big data analysis on a large provided data set.

Learning Goals

Topics to be covered:

  • Basic interaction with the Unix (and related) command line systems
  • Hardware and software solutions
  • Setting up and managing a Hadoop distributed file system
  • Interacting with Hadoop via Pig, Spark, etc.
  • Performing predictive modeling with Spark
  • SAS integration with Hadoop
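Several of these topics (Hadoop, Pig, Spark) rest on the map/reduce pattern. The sketch below is plain Python rather than Spark, so it runs without a cluster; it only illustrates the map, shuffle, and reduce phases that these systems distribute across machines, using a hypothetical word-count input.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share a key (done by the framework in Hadoop/Spark)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases over an iterable of text lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical input lines, standing in for a file stored in HDFS.
    lines = ["Spark and Hadoop", "Hadoop stores data", "Spark processes data"]
    print(word_count(lines))
    # → {'spark': 2, 'and': 1, 'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In PySpark, the same pipeline is commonly written with RDD methods such as `flatMap`, `map`, and `reduceByKey`; the framework then performs the shuffle step across the cluster instead of in a single dictionary.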

Dataset exploration

For this course we will be exploring two different datasets. The first is stock market data, provided by the Alpha Vantage API (source: https://www.alphavantage.co/documentation/). The second is NYC taxi trip data, sourced from the NYC.gov website (source: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page).
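As a minimal sketch of pulling the stock market data, the helpers below build an Alpha Vantage `TIME_SERIES_DAILY` request URL and extract closing prices from a parsed response. The parameter and field names (`function`, `symbol`, `outputsize`, `apikey`, `"Time Series (Daily)"`, `"4. close"`) follow the Alpha Vantage documentation as we understand it; check the linked documentation before relying on them. The API key and the sample payload are placeholders, not real data.

```python
import json
from urllib.parse import urlencode

# Alpha Vantage query endpoint (see https://www.alphavantage.co/documentation/).
ALPHA_VANTAGE_URL = "https://www.alphavantage.co/query"


def daily_series_url(symbol, api_key, outputsize="compact"):
    """Build a TIME_SERIES_DAILY request URL for one ticker symbol."""
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "outputsize": outputsize,  # "compact" (latest 100 points) or "full"
        "apikey": api_key,
    }
    return f"{ALPHA_VANTAGE_URL}?{urlencode(params)}"


def closing_prices(payload):
    """Return {date: close} from a parsed TIME_SERIES_DAILY JSON response."""
    series = payload["Time Series (Daily)"]
    return {day: float(bar["4. close"]) for day, bar in series.items()}


if __name__ == "__main__":
    # Placeholder key; replace with your own Alpha Vantage API key.
    print(daily_series_url("IBM", "YOUR_API_KEY"))

    # Hypothetical response fragment shaped like the documented JSON output.
    sample = json.loads('{"Time Series (Daily)": {"2024-01-02": {"4. close": "100.50"}}}')
    print(closing_prices(sample))  # → {'2024-01-02': 100.5}
```

To fetch live data you would pass the URL to `urllib.request.urlopen` and decode the body with `json.load`; that step needs network access and a valid API key, so it is left out of the sketch.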

Data Analytic Tools

For our analysis we will use a variety of tools, such as Spark and the UNIX command line, to gather our data. We will then use SAS to analyze the data we have collected.
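The gather-then-analyze workflow can be sketched in plain Python: aggregate raw trip records per pickup date, then write a flat CSV that SAS can read (for example with PROC IMPORT). This is an illustration under stated assumptions, not the course's pipeline: the column names `tpep_pickup_datetime` and `fare_amount` mirror the TLC yellow-taxi records but should be checked against the TLC data dictionary, and the sample rows are hypothetical. On the full dataset, the same group-by would typically run in Spark rather than in a single Python process.

```python
import csv
import io
from collections import defaultdict


def summarize_trips(reader):
    """Aggregate trips per pickup date into (trip_count, average_fare)."""
    totals = defaultdict(lambda: [0, 0.0])  # date -> [trip_count, fare_sum]
    for row in reader:
        day = row["tpep_pickup_datetime"][:10]  # keep the YYYY-MM-DD prefix
        totals[day][0] += 1
        totals[day][1] += float(row["fare_amount"])
    return {day: (count, fare_sum / count) for day, (count, fare_sum) in sorted(totals.items())}


def write_summary_csv(summary, out):
    """Write the per-day summary as a flat CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["pickup_date", "trip_count", "avg_fare"])
    for day, (count, avg_fare) in summary.items():
        writer.writerow([day, count, f"{avg_fare:.2f}"])


if __name__ == "__main__":
    # Hypothetical sample rows, not real TLC data.
    sample = io.StringIO(
        "tpep_pickup_datetime,fare_amount\n"
        "2023-01-01 00:15:00,12.50\n"
        "2023-01-01 08:40:00,7.50\n"
        "2023-01-02 09:05:00,20.00\n"
    )
    summary = summarize_trips(csv.DictReader(sample))
    out = io.StringIO()
    write_summary_csv(summary, out)
    print(out.getvalue())
```

Writing the aggregate rather than the raw records keeps the file handed to SAS small; the heavy lifting stays in the gathering step.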

Code

The code for this course can be found in the files of this repository.
