sachinyar/Apache-Spark-in-7-Days

Apache Spark in 7 Days [Video]

This is the code repository for Apache Spark in 7 Days [Video], published by Packt. It contains all the supporting project files necessary to work through the video course from start to finish.

About the Video Course

Apache Spark in 7 Days aims to help you get started quickly with this big data processing engine. It starts by deploying a Spark cluster in the AWS cloud with a Python EC2 script, then quickly moves on to monitoring a Spark job through a UI dashboard while a test script runs.

The course then helps you grasp the fundamentals of RDDs, DataFrames, and Spark SQL. It goes on to cover machine learning fundamentals and models, demonstrating how to build a data pipeline. It ends with streaming, focusing first on the lower-level DStreams API and then on the higher-level Structured Streaming API. By the end of this course you should be able to write basic Spark code in Python quickly. It’s ideal for any beginner looking to skill up and eventually use a cloud service like AWS.

What You Will Learn

  • Discover how to deploy a Spark cluster in the AWS cloud, using a Python EC2 script
  • Learn basic Spark concepts such as transformations and actions
  • Explore what Resilient Distributed Datasets (RDDs) are and how to perform operations on them
  • Write Spark SQL queries and work with Spark DataFrames
  • Learn how to use the MLlib library for machine learning applications
  • Discover streaming operations
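
The distinction between transformations and actions mentioned in the list above can be illustrated without a cluster. The sketch below is a plain-Python toy model, not the Spark API and not code from the course: `LazyDataset`, `square`, and `calls` are hypothetical names introduced only for illustration. The method names `map`, `filter`, `collect`, and `count` mirror methods that PySpark RDDs expose, but everything here runs in a single process using only the standard library.

```python
from typing import Callable, Iterable, Iterator, List


class LazyDataset:
    """Hypothetical toy stand-in for an RDD: it records steps and runs none of them."""

    def __init__(self, source: Iterable, steps: tuple = ()):
        self._source = source
        self._steps = steps  # pending transformations, applied only by an action

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self) -> Iterator:
        # Chain the recorded steps over the source, in the order they were added.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: trigger the whole chain and return a plain Python value.
    def collect(self) -> List:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


calls = []  # records each time the mapped function actually executes


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # no element has been processed at this point
result = evens_squared.collect()  # the action runs the recorded chain
```

In this toy model, `square` is not called until `collect()` runs, which is the behaviour the transformation/action split describes. Calling a second action such as `evens_squared.count()` replays the whole chain from the source, since the toy keeps no cached results.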

Instructions and Navigation

Assumed Knowledge

To fully benefit from the coverage included in this course, you will need:
● Python programming experience

Technical Requirements

This course has the following software requirements:

● An editor like Atom, Sublime Text or Visual Studio Code

● An AWS account

● Apache Spark - Version 2.3 or later

● Anaconda Distribution - Python 3.7 or later

● Jupyter Notebook Extensions

This course has been tested on the following system configuration:

● OS: macOS

● Processor: 2.5 GHz Intel Core i5

● Memory: 4GB

● Hard Disk Space: 400MB
