This is the code repository for Apache Spark in 7 Days [Video], published by Packt. It contains all the supporting project files necessary to work through the video course from start to finish.
Apache Spark in 7 Days aims to help you get started quickly with this big data processing engine. It begins by deploying a Spark cluster in the AWS cloud with a Python EC2 script, then shows how to monitor your Spark job with a UI dashboard while running a test script.
The course then helps you grasp the fundamentals of RDDs, DataFrames, and Spark SQL. It goes on to cover machine learning fundamentals and models, demonstrating how to build a data pipeline. The course ends with streaming, first covering the lower-level DStreams API and then the higher-level Structured Streaming API. By the end of the course, you should be able to write basic Spark code in Python quickly. It is ideal for any beginner looking to skill up and eventually use a cloud service such as AWS.
- Discover how to deploy a Spark cluster in the AWS cloud, using a Python EC2 script
- Learn basic Spark concepts such as transformations and actions
- Explore Resilient Distributed Datasets (RDDs) and how to perform operations on them
- Write Spark SQL queries and work with Spark DataFrames
- Learn how to use the MLlib library for machine learning applications
- Discover streaming operations with DStreams and Structured Streaming
To fully benefit from the coverage included in this course, you will need:
● Python programming experience
This course has the following software requirements:
● An editor such as Atom, Sublime Text, or Visual Studio Code
● An AWS account
● Apache Spark - Version 2.3 or later
● Anaconda Distribution - Python 3.7 or later
● Jupyter Notebook Extensions
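The software requirements listed above can be checked from Python before starting the course. The following is a minimal sketch, not part of the course files: it assumes PySpark is installed as an importable Python package in the same environment (for example with `pip install pyspark` or through Anaconda), so a Spark installation that Python cannot import will be reported as missing. The helper names `parse_version` and `check_requirements` are hypothetical and chosen for illustration.

```python
import sys
from importlib import util

# Minimum versions taken from the software requirements list (assumed checks).
MIN_PYTHON = (3, 7)
MIN_SPARK = (2, 3)


def parse_version(text):
    """Turn a version string such as '2.4.0rc1' into a (major, minor) tuple of ints."""
    parts = []
    for piece in text.split(".")[:2]:
        # Keep only the digits so suffixes like 'rc1' or 'dev0' do not break parsing.
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_requirements():
    """Return {'python': bool, 'pyspark': bool or None}; None means PySpark is not importable."""
    results = {"python": sys.version_info[:2] >= MIN_PYTHON}
    if util.find_spec("pyspark") is None:
        results["pyspark"] = None
    else:
        import pyspark
        results["pyspark"] = parse_version(pyspark.__version__) >= MIN_SPARK
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        status = "missing" if ok is None else ("ok" if ok else "too old")
        print(f"{name}: {status}")
```

Running the script inside the Anaconda environment you plan to use for the course prints one status line per requirement. Only the major and minor version numbers are compared, since the requirements are stated at that granularity.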
This course has been tested on the following system configuration:
● OS: macOS
● Processor: 2.5 GHz Intel Core i5
● Memory: 4 GB
● Hard Disk Space: 400 MB