xuezhizeng/Apache-Spark-2-for-Beginners

#Apache Spark 2 for Beginners
This is the code repository for Apache Spark 2 for Beginners, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.

##Instructions and Navigations
All of the code is organized into folders. Each folder starts with a number followed by the application name. For example, Chapter02.

Software and Hardware List

| Chapter number | Software required (with version) | Free/Proprietary | If proprietary, can code testing be performed using a trial version | If proprietary, then cost of the software | Download links to the software | Hardware specifications | OS required |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All | Apache Spark 2.0.0 | Free | NA | NA | http://spark.apache.org/downloads.html | X86 | UNIX or MacOSX |
| 6 | Apache Kafka 0.9.0.0 | Free | NA | NA | http://kafka.apache.org/downloads.html | X86 | UNIX or MacOSX |

Detailed installation steps (software-wise)

The steps below prepare the system environment for testing the code in the book.

###1. Apache Spark
a. Download the Spark version listed in the table.
b. Build Spark from source or use the binary download, following the detailed instructions at http://spark.apache.org/docs/latest/building-spark.html
c. If building Spark from source, make sure that the R profile is also built; the instructions for doing so are on the page linked in step b.
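Once Spark has been downloaded or built, a quick check of the installation directory can catch a missing R build early. The following is a minimal sketch, not part of the book's code: the helper names `check_spark_home` and `missing_components` are hypothetical, and the list of expected paths is an assumption based on the usual layout of a Spark 2.x distribution.

```python
import os

# Relative paths assumed to be present in a usable Spark 2.x directory.
# This list is an assumption for illustration; adjust it to your layout.
EXPECTED_PATHS = {
    "spark-submit": os.path.join("bin", "spark-submit"),
    "pyspark": os.path.join("bin", "pyspark"),
    "spark-shell": os.path.join("bin", "spark-shell"),
    "sparkR": os.path.join("bin", "sparkR"),
    "SparkR package": os.path.join("R", "lib", "SparkR"),
}


def check_spark_home(spark_home):
    """Map each expected component name to True if its path exists."""
    return {
        name: os.path.exists(os.path.join(spark_home, relative_path))
        for name, relative_path in EXPECTED_PATHS.items()
    }


def missing_components(spark_home):
    """Return the sorted names of expected components that were not found."""
    found = check_spark_home(spark_home)
    return sorted(name for name, present in found.items() if not present)


if __name__ == "__main__":
    # SPARK_HOME is the conventional environment variable for the install path.
    home = os.environ.get("SPARK_HOME", "")
    if not home:
        print("SPARK_HOME is not set")
    else:
        missing = missing_components(home)
        print("Missing: " + ", ".join(missing) if missing else "All components found")
```

If the SparkR package is reported as missing after a source build, the R profile was probably not included in the build.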
###2. Apache Kafka
a. Download the Kafka version listed in the table.
b. Set up Kafka by following the “quick start” section of the Kafka documentation: http://kafka.apache.org/documentation.html#quickstart
c. Apart from the installation instructions, topic creation and the other Kafka setup prerequisites are covered in detail in the relevant chapter of the book.
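For orientation, the topic creation and console producer commands from the Kafka 0.9 quick start can be assembled as shown here. This is only a sketch: the function names, the topic name, and the installation path in the usage note are hypothetical, and the default ZooKeeper and broker addresses are the quick start's local defaults, which may differ from the setup described in the book.

```python
import os


def topic_create_command(kafka_home, topic, zookeeper="localhost:2181",
                         partitions=1, replication_factor=1):
    """Build the argument list for creating a topic with kafka-topics.sh."""
    return [
        os.path.join(kafka_home, "bin", "kafka-topics.sh"),
        "--create",
        "--zookeeper", zookeeper,
        "--replication-factor", str(replication_factor),
        "--partitions", str(partitions),
        "--topic", topic,
    ]


def console_producer_command(kafka_home, topic, broker_list="localhost:9092"):
    """Build the argument list for starting the command-line producer."""
    return [
        os.path.join(kafka_home, "bin", "kafka-console-producer.sh"),
        "--broker-list", broker_list,
        "--topic", topic,
    ]
```

With ZooKeeper and the Kafka broker already running, a call such as `subprocess.check_call(topic_create_command("/path/to/kafka", "test"))` would create the topic; both the path and the topic name here are placeholders.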

The code will look like the following:

    Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin

To run the code samples and explore the subject further, install Spark 2.0.0 or above on at least a standalone machine. For Spark stream processing, Kafka must be installed and configured as a message broker, with its command-line producer producing messages and the Spark application acting as the consumer of those messages.
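The "2.0.0 or above" requirement can be checked programmatically against whatever version string the installation reports. The helper below is a minimal sketch using only the standard library; the names `parse_version` and `meets_minimum` are hypothetical, and it assumes the version string begins with dot-separated numbers (any suffix such as `-SNAPSHOT` is ignored).

```python
import re


def parse_version(text):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, required="2.0.0"):
    """Return True if the installed version is at least the required one."""
    have = parse_version(installed)
    need = parse_version(required)
    # Pad the shorter tuple with zeros so that "2.0" compares equal to "2.0.0".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

Inside a PySpark session, the installed version could be passed in as `meets_minimum(spark.version)`, assuming a `SparkSession` named `spark` is available.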

##Related Products
