Spark-Dockerfile

Dockerfile for Spark

Features

  1. Built on the reetawwsum/hadoop base image
  2. Spark 2.1.0

Usage

Pull docker image from DockerHub

$ docker pull reetawwsum/spark

To run Spark applications using Jupyter notebooks:

$ docker run --rm -t -i --name spark -p 8888:8888 -p 50070:50070 -p 8088:8088 -p 8042:8042 -p 4040:4040 reetawwsum/spark --ip=0.0.0.0

To run Spark applications using Jupyter notebooks, mounting the current directory into the container:

$ docker run --rm -t -i --name spark -p 8888:8888 -p 50070:50070 -p 8088:8088 -p 8042:8042 -p 4040:4040 -v $PWD:/usr/local/src/notebooks reetawwsum/spark --ip=0.0.0.0

To run a shell in the container after launching the Jupyter notebook:

$ docker exec -t -i spark /bin/bash

To view Hadoop process status (from a shell inside the container):

$ jps

To run the SimpleApp (Scala) self-contained application:

$ spark-submit --class "SimpleApp" --master local[4] Simple-Project/target/scala-2.11/simple-project_2.11-1.0.jar

To run the SimpleApp (Python) self-contained application:

$ spark-submit --master local[4] SimpleApp.py
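For reference, a self-contained PySpark application of this kind typically follows the Spark quick-start pattern: create a session, load a text file, and count matching lines. The sketch below is illustrative, not the contents of this repository's SimpleApp.py; the input path and the `contains` helper are assumptions. The pyspark import is guarded so the helper can be exercised without Spark installed:

```python
# SimpleApp.py -- a minimal self-contained Spark application (sketch).

def contains(letter):
    """Return a predicate that tests whether a line contains `letter`."""
    return lambda line: letter in line

try:
    from pyspark.sql import SparkSession
    HAVE_SPARK = True
except ImportError:  # lets the helper above be used without Spark installed
    HAVE_SPARK = False

if __name__ == "__main__" and HAVE_SPARK:
    spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
    # Read a text file into an RDD of lines (the path is illustrative).
    lines = spark.read.text("README.md").rdd.map(lambda row: row[0])
    num_a = lines.filter(contains("a")).count()
    num_b = lines.filter(contains("b")).count()
    print("Lines with a: %d, lines with b: %d" % (num_a, num_b))
    spark.stop()
```

Submitting this script with `spark-submit --master local[4] SimpleApp.py` runs it on four local worker threads.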

To build the image from the Dockerfile, clone this repo:

$ git clone https://github.com/reetawwsum/Spark-Dockerfile.git
$ cd Spark-Dockerfile

and build:

$ docker build -t spark .

To run a PySpark script in the current directory from Sublime Text 3, copy the build file into your Sublime Text user packages folder:

$ cp PySpark.sublime-build [user-packages folder]
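Sublime Text build systems are small JSON files that tell the editor which command to run on the current file. The fragment below illustrates the general format only; it is an assumption, not the actual contents of this repository's PySpark.sublime-build:

```json
{
    // Hypothetical build command: submit the open file to a local Spark master.
    "cmd": ["spark-submit", "--master", "local[4]", "$file"],
    "selector": "source.python"
}
```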

License

The MIT License (MIT)
