This Ansible script provisions a Vagrant box with:
- Oracle Java 8
- Apache Spark 2.0.0 built for Hadoop 2.7 (spark_version: "2.0.0-bin-hadoop2.7")
- Anaconda3-4.2.0-Linux-x86_64.sh (python3)
- Jupyter Scala binaries for Scala 2.11 when using python3
You must have Ansible, VirtualBox, and Vagrant installed. See ANSIBLE.adoc. As provided, the Vagrant VM will use all of your CPU cores and half of your memory.
To get this project working, perform the minimum steps:
$ go.sh
To use:
- Make a data directory and put your working files there:
  $ mkdir data
- Open http://localhost:8888 in the browser.
- Change directory to /vagrant/data in the Jupyter notebook.
- Create a new notebook.
- In the Python Jupyter notebook, you can access the Spark context through the variable sc.
- Remember to run the following when you are done:
  $ vagrant destroy
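
As a quick check that the Spark context is available in a Python notebook, a minimal sketch like the one below can be pasted into a cell. The function name sum_of_squares is hypothetical and not part of this project; it only assumes that sc is the SparkContext the notebook provides and uses the standard RDD calls parallelize, map, and reduce.

```python
# Hypothetical helper, not part of this project: a small Spark job to
# confirm that the notebook's SparkContext works.
def sum_of_squares(sc, n):
    """Return 0^2 + 1^2 + ... + (n-1)^2, computed as a Spark job (n >= 1)."""
    numbers = sc.parallelize(range(n))         # distribute the numbers as an RDD
    squares = numbers.map(lambda x: x * x)     # square each element
    return squares.reduce(lambda a, b: a + b)  # add the squares together
```

In a notebook cell, calling sum_of_squares(sc, 10) should return 285 if the context is working.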