spark-examples
==============

The latest version of Spark can be downloaded from http://spark.apache.org/downloads.html

To start the Spark shell, use:

./bin/spark-shell

The PySpark shell can be started using:

./bin/pyspark
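
Once the shell is up, the SparkContext is available as sc. As a quick sanity check (a minimal sketch, not taken from this repo's notebooks), you can run:

# Sum the integers 0..99 as a smoke test; should print 4950
sc.parallelize(range(100)).sum()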

To run Spark with the IPython notebook, you need to have the IPython notebook installed. It can be installed using:

pip install ipython
pip install 'ipython[notebook]'

To run PySpark with the IPython notebook:

IPYTHON_OPTS="notebook --pylab inline --notebook-dir=<directory to store notebooks>" MASTER=local[6] ./bin/pyspark --executor-memory=6G

The latest examples are in the ipython-notebook folder.

Once you have the IPython notebook set up with this directory as its notebook home, you can access the notebook at port 8888 (the default).
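
Inside a notebook cell the SparkContext is already defined as sc by the pyspark launcher, so a cell along the following lines should work (a hypothetical example; the path data/sample.txt is a placeholder, not one of this repo's notebooks):

# Word count over a text file; "data/sample.txt" is a placeholder path
lines = sc.textFile("data/sample.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.take(10)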

Running the examples with the provided Docker container

# Pull the image from Docker Hub
sudo docker pull anantasty/ubuntu_spark_ipython:latest

# or load it from disk
sudo docker load < ubuntu_spark_ipython.tar

# Find the image ID using

sudo docker images

# Run the image
# The -v flag mounts a local path onto a path inside the container,
# e.g. -v ~/spark-examples:/ipython will mount ~/spark-examples
# to /ipython in the container

sudo docker run -i -t -h sandbox -v $(pwd):/ipython -d <IMAGE_ID>

# To make the notebook reachable on localhost, also publish port 8888:

sudo docker run -i -t -h sandbox -p 8888:8888 -v $(pwd):/ipython -d <IMAGE_ID>


# Upload the files from this repo to HDFS
# Step 1: get the container ID
sudo docker ps

# Step 2: log into the container

sudo docker exec -it <container_id> /bin/bash

# Step 3: upload the data directory to HDFS

cd /ipython
hadoop fs -put data /user/
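
With the data in HDFS, the notebooks can read it back through the SparkContext. A minimal sketch (the HDFS path and file name below are assumptions; adjust them to whatever hadoop fs -put data /user/ produced in your container):

# Read back one of the uploaded files; the exact file name is a placeholder
rdd = sc.textFile("hdfs:///user/data/sample.txt")
print(rdd.count())
print(rdd.first())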

About

Spark examples to go with my presentation on 10/25/2014.
