OpenStack Data Processing ("Sahara") project (Spark experimental fork)

This repository is a fork of the main OpenStack Sahara repository. The fork focuses mainly on development of the Spark plugin, with bug fixes, optimizations and updates related to the work of the Bigfoot project: http://bigfootproject.eu/

To use this version of Sahara, you will need images created with this fork of the image builder: https://github.com/bigfootproject/sahara-image-elements

The main changes from the standard Sahara are:

Support for more recent Spark versions (currently Spark 1.5.0).
Spark Notebook (https://github.com/andypetrella/spark-notebook) support: you can create a Spark cluster with notebooks already available and configured. Like IPython, but with Spark! The Spark Notebook is listed among the processes when creating a new node group template. A cluster can have at most one notebook process. Once the cluster has started, a link to the notebook appears at the bottom of the cluster information page.
Relaxed checks that let the user create HDFS-only and Spark-only clusters. This makes it possible to keep relatively static storage-only clusters alongside compute-only clusters that are created and destroyed as needed.
Spark clusters can be configured with a default HDFS location.
Data locality: when the cluster-level "HDFS storage cluster" option is set, the compute cluster is co-located on the same physical hosts that run the datanodes of that storage cluster.
Swift data source for Spark, with fixes for Spark 1.3.
Smaller bug fixes and workarounds, pending proper fixes in upstream Sahara.
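The cluster rules listed above (at most one notebook per cluster, HDFS-only or Spark-only clusters allowed, compute instances placed on hosts that run datanodes) can be illustrated with a minimal Python sketch. It is hypothetical and not the fork's actual validation or placement code: the `NodeGroup` type, the `validate_cluster` and `colocate` helpers, and the process-name strings (especially the notebook's) are assumptions, and the real data-locality feature may work quite differently, for example through OpenStack scheduling.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical process names; the fork's real identifiers may differ.
NAMENODE = "namenode"
DATANODE = "datanode"
SPARK_MASTER = "master"
SPARK_SLAVE = "slave"
NOTEBOOK = "spark notebook"


@dataclass
class NodeGroup:
    """Hypothetical node group: a set of processes replicated `count` times."""
    name: str
    processes: list[str]
    count: int


def validate_cluster(node_groups: list[NodeGroup]) -> None:
    """Raise ValueError if the cluster breaks the (assumed) relaxed rules."""
    totals = Counter()
    for ng in node_groups:
        for proc in ng.processes:
            totals[proc] += ng.count

    # At most one Spark Notebook process in the whole cluster.
    if totals[NOTEBOOK] > 1:
        raise ValueError("at most one Spark Notebook process per cluster")

    # Relaxed check: HDFS-only and Spark-only clusters are both accepted,
    # but the cluster must run at least one of the two.
    if totals[NAMENODE] == 0 and totals[SPARK_MASTER] == 0:
        raise ValueError("cluster must run HDFS, Spark, or both")


def colocate(compute_count: int, datanode_hosts: list[str]) -> list[str]:
    """Pick a datanode host for each compute instance (round-robin)."""
    hosts = sorted(set(datanode_hosts))  # distinct physical hosts
    if not hosts:
        raise ValueError("storage cluster has no datanodes")
    return [hosts[i % len(hosts)] for i in range(compute_count)]
```

Round-robin over the distinct datanode hosts is just one simple placement policy: it spreads compute instances evenly while guaranteeing that each one shares a physical host with a datanode.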

This repository is periodically merged with the upstream Sahara master branch.

Virtual Machine image

An image ready to be used with this version of Sahara is available here: https://drive.google.com/open?id=0B2TbBvh6BGVcZVZCbEFjOWdvNjQ

Contact us

This fork of Sahara is developed and maintained by the Distributed Systems Group at Eurecom (http://www.eurecom.fr).

Pietro Michiardi (http://www.eurecom.fr/~michiard/)
Daniele Venzano (http://www.eurecom.fr/en/people/venzano-daniele)

License

Apache License Version 2.0 http://www.apache.org/licenses/LICENSE-2.0
