octohaven

Super simple Apache Spark job server.

Latest release: v1.1.0

Overview

Simple Spark job scheduler: it runs immediate, delayed, or periodic jobs with a selected jar file and Spark configuration/job attributes, and keeps a history of all launched jobs.

Features:

  • templates to quickly create jobs
  • delayed jobs (run after a specified time period)
  • periodic jobs (run periodically using timetables and Cron expressions; see the example below)
  • viewing a job's stdout/stderr from the UI
  • does not interfere with the Spark cluster installation or scripts (more of a nice-to-have feature that you can easily turn on or off at any time)
  • tested only with a Spark standalone cluster; it is not certain whether it works with YARN or Mesos
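
Timetables for periodic jobs use Cron expressions; the lines below illustrate standard Cron syntax, not verified examples of octohaven's exact dialect:

# minute hour day-of-month month day-of-week
0 4 * * *       # every day at 04:00
*/15 * * * *    # every 15 minutes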

[Screenshot: octohaven web UI]

Dependencies

  • OS X or Linux; octohaven does not support Windows yet
  • Python 2.7; technically it can run on any version newer than 2.3, but 2.7 is currently enforced
  • octohaven has to run side by side with Apache Spark (on the same machine/VM); typically it runs on the same VM as the Spark master

See Configuration for how to configure octohaven.

Install and run

Download one of the distributions, octohaven-x.y.z.tar.gz or octohaven-x.y.z.zip, unpack the archive, and edit a few configuration parameters in conf/octohaven-env.sh (see Configuration):

$ tar xzvf octohaven-x.y.z.tar.gz
$ vi conf/octohaven-env.sh

Launch the application:

$ sbin/start.sh

start.sh provides the following options (an example invocation follows the list):

  • --daemon, -d launch the service as a daemon process, e.g. --daemon=true or --daemon=false
  • --help display usage of the script
  • --test, -t launch the service in test mode, e.g. --test
  • --python provide a different location of PYTHON_EXE; the default is /usr/bin/python
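
For example, to launch the service as a daemon with a non-default Python binary (the path is illustrative):

$ sbin/start.sh --daemon=true --python=/usr/local/bin/python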

Note that start.sh will also launch and manage the Docker container for you if you have chosen to use Docker in the configuration (recommended).

To stop the application (this will also try to stop the Docker container, if Docker is used with octohaven):

$ sbin/stop.sh

Quick test

Once octohaven is running, it shows the current availability of the Spark cluster and the history of jobs you have run (which should be empty at first). You can run a quick test to see how it works. Create a job with the following settings (note that you might need to change the jar folder setting, or copy the jar into your own jar folder; see the note below):

  • entrypoint: org.test.SparkSum
  • jar: test/resources/filelist/prod/start-sbt-app_2.10-0.0.1.jar
  • job options: any number up to the maximum integer

The job reports the sum of the numbers between 0 and the number you specified; for example, an input of 100 should report 5050, assuming the sum is inclusive. You can also view stdout and stderr while the job is in progress.
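
Since JAR_FOLDER is traversed for jar files (see Configuration), one way to make the test jar visible in the UI is to point it at the repository's test resources. This is an assumption about how the option is meant to be used, with the path relative to the project directory:

# in conf/octohaven-env.sh
JAR_FOLDER="test/resources/filelist"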

Application logs

octohaven stores application logs in the current project directory and uses conf/log.conf to configure logging. You can specify a different location or different settings in that file.
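
The README does not show the contents of conf/log.conf; assuming it follows Python's standard logging fileConfig format (plausible for a Python 2.7 application, but an assumption), it would look roughly like this, with all names and values illustrative:

[loggers]
keys=root

[handlers]
keys=file

[formatters]
keys=simple

[logger_root]
level=INFO
handlers=file

[handler_file]
class=FileHandler
level=INFO
formatter=simple
args=('octohaven.log', 'a')

[formatter_simple]
format=%(asctime)s %(levelname)s %(name)s - %(message)s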

Spark job logs

octohaven also stores the logs (stdout and stderr) produced while a job runs. They are kept in the working directory, which defaults to work/ in the project directory; you can configure this in conf/octohaven-env.sh.
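
To inspect these logs from a terminal, something like the following should work; the per-job subdirectory layout is an assumption, so check work/ for the actual structure:

$ ls work/
$ tail -f work/<job-id>/stderr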

Configuration

All configuration lives in conf/octohaven-env.sh. The available options are listed below (they are also well documented in the configuration file itself); a sample file follows the list:

  • OCTOHAVEN_HOST, OCTOHAVEN_PORT host and port for the service
  • OCTOHAVEN_SPARK_MASTER_ADDRESS, OCTOHAVEN_SPARK_UI_ADDRESS addresses of the Spark master and the Spark web UI
  • JAR_FOLDER starting folder/root; octohaven traverses this directory looking for jar files
  • WORKING_DIR working directory where job logs are stored; defaults to work/
  • NUM_SLOTS number of slots, i.e. the number of jobs allowed to be launched or running at the same time. This includes jobs launched by the application as well as jobs on the Spark cluster; defaults to 1
  • MYSQL_HOST, MYSQL_PORT, MYSQL_USER, MYSQL_PASSWORD, MYSQL_DATABASE MySQL settings for accessing the database. If you choose to use Docker, you do not need to change these parameters; they work out of the box (unless you want to change passwords, etc.)
  • USE_DOCKER whether or not to use a Docker container to store data; if enabled, octohaven automatically pulls the image and launches a container with the MySQL settings above
  • OCTOHAVEN_CONTAINER_NAME name of the Docker container to launch
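
A minimal sketch of conf/octohaven-env.sh using the options above; every value below is illustrative rather than a verified default:

# service host and port
OCTOHAVEN_HOST="localhost"
OCTOHAVEN_PORT="8000"

# Spark master and web UI addresses
OCTOHAVEN_SPARK_MASTER_ADDRESS="spark://master:7077"
OCTOHAVEN_SPARK_UI_ADDRESS="http://localhost:8080"

# root folder scanned for jar files
JAR_FOLDER="/opt/octohaven/jars"

# directory for job stdout/stderr
WORKING_DIR="work"

# number of jobs allowed to run at the same time
NUM_SLOTS="1"

# use a Docker container for MySQL storage (recommended)
USE_DOCKER="true"
OCTOHAVEN_CONTAINER_NAME="octohaven-db"

# MySQL connection settings (defaults work with the Docker container)
MYSQL_HOST="localhost"
MYSQL_PORT="3306"
MYSQL_USER="octohaven"
MYSQL_PASSWORD="octohaven"
MYSQL_DATABASE="octohaven"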

Build and test

To build the project, you need to set up a virtual environment first; it is recommended to use a venv folder, since the bin scripts have convenient wrappers for it.

$ git clone https://github.com/sadikovi/octohaven
$ cd octohaven
$ virtualenv venv

Docker is used for development and for running tests. Make sure you invoke the following to set up docker-machine (if available) and the Docker container:

$ make docker-start

This starts the default VM (if docker-machine exists) and launches the test container; it is pretty much the entry point for working on octohaven. Use make docker-stop to shut down the container and/or docker-machine.

To clean the current directory, e.g. remove dependencies, *.pyc, *.log, distribution files, etc.:

$ make clean

To build dependencies and source files (mostly useful when working on the front end):

$ make build

Building CoffeeScript and SCSS files requires coffee, sass, and uglifyjs. The script will warn you if you do not have any of these packages and suggest how to install them:

$ gem install sass
$ npm install coffee-script
$ npm install uglifyjs

To actually start the service (this assumes you have already run make docker-start):

$ make start

This starts the service with default parameters in non-daemon mode and should work in any environment. Look into the makefile and tweak it for yourself; there is currently not much automation, so you might need to manually change the MySQL host/container host in the connection string. To stop the service, press Ctrl-C in the terminal.

To run unit tests (again, this assumes you have already run make docker-start):

$ make test

Use bin/python and bin/pip to run python and pip from the virtual environment installation.
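
For example (these wrappers invoke the standard interpreters, so the usual flags apply):

$ bin/python --version
$ bin/pip list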

Create release

To create a release, follow the steps below. Currently some manual intervention is required, but I will automate as much of it as I can later.

# 1. Launch docker container
$ make docker-start

# 2. Update version in 'version.py', 'package.json', 'bower.json'
$ bin/update-version --version=x.y.z

# 3. Change logging and debugging mode in 'log.conf', 'internal.py', if necessary
$ bin/set-testing-mode --testing=false

# 4. Change README latest release link

# 5. Make distribution:
# Clean project directory, download dependencies, build source files, run unit-tests,
# create zip and tar archives
$ make dist

# 6. Commit changes to GitHub
$ git add --all
$ git commit -m "release version x.y.z"
$ git push

# 7. Create release/tag on GitHub
# Also upload archives from 'dist' folder as binaries for new release/tag

# 8. Pull changes, and turn dev mode on:
# update next version in 'version.py', 'package.json', 'bower.json'
$ bin/update-version --version=x.y.z

# update logging and debugging in 'log.conf', 'internal.py' if applicable
$ bin/set-testing-mode --testing=true

$ git pull
$ git commit -m "set up next version dev mode"
$ git push

Contribute

Any suggestions, feature requests, issues, and PRs are very welcome.