Provides a set of Ansible playbooks to deploy a Big Data analytics stack on top of Hadoop/YARN. The `play-hadoop.yml` playbook deploys the base system. Addons, such as Pig, Spark, etc., are deployed using the playbooks in the `addons` directory. A playbook for deploying all the addons is given in `play-alladdons.yml`.
- Analytics Layer
- BLAS
- LAPACK
- Mahout
- MLlib
- MLbase
- Java
- R+libraries
- Python
- Pandas
- Scikit-learn
- Data Processing Layer
- Hadoop MapReduce
- Spark
- Tez
- Hama
- Storm
- Hive
- Pig
- Flink
- Database Layer
- MongoDB
- CouchDB
- HBase
- MySQL
- PostgreSQL
- Memcached
- Redis
- Scheduling
- YARN
- Mesos
- Storage
- HDFS
- Monitoring
- Ganglia
- Download this repository using `git clone --recursive`.
- Install the requirements using `pip install -r requirements.txt`.
- Edit `.cluster.py` to define the machines in the cluster.
- Launch the cluster using `vcl boot -p openstack -P $USER-`. This will start the machines on whatever OpenStack environment is currently available (via `$OS_PROJECT_NAME`, `$OS_AUTH_URL`, etc.), prefixing `$USER-` to the name of each VM (e.g. `zk0` becomes `badi-zk0`).
- Make sure that `ansible.cfg` reflects your environment. Look especially at `remote_user` if you are not using Ubuntu.
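A minimal `ansible.cfg` along these lines covers the settings mentioned above (the values shown are illustrative assumptions, not the repository's shipped defaults; `remote_user` must match the login account of your VM image):

```ini
[defaults]
; Default login user on the VMs; change this if your image is not Ubuntu
remote_user = ubuntu
host_key_checking = False

[ssh_connection]
; Route SSH through the bastion configuration used by the repository
ssh_args = -F ssh_bastion_config
```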
- Ensure `ssh_bastion_config` is to your liking (it assumes you are using the OpenStack cluster on FutureSystems).
- Run `ansible all -m ping` to make sure all nodes can be managed.
- Define `zookeeper_id` for each ZooKeeper node. Adapt the following:

  ```sh
  mkdir host_vars
  for i in 0 1 2; do
    echo "zookeeper_id: $(( i + 1 ))" > host_vars/zk$i
  done
  ```
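Assuming the three-node layout above (hypothetical node names `zk0`–`zk2`), the loop leaves one small YAML file per ZooKeeper host; a quick sanity check:

```shell
# Recreate the host_vars files and confirm each node got the expected ID
mkdir -p host_vars
for i in 0 1 2; do
  echo "zookeeper_id: $(( i + 1 ))" > "host_vars/zk$i"
done
cat host_vars/zk0   # -> zookeeper_id: 1
cat host_vars/zk2   # -> zookeeper_id: 3
```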
- Run `ansible-playbook play-hadoop.yml` to install the base system.
- Run `ansible-playbook addons/{pig,spark}.yml # etc` to install the Pig and Spark addons.
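The `{pig,spark}` pattern is ordinary shell brace expansion, so the single command above passes both addon playbooks to `ansible-playbook`; a quick illustration (brace expansion requires bash or zsh, not plain `sh`):

```shell
# Brace expansion happens before the command runs, producing one path per addon
bash -c 'echo addons/{pig,spark}.yml'
# -> addons/pig.yml addons/spark.yml
```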
Please see the `LICENSE` file in the root directory of the repository.
- Fork the repository
- Add yourself to the `CONTRIBUTORS.yml` file
- Submit a pull request to the `unstable` branch