Provides a set of Ansible playbooks to deploy a Big Data analytics stack on top of Hadoop/YARN. The `play-hadoop.yml` playbook deploys the base system. Addons, such as Pig, Spark, etc., are deployed using the playbooks in the `addons` directory. A playbook for deploying all the addons is given in `play-alladdons.yml`.
- Analytics Layer
- BLAS
- LAPACK
- Mahout
- MLlib
- MLbase
- Java
- R+libraries
- Python
- Pandas
- Scikit-learn
- Data Processing Layer
- Hadoop MapReduce
- Spark
- Tez
- Hama
- Storm
- Hive
- Pig
- Flink
- Database Layer
- MongoDB
- CouchDB
- HBase
- MySQL
- PostgreSQL
- Memcached
- Redis
- Scheduling
- YARN
- Mesos
- Storage
- HDFS
- Monitoring
- Ganglia
- Download this repository using `git clone --recursive`.
- Install the requirements using `pip install -r requirements.txt`.
- Edit `.cluster.py` to define the machines in the cluster.
- Launch the cluster using `vcl boot -p openstack -P $USER-`. This will start the machines on whatever OpenStack environment is currently available (via `$OS_PROJECT_NAME`, `$OS_AUTH_URL`, etc.), prefixing `$USER-` to the name of each VM (e.g. `zk0` becomes `badi-zk0`).
- Make sure that `ansible.cfg` reflects your environment. Look especially at `remote_user` if you are not using Ubuntu.
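A minimal `ansible.cfg` along these lines covers the settings mentioned above (the values shown are illustrative assumptions, not the repository's shipped defaults; `remote_user` must match the login account of your VM image):

```ini
[defaults]
; Default login user on the VMs; change this if your image is not Ubuntu
remote_user = ubuntu
host_key_checking = False

[ssh_connection]
; Route SSH through the bastion configuration used by the repository
ssh_args = -F ssh_bastion_config
```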
- Ensure `ssh_bastion_config` is to your liking (it assumes you are using the OpenStack cluster on FutureSystems).
- Run `ansible all -m ping` to make sure all nodes can be managed.
- Define `zookeeper_id` for each ZooKeeper node. Adapt the following:

  ```sh
  mkdir host_vars
  for i in 0 1 2; do
    echo "zookeeper_id: $(( i + 1 ))" > host_vars/zk$i
  done
  ```
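Assuming the three-node layout above (hypothetical node names `zk0`–`zk2`), the loop leaves one small YAML file per ZooKeeper host; a quick sanity check:

```shell
# Recreate the host_vars files and confirm each node got the expected ID
mkdir -p host_vars
for i in 0 1 2; do
  echo "zookeeper_id: $(( i + 1 ))" > "host_vars/zk$i"
done
cat host_vars/zk0   # -> zookeeper_id: 1
cat host_vars/zk2   # -> zookeeper_id: 3
```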
- Run `ansible-playbook play-hadoop.yml` to install the base system.
- Run `ansible-playbook addons/{pig,spark}.yml # etc` to install the Pig and Spark addons.
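The `{pig,spark}` pattern is ordinary shell brace expansion, so the single command above passes both addon playbooks to `ansible-playbook`; a quick illustration (brace expansion requires bash or zsh, not plain `sh`):

```shell
# Brace expansion happens before the command runs, producing one path per addon
bash -c 'echo addons/{pig,spark}.yml'
# -> addons/pig.yml addons/spark.yml
```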
Please see the `LICENSE` file in the root directory of the repository.
- Fork the repository
- Add yourself to the `CONTRIBUTORS.yml` file
- Submit a pull request to the `unstable` branch