cloudmesh-ansible/big-data-stack

Big Data Analytics Stack

Provides a set of Ansible playbooks to deploy a Big Data analytics stack on top of Hadoop/YARN.

The play-hadoop.yml playbook deploys the base system. Add-ons such as Pig and Spark are deployed with the playbooks in the addons directory, and play-alladdons.yml deploys all of the add-ons.
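The split between the base playbook and the add-on playbooks can be expressed as a small shell helper that prints the playbooks in the order they have to run. This is a hypothetical sketch, not part of the repository: the function name stack_playbooks is illustrative, and it assumes each add-on lives at addons/<name>.yml, following the addons/{pig,spark}.yml example in the Usage steps.

```shell
#!/usr/bin/env bash
# stack_playbooks: hypothetical helper, not shipped with the repository.
# Prints the playbooks for a deployment in run order: the base system
# first, then every add-on (no arguments) or only the named add-ons,
# which are assumed to live at addons/<name>.yml.
stack_playbooks() {
  echo play-hadoop.yml
  if [ "$#" -eq 0 ]; then
    # No add-ons named: use the playbook that deploys all of them.
    echo play-alladdons.yml
    return
  fi
  local name
  for name in "$@"; do
    echo "addons/$name.yml"
  done
}
```

Because ansible-playbook accepts several playbooks and runs them in the order given, `ansible-playbook $(stack_playbooks pig spark)` would run play-hadoop.yml, addons/pig.yml and addons/spark.yml in sequence (the unquoted substitution is safe here only because the paths contain no spaces).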

Stack

  • Analytics Layer
    • BLAS
    • LAPACK
    • Mahout
    • MLlib
    • MLbase
    • Java
    • R+libraries
    • Python
      • Pandas
      • Scikit-learn
  • Data Processing Layer
    • Hadoop MapReduce
    • Spark
    • Tez
    • Hama
    • Storm
    • Hive
    • Pig
    • Flink
  • Database Layer
    • MongoDB
    • CouchDB
    • HBase
    • MySQL
    • PostgreSQL
    • Memcached
    • Redis
  • Scheduling
    • YARN
    • Mesos
  • Storage
    • HDFS
  • Monitoring
    • Ganglia

Usage

  1. Clone this repository with git clone --recursive so that its submodules are fetched as well.

  2. Install the requirements with pip install -r requirements.txt.

  3. Edit .cluster.py to define the machines in the cluster.

  4. Launch the cluster with vcl boot -p openstack -P $USER-. This starts the machines on the OpenStack environment configured in your shell (via $OS_PROJECT_NAME, $OS_AUTH_URL, etc.) and prefixes $USER- to the name of each VM (e.g. with USER=badi, zk0 becomes badi-zk0).

  5. Make sure that ansible.cfg reflects your environment. In particular, check remote_user if your VMs do not run Ubuntu.

  6. Adjust ssh_bastion_config to your setup; as shipped, it assumes the OpenStack cloud on FutureSystems.

  7. Run ansible all -m ping to make sure all nodes can be managed.

  8. Define zookeeper_id for each ZooKeeper node. Adapt the following to your node names:

    # one host_vars file per node (zk0, zk1, zk2); each id must be unique
    mkdir -p host_vars
    for i in 0 1 2; do
      echo "zookeeper_id: $(( i+1 ))" >host_vars/zk$i
    done
    
  9. Run ansible-playbook play-hadoop.yml to install the base system.

  10. Run ansible-playbook addons/{pig,spark}.yml to install the Pig and Spark add-ons; name further playbooks from the addons directory to install other add-ons.
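Steps 7 to 10 can be chained in one shell function, so that an unreachable node stops the run before any playbook starts. The helper below is a hypothetical sketch, not part of the repository: the file name deploy_steps.sh, the function names and the DRY_RUN switch are illustrative, and the ZooKeeper node names default to zk0, zk1 and zk2 as in step 8.

```shell
#!/usr/bin/env bash
# deploy_steps.sh: hypothetical helper, not shipped with the repository.
# Chains Usage steps 7-10. Source it, then call `deploy`; set DRY_RUN=1
# to print the Ansible commands instead of running them.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 8: write host_vars/<node> with a unique zookeeper_id (1, 2, 3, ...).
# Node names default to zk0 zk1 zk2; pass your own names to override.
# Note: this writes the files even when DRY_RUN=1.
write_zookeeper_ids() {
  local id=1 host
  if [ "$#" -eq 0 ]; then
    set -- zk0 zk1 zk2
  fi
  mkdir -p host_vars
  for host in "$@"; do
    echo "zookeeper_id: $id" > "host_vars/$host"
    id=$(( id + 1 ))
  done
}

# Steps 7-10 in order; returns at the first failing command.
deploy() {
  run ansible all -m ping || return 1                     # step 7
  write_zookeeper_ids || return 1                         # step 8
  run ansible-playbook play-hadoop.yml || return 1        # step 9
  run ansible-playbook addons/pig.yml addons/spark.yml    # step 10
}
```

For example, `source deploy_steps.sh && DRY_RUN=1 deploy` previews the three Ansible commands; running `deploy` without DRY_RUN executes them.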

License

Please see the LICENSE file in the root directory of the repository.

Contributing

  1. Fork the repository
  2. Add yourself to the CONTRIBUTORS.yml file
  3. Submit a pull request to the unstable branch

About

Hadoop-based Big Data stack (hdfs, yarn, spark, etc)

Languages

  • Python 100.0%