learningOrchestra facilitates and streamlines the iterative stages of a Data Science project pipeline, such as:
- Data Gathering
- Data Cleaning
- Model Building
- Validating the Model
- Presenting the Results
With learningOrchestra, you can:
- load a dataset from a URL (in CSV format).
- perform several pre-processing tasks on datasets.
- create highly customised model predictions against a specific dataset by providing your own pre-processing code.
- build prediction models with different classifiers simultaneously, transparently using a Spark cluster.
And so much more! Check the usage section for more.
- Linux hosts
- Docker Engine must be installed in all instances of your cluster
- Cluster configured in swarm mode, check creating a swarm
- Docker Compose must be installed in the manager instance of your cluster
Ensure that your cluster environment does not block any traffic, such as firewall rules in your network or on your hosts.
If you have firewalls or other traffic blockers, add learningOrchestra as an exception.
For example, on Google Cloud Platform each of the VMs must allow both HTTP and HTTPS traffic.
In the manager Docker swarm machine, clone the repo using:
git clone https://github.com/riibeirogabriel/learningOrchestra.git
Navigate into the learningOrchestra directory and run:
cd learningOrchestra
sudo ./run.sh
That's it! learningOrchestra has been deployed in your swarm cluster!
- CLUSTER_IP:80 to visualize the cluster state (deployed microservices and the cluster's machines).
- CLUSTER_IP:8080 to visualize the Spark cluster state.

* CLUSTER_IP is the external IP of a machine in your cluster.
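The two dashboard addresses above follow directly from the port layout described here. As a small illustration (this helper is not part of learningOrchestra itself), they can be derived from the cluster IP:

```python
def dashboard_urls(cluster_ip: str) -> dict:
    """Build the dashboard URLs exposed by a learningOrchestra deployment.

    Port 80 serves the cluster-state visualizer and port 8080 the
    Spark cluster UI, as described above.
    """
    return {
        "cluster_state": f"http://{cluster_ip}:80",
        "spark_ui": f"http://{cluster_ip}:8080",
    }

print(dashboard_urls("203.0.113.10"))
```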
learningOrchestra can be used with the Microservices REST API or with the learning-orchestra-client
Python package.
Database API - Download and handle datasets in a database.
Projection API - Create projections of stored datasets using the Spark cluster.
Data type API - Change dataset field types between number and text.
Histogram API - Create histograms of stored datasets.
t-SNE API - Create a t-SNE image plot of stored datasets.
PCA API - Create a PCA image plot of stored datasets.
Model builder API - Create a prediction model from pre-processed datasets using the Spark cluster.
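To give a feel for how these microservices are driven over HTTP, here is a hedged sketch of submitting a CSV dataset URL to the Database API. The endpoint path ("/files"), the port, and the JSON field names below are assumptions for illustration only; consult the Database API docs (or the learning-orchestra-client package) for the actual contract:

```python
import json
import urllib.request


def build_dataset_request(cluster_ip: str, filename: str,
                          csv_url: str) -> urllib.request.Request:
    """Build (but do not send) a hypothetical Database API request.

    The "/files" path, the port 5000, and the payload keys are
    illustrative assumptions, not the documented API.
    """
    payload = json.dumps({"filename": filename, "url": csv_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{cluster_ip}:5000/files",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_dataset_request(
    "203.0.113.10", "titanic", "https://example.com/titanic.csv")
# The request is only constructed here; sending it requires a running
# cluster, e.g.:  urllib.request.urlopen(req)
```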
The Projection, t-SNE, PCA, and Model builder microservices use the Spark microservice to work.
By default, this microservice has only one instance. If your data processing requires more computing power, you can scale this microservice.
To do this, with learningOrchestra already deployed, run the following in the manager machine of your Docker swarm cluster:
docker service scale microservice_sparkworker=NUMBER_OF_INSTANCES
* NUMBER_OF_INSTANCES is the number of Spark microservice instances you require. Choose it according to your cluster resources and your resource requirements.
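Choosing NUMBER_OF_INSTANCES is left to you. One rough heuristic (an illustrative assumption of this guide, not an official learningOrchestra recommendation) is to derive it from the total cores available on your worker machines:

```python
def suggest_spark_instances(total_worker_cores: int,
                            cores_per_instance: int = 2) -> int:
    """Rough heuristic: one Spark microservice instance per chunk of
    worker cores, never fewer than one. Tune cores_per_instance to
    your workloads; this is illustrative only."""
    if total_worker_cores < 1 or cores_per_instance < 1:
        raise ValueError("core counts must be positive")
    return max(1, total_worker_cores // cores_per_instance)


# e.g. an 8-core worker pool with 2 cores per instance suggests 4 instances:
print(suggest_spark_instances(8))  # 4
```

The suggested number would then be passed to the docker service scale command shown above.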
NoSQLBooster - a MongoDB GUI that performs several database tasks, such as file visualization, queries, projections, and file extraction to CSV and JSON formats. It can be useful for accomplishing some of these tasks with your processed dataset or for retrieving your prediction results.
Read the Database API docs for more info on configuring this tool.
See the full docs for detailed usage instructions.
Thanks goes to these wonderful people (emoji key):
Gabriel Ribeiro 💻 🚇 📆 🚧 |
Navendu Pottekkat 📖 🎨 🤔 |
This project follows the all-contributors specification. Contributions of any kind welcome!