MODEL-PIPELINES

This is my personal repository for building pipelines that preprocess data, train models, run inference, and can be productionized on AWS or GCP. I set it up to work with different ML APIs such as scikit-learn, pytorch, and xgboost, and with multiple types of datasets.

It's a WIP, so I'll regularly be changing things to reduce complexity, fix bugs, and let the package generalize to more types of ML problems. My end goal is for this package to serve as starter code that others can use and integrate into their own projects. Happy modeling!

Inspired by https://github.com/bgweber and https://github.com/abhishekkrthakur.

Setup

  1. Create a virtual environment using conda, virtualenv, virtualenvwrapper, etc., then pip install the requirements. For example:
cd model_factory
conda create -n model_pipelines python=3.6
conda activate model_pipelines
conda install pytorch torchvision -c pytorch -y
pip install -r requirements.txt
  2. Download a dataset and store it in inputs/, for example:

    • mkdir inputs && cd inputs
    • mkdir quora_question_pairs && cd quora_question_pairs
    • kaggle competitions download -c quora-question-pairs
  3. Create a directory for trained models and set MODEL_PATH - e.g. export MODEL_PATH=trained_models/<model-name>:

    • mkdir trained_models
  4. Set up connections to AWS or GCP. This step is a little more involved, so check out the documentation. A quick sanity check for AWS credentials is sketched below.
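
For AWS specifically, one quick way to confirm your credentials can reach S3 (which the Trainer uses to save and load models) is the short boto3 check below. This snippet is only a sanity check using the standard AWS environment variable names; it is not part of the package.

```python
# Sanity check only; not part of model_pipelines.
# Confirms the AWS credentials in your environment can reach S3.
import os

import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

# If this prints your bucket names, the credentials work.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])
```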

Training Models

This API is still a WIP, so it will probably change.

  1. Set up your CrossValidator object and create training and validation folds.

  2. Set up your DataSet object by pointing it to your dataset - e.g. inputs/ - and decide which fold you want to use.

  3. Set up your Model by wrapping an sklearn, xgboost, pytorch, or keras model in a Model object that will be used inside a Trainer.

  4. Trainers group training and inference functionality to abstract away a lot of detail and get you going. They can load and save trained models from and to S3 buckets (GCP coming soon). More functionality will be added to simplify things.

  5. Instantiate your model, pass it into your Trainer, and pass the Trainer to your Engine object. The Engine handles training via engine.run_training_engine() and inference via engine.run_inference_engine(). See the docstrings for the required arguments.

  6. Optional: set up a credentials dictionary containing your AWS login information. This lets you save models to and load them from an S3 bucket. A sketch of the full workflow follows this list.
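
Putting the steps together, a hypothetical end-to-end run might look like the sketch below. The class names (CrossValidator, DataSet, Model, Trainer, Engine) come from the steps above, but the import path, constructor arguments, and method signatures here are assumptions; check the docstrings for the actual interface.

```python
# Hypothetical workflow sketch: the import path, argument names, and
# constructor signatures below are assumptions, not the package's real API.
from sklearn.ensemble import RandomForestClassifier

from model_factory import CrossValidator, DataSet, Engine, Model, Trainer

# 1. Create training/validation folds.
cv = CrossValidator(input_path="inputs/quora_question_pairs", num_folds=5)
cv.create_folds()

# 2. Point a DataSet at the data and pick a fold.
dataset = DataSet(path="inputs/quora_question_pairs", fold=0)

# 3. Wrap an sklearn estimator in a Model object.
model = Model(RandomForestClassifier(n_estimators=100))

# 4 and 6. A Trainer groups training/inference; credentials enable S3 save/load.
credentials = {"aws_access_key_id": "...", "aws_secret_access_key": "..."}
trainer = Trainer(model=model, credentials=credentials)

# 5. The Engine drives training and inference.
engine = Engine(trainer=trainer, dataset=dataset)
engine.run_training_engine()
engine.run_inference_engine()
```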

Webapps

Deploy web application locally

  1. cd deployments/webapp

  2. sh bash_scripts/run-app.sh

Pipelines

I'm working on integrating Airflow into the package so that pipelines can be orchestrated and deployed on a Kubernetes cluster. More coming soon!
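
For context, the kind of DAG this would involve is sketched below using Airflow's BashOperator. The dag_id, schedule, and commands are placeholders for illustration, not anything that currently ships with the repo.

```python
# Placeholder Airflow (2.x) DAG sketch; the dag_id, schedule, and bash
# commands are illustrative only and not part of model_pipelines yet.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="model_pipeline_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    preprocess = BashOperator(task_id="preprocess", bash_command="echo preprocess")
    train = BashOperator(task_id="train", bash_command="echo train")
    infer = BashOperator(task_id="infer", bash_command="echo infer")

    # Run preprocessing, then training, then inference.
    preprocess >> train >> infer
```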

Run example pipeline

  1. cd deployments/pipelines/example_pipeline

  2. Set up pipeline-creds.json containing all of your GCP credentials info.

  3. Set environment variables:

    • export PROJECT_ID=<project-id>
    • export IMAGE_NAME=<image-name>
    • export CREDS=<path-to-creds-file>
  4. Run the setup scripts:

    • sh set-up-creds.sh && sh push-to-gcr.io