UPSG: The Universal Pipeline for Social Good

Introduction

UPSG is a standard methodology, an interchange format, and a Python library for writing machine learning pipelines.

Its primary goal is to give teams working on different machine learning problems a way to share code across languages and environments.

Installation

Install with:

pip install git+https://github.com/dssg/UPSG.git

The UPSG Python library currently requires the following packages. In most environments, pip should take care of the Python packages for you.

Required

Python packages

Other packages

HDF5

Optional

Python packages

Other packages

Graphviz

Example

The following implements the scikit-learn "Getting started" pipeline with UPSG:

from sklearn import datasets
from sklearn.svm import SVC

from upsg.fetch.np import NumpyRead
from upsg.wrap.wrap_sklearn import wrap_and_make_instance
from upsg.export.csv import CSVWrite
from upsg.transform.split import SplitTrainTest
from upsg.pipeline import Pipeline

digits = datasets.load_digits()
digits_data = digits.data
# for now, we need a column vector rather than an array
digits_target = digits.target

p = Pipeline()

# load data from a numpy dataset
stage_data = NumpyRead(digits_data)
stage_target = NumpyRead(digits_target)

# train/test split
stage_split_data = SplitTrainTest(2, test_size=1, random_state=0)

# build a classifier
stage_clf = wrap_and_make_instance(SVC, gamma=0.001, C=100.)

# output to a csv
stage_csv = CSVWrite('out.csv')

node_data, node_target, node_split, node_clf, node_csv = map(
    p.add, [
        stage_data, stage_target, stage_split_data, stage_clf,
        stage_csv])

# connect the pipeline stages together
node_data['output'] > node_split['input0']
node_target['output'] > node_split['input1']
node_split['train0'] > node_clf['X_train']
node_split['train1'] > node_clf['y_train']
node_split['test0'] > node_clf['X_test']
node_clf['y_pred'] > node_csv['input']

p.run()

# results are now in out.csv
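
The `>` wiring in the example may look unusual if you have not seen operator overloading used this way. The following is a minimal sketch of how such syntax could be built in plain Python. It is not UPSG's implementation: `Port`, `Node`, `evaluate`, and `split` are hypothetical names invented for illustration, and the sketch only assumes that indexing a node yields a named port and that `>` records an edge between two ports.

```python
# Toy illustration only; none of these classes are part of UPSG.


class Port:
    """A named input or output slot on a node (hypothetical helper)."""

    def __init__(self, node, key):
        self.node = node
        self.key = key

    def __gt__(self, other):
        # `out_port > in_port` records an edge instead of comparing values.
        other.node.sources[other.key] = self
        return other


class Node:
    """Wraps a function whose keyword arguments are its input ports."""

    def __init__(self, func):
        self.func = func      # must return a dict: output key -> value
        self.sources = {}     # input key -> upstream Port
        self.outputs = None   # cached result dict once the node has run

    def __getitem__(self, key):
        return Port(self, key)

    def evaluate(self):
        # Pull-based evaluation: run upstream nodes first, each at most once.
        if self.outputs is None:
            kwargs = {key: port.node.evaluate()[port.key]
                      for key, port in self.sources.items()}
            self.outputs = self.func(**kwargs)
        return self.outputs


def split(input0, input1):
    # Hold out the last item of each sequence as the "test" portion.
    return {'train0': input0[:-1], 'test0': input0[-1:],
            'train1': input1[:-1], 'test1': input1[-1:]}


node_data = Node(lambda: {'output': [1, 2, 3, 4]})
node_target = Node(lambda: {'output': ['a', 'b', 'c', 'd']})
node_split = Node(split)
node_pairs = Node(lambda X, y: {'pairs': list(zip(X, y))})

# Same shape as the wiring in the example above.
node_data['output'] > node_split['input0']
node_target['output'] > node_split['input1']
node_split['train0'] > node_pairs['X']
node_split['train1'] > node_pairs['y']

result = node_pairs.evaluate()['pairs']
print(result)
```

Nothing runs while the edges are being declared. Work happens only when a downstream node is evaluated, which is the same declare-then-run shape as building the pipeline and calling `p.run()` in the example.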

Next Steps

Check out the documentation
