dclong/jpmml-sklearn

 
 

JPMML-SkLearn

Java library and command-line application for converting Scikit-Learn models to PMML.

Features

Prerequisites

The Python side of operations

Python installation can be validated as follows:

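# Note: sklearn.externals.joblib was removed in scikit-learn 0.23; on newer releases, import the standalone joblib package instead.
import sklearn, sklearn.externals.joblib, sklearn_pandas, sklearn2pmml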

print(sklearn.__version__)
print(sklearn.externals.joblib.__version__)
print(sklearn_pandas.__version__)
print(sklearn2pmml.__version__)
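If any one of these packages is missing, the import statement above fails before a single version is printed. A minimal sketch of a more forgiving check follows. It assumes Python 3.8 or newer (for `importlib.metadata`) and assumes the PyPI distribution names `scikit-learn`, `joblib`, `sklearn-pandas` and `sklearn2pmml`. The helper name `installed_versions` is hypothetical and not part of any of these libraries.

```python
from importlib import metadata

# Assumed PyPI distribution names for the packages imported above.
REQUIRED = ["scikit-learn", "joblib", "sklearn-pandas", "sklearn2pmml"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Report every package, including the ones that are not installed.
for name, version in installed_versions(REQUIRED).items():
    print(name, version if version is not None else "NOT INSTALLED")
```

Unlike the import-based check, this reports all missing packages in one pass.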

The JPMML-SkLearn side of operations

  • Java 1.7 or newer.

Installation

Enter the project root directory and build using Apache Maven:

mvn clean install

The build produces an executable uber-JAR file target/converter-executable-1.4-SNAPSHOT.jar.

Usage

A typical workflow can be summarized as follows:

  1. Use Python to train a model.
  2. Serialize the model in pickle data format to a file on the local filesystem.
  3. Use the JPMML-SkLearn command-line converter application to turn the pickle file into a PMML file.

The Python side of operations

Load data to a pandas.DataFrame object:

import pandas

iris_df = pandas.read_csv("Iris.csv")

First, instantiate a sklearn_pandas.DataFrameMapper object, which performs column-wise feature engineering and selection:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain

iris_mapper = DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], [ContinuousDomain(), StandardScaler()])
])

Second, instantiate any number of transformer and selector objects, which perform feature engineering and selection across the whole dataset:

from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

iris_pca = PCA(n_components = 3)
iris_selector = SelectKBest(k = 2)
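To see how these two steps change the width of the feature matrix, the following sketch applies them by hand. It is an illustration under assumptions, not the workflow itself: it uses the iris dataset bundled with scikit-learn (`load_iris`) in place of Iris.csv, and calls `StandardScaler` directly instead of going through the `DataFrameMapper`.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import StandardScaler

# Bundled iris data: 150 rows, 4 continuous feature columns.
X, y = load_iris(return_X_y=True)

# Column-wise scaling keeps all 4 columns.
X_scaled = StandardScaler().fit_transform(X)

# PCA projects the 4 scaled columns onto 3 principal components.
X_pca = PCA(n_components=3).fit_transform(X_scaled)

# SelectKBest is supervised: it needs y to score the 3 components and keep the best 2.
X_selected = SelectKBest(k=2).fit_transform(X_pca, y)

print(X.shape, X_pca.shape, X_selected.shape)
# prints: (150, 4) (150, 3) (150, 2)
```

The estimator at the end of the pipeline is therefore trained on two derived columns rather than on the four original measurements.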

Third, instantiate an Estimator object:

from sklearn.tree import DecisionTreeClassifier

iris_classifier = DecisionTreeClassifier(min_samples_leaf = 5)

Combine the above objects into a sklearn2pmml.PMMLPipeline object, and run the experiment:

from sklearn2pmml import PMMLPipeline

iris_pipeline = PMMLPipeline([
    ("mapper", iris_mapper),
    ("pca", iris_pca),
    ("selector", iris_selector),
    ("estimator", iris_classifier)
])
iris_pipeline.fit(iris_df, iris_df["Species"])

Store the fitted sklearn2pmml.PMMLPipeline object in pickle data format:

from sklearn.externals import joblib  # removed in scikit-learn 0.23; use "import joblib" on newer releases

joblib.dump(iris_pipeline, "pipeline.pkl.z", compress = 9)

Please see the test script file main.py for more classification (binary and multi-class) and regression workflows.

The JPMML-SkLearn side of operations

Converting the pipeline pickle file pipeline.pkl.z to a PMML file pipeline.pmml:

java -jar target/converter-executable-1.4-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
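After a conversion, it can be useful to check what ended up in the PMML file. The sketch below uses only the Python standard library. It relies on the general PMML document layout (a `PMML` root element with a `version` attribute, a `DataDictionary` of `DataField` elements, and one or more model elements). The helper names are hypothetical, and the sample document is hand-written for illustration; it is not actual converter output.

```python
import xml.etree.ElementTree as ET

# Top-level PMML elements that are not models.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on element tags."""
    return tag.rsplit("}", 1)[-1]

def summarize_pmml(root):
    """Return the PMML version, declared data fields and top-level model tags."""
    fields, models = [], []
    for child in root:
        name = local_name(child.tag)
        if name == "DataDictionary":
            fields = [f.get("name") for f in child
                      if local_name(f.tag) == "DataField"]
        elif name not in NON_MODEL:
            models.append(name)
    return {"version": root.get("version"), "fields": fields, "models": models}

# Hand-written stand-in document; NOT actual converter output.
SAMPLE = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary>
    <DataField name="Species" optype="categorical" dataType="string"/>
    <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TreeModel functionName="classification"/>
</PMML>"""

print(summarize_pmml(ET.fromstring(SAMPLE)))
```

For a real file, pass `ET.parse("pipeline.pmml").getroot()` to `summarize_pmml` instead of the parsed sample string.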

Getting help:

java -jar target/converter-executable-1.4-SNAPSHOT.jar --help
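The converter can also be driven from the same Python script that trains the model. The sketch below wraps the conversion command in a small helper. The function names `build_convert_command` and `convert` are hypothetical, and the error handling assumes that the converter signals failure with a non-zero exit status, which this document does not confirm.

```python
import subprocess

# JAR path as produced by the Maven build described under Installation.
CONVERTER_JAR = "target/converter-executable-1.4-SNAPSHOT.jar"

def build_convert_command(pkl_input, pmml_output, jar=CONVERTER_JAR):
    """Assemble the converter command line as an argument list."""
    return ["java", "-jar", jar,
            "--pkl-input", pkl_input,
            "--pmml-output", pmml_output]

def convert(pkl_input, pmml_output, jar=CONVERTER_JAR):
    """Run the converter; raises CalledProcessError on a non-zero exit status
    (assumption: the converter reports failure that way)."""
    subprocess.run(build_convert_command(pkl_input, pmml_output, jar), check=True)

# Show the command without running it (running requires Java and the built JAR).
print(" ".join(build_convert_command("pipeline.pkl.z", "pipeline.pmml")))
# prints: java -jar target/converter-executable-1.4-SNAPSHOT.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.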

License

JPMML-SkLearn is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.

Additional information

Please contact info@openscoring.io
