
mengxr/joblib-spark


Joblib spark backend

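This package provides an Apache Spark backend for joblib, so that code parallelized with joblib (including scikit-learn, which uses joblib internally) can distribute its tasks across a Spark cluster.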

A Note About Dependencies

You need joblib >= 0.14. If you want scikit-learn (sklearn) to use the Spark backend, you also need to upgrade scikit-learn to version >= 0.21.

You need to install pyspark first. Joblib-spark supports Spark version >= 2.4.4.
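To check an existing environment against these minimums, a small script along the following lines may help. It is a sketch, not part of joblib-spark: the helper names (`parse_version`, `meets_minimum`, `check_dependencies`) are hypothetical, it assumes Python 3.8+ for `importlib.metadata`, it treats the installed `pyspark` package version as the Spark version, and it compares only the leading numeric parts of each version string, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the note above, keyed by PyPI distribution name.
MINIMUM_VERSIONS = {
    "joblib": "0.14",
    "scikit-learn": "0.21",
    "pyspark": "2.4.4",
}


def parse_version(text):
    """Turn '0.21.3' into (0, 21, 3); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` >= `minimum` (shorter versions padded with zeros)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=None):
    """Map each package name to (installed version or None, requirement met?)."""
    minimums = MINIMUM_VERSIONS if minimums is None else minimums
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_dependencies().items():
        status = "OK" if ok else "needs >= " + MINIMUM_VERSIONS[name]
        print(f"{name:13} {installed or 'not installed':15} {status}")
```

Padding both version tuples to the same length matters: without it, Python would treat `(0, 21)` as smaller than `(0, 21, 0)`.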

Installation

Prerequisites

  1. Install the Python libraries:
     pip install scikit-learn==0.21.3
     pip install joblib==0.14.0
  2. Install pyspark (Spark version 2.4.4 or later).

Install joblib-spark

cd path/to/joblib-spark
python setup.py install

Examples

Run the following example code in the pyspark shell:

from joblib import parallel_backend
from sklearn.model_selection import cross_val_score
from sklearn import datasets
from sklearn import svm
from joblibspark import register_spark

register_spark()  # register the 'spark' backend with joblib

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

# Inside this block, cross_val_score dispatches its 5 cross-validation fits
# through joblib to the Spark backend; n_jobs=3 sets the requested parallelism.
with parallel_backend('spark', n_jobs=3):
    scores = cross_val_score(clf, iris.data, iris.target, cv=5)

print(scores)
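The backend is not limited to scikit-learn: code written directly with joblib's `Parallel` and `delayed` can also run under `spark`. The sketch below illustrates this; `square` and `parallel_squares` are hypothetical names, and the `backend` parameter is an addition for illustration so the same function can be exercised locally (for example with joblib's built-in `threading` backend) without a Spark cluster.

```python
from joblib import Parallel, delayed, parallel_backend


def square(x):
    """Stand-in for any picklable unit of work."""
    return x * x


def parallel_squares(values, backend="spark", n_jobs=3):
    """Hypothetical helper: square `values` in parallel under a joblib backend."""
    if backend == "spark":
        # Needs joblibspark and pyspark; makes the name 'spark' known to joblib.
        from joblibspark import register_spark
        register_spark()
    with parallel_backend(backend, n_jobs=n_jobs):
        # Parallel() takes its backend and n_jobs from the enclosing context and
        # returns the results as a list in the same order as the inputs.
        return Parallel()(delayed(square)(v) for v in values)
```

In the pyspark shell, `parallel_squares(range(10))` runs the ten calls through the Spark backend; passing `backend="threading"` runs them in local threads instead. In both cases joblib returns the results in input order.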
