H2O4GPU

H2O4GPU is a collection of GPU solvers by H2O.ai. It builds on the easy-to-use scikit-learn API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn (i.e. import h2o4gpu as sklearn), with GPU support for a selected and growing set of algorithms. H2O4GPU inherits all existing scikit-learn algorithms and falls back to the CPU algorithm when the GPU algorithm does not support an important scikit-learn class option.
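The drop-in import can be guarded so that the same script still runs on a machine where h2o4gpu is not installed. The following is a minimal sketch, not part of the project: the helper name first_available is hypothetical, and falling back to plain sklearn is a choice made here for illustration.

```python
import importlib
import importlib.util


def first_available(candidates):
    """Import and return the first module in `candidates` that is installed.

    `first_available` is a hypothetical helper for this example only.
    """
    for name in candidates:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    raise ImportError("none of these modules are installed: " + ", ".join(candidates))
```

A script would then call sklearn = first_available(["h2o4gpu", "sklearn"]) and use the returned module through the usual scikit-learn names.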

An R API is in development and will be released as a stand-alone R package in the future.

Requirements

PC with Ubuntu 16.04+
Install CUDA with bundled display drivers (CUDA 8 or CUDA 9)

When installing, choose to link the CUDA install to /usr/local/cuda. Reboot after installing the new NVIDIA drivers.

Nvidia GPU with Compute Capability >= 3.5 (see Capability Lookup).
For advanced features, such as handling rows/32 > 2^16 in K-means, Compute Capability >= 5.2 is required.
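The two capability thresholds can be worked through as a small check. This sketch assumes the advanced-feature threshold means Compute Capability 5.2, and it reads the K-means condition rows/32 > 2^16 as rows > 2,097,152. The function names and return strings are hypothetical and are not part of h2o4gpu.

```python
MIN_CAPABILITY = (3, 5)       # minimum stated in the requirements
ADVANCED_CAPABILITY = (5, 2)  # assumption: the advanced threshold is 5.2
ROWS_PER_BLOCK = 32
MAX_BLOCKS = 2 ** 16


def needs_advanced_kmeans(rows):
    """True when rows/32 exceeds 2^16, i.e. rows > 2,097,152."""
    return rows / ROWS_PER_BLOCK > MAX_BLOCKS


def check_gpu(capability, rows):
    """Classify a (major, minor) capability for a K-means job of `rows` rows."""
    if capability < MIN_CAPABILITY:
        return "unsupported"
    if needs_advanced_kmeans(rows) and capability < ADVANCED_CAPABILITY:
        return "needs_newer_gpu"
    return "ok"
```

For example, check_gpu((3, 5), 3_000_000) reports that a newer GPU is needed, while the same job passes the check on a (5, 2) device.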

Installation

Add to ~/.bashrc or environment (set appropriate paths for your OS):

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64/:$CUDA_HOME/lib/:$CUDA_HOME/extras/CUPTI/lib64
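The exports above can be sanity-checked from Python before installing the wheel. This is a minimal sketch using only the standard library. The function name missing_cuda_paths is hypothetical, and it only inspects the environment mapping it is given; it does not verify that the directories exist.

```python
import os


def missing_cuda_paths(env):
    """Return a list of problems with CUDA_HOME / LD_LIBRARY_PATH in `env`."""
    cuda_home = env.get("CUDA_HOME", "").rstrip("/")
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    # Normalise entries so that "/usr/local/cuda/lib64/" matches ".../lib64".
    entries = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    problems = []
    for suffix in ("lib64", "lib", "extras/CUPTI/lib64"):
        expected = cuda_home + "/" + suffix
        if expected not in entries:
            problems.append("LD_LIBRARY_PATH is missing " + expected)
    return problems


if __name__ == "__main__":
    for problem in missing_cuda_paths(os.environ):
        print(problem)
```

An empty result means both variables match the layout shown in the exports.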

Install OpenBlas dev environment:

sudo apt-get install libopenblas-dev

Download the Python wheel file (for Python 3.6 on linux_x86_64 with CUDA 8).

Start a fresh pyenv or virtualenv session.

Install the Python wheel file. NOTE: If you don't use a fresh environment, this will overwrite your py3nvml and xgboost installations to use our validated versions.

pip install h2o4gpu-0.0.4-py36-none-any.whl

Test your installation

import h2o4gpu
import numpy as np

X = np.array([[1.,1.], [1.,4.], [1.,0.]])
model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
model.cluster_centers_

This should give the following input/output:

>>> import h2o4gpu
>>> import numpy as np
>>>
>>> X = np.array([[1.,1.], [1.,4.], [1.,0.]])
>>> model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
>>> model.cluster_centers_
array([[ 0.25,  0.  ],
       [ 1.  ,  4.  ]])

For more examples check our Jupyter notebook demos.

Running Jupyter Notebooks with Docker

# Build Docker image
make runtime

# Run Docker image
nvidia-docker run -p 8888:8888 -v /some/local/log:/log opsh2o4gpu/h2o4gpu-runtime &

This container has a /demos directory which contains Jupyter notebooks. You will need to make sure that port 8888 inside the container is exposed to reach it.

By default, the notebook is created with a token for security. You can find the token in the jupyter.log file:

cat /some/local/log/YYYYMMDD-HHMMSS/jupyter.log

...
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
    http://localhost:8888/?token=93f7d1fd17ff1942717656f5f8a43ce63ffcc135afc1475a
...

(Replace localhost and port 8888 with the IP address and host port where the container is exposed.)
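Finding the token URL in the log and swapping in the exposed host and port can be scripted. The sketch below assumes the log contains a line shaped like the excerpt above, with a hexadecimal token. The function name notebook_url is hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a URL such as http://localhost:8888/?token=<hex digits>
TOKEN_URL = re.compile(r"https?://\S+\?token=[0-9a-f]+")


def notebook_url(log_text, host, port):
    """Return the first token URL in `log_text`, rewritten to host:port.

    Returns None when the log contains no token URL.
    """
    match = TOKEN_URL.search(log_text)
    if match is None:
        return None
    parts = urlsplit(match.group(0))
    # Keep scheme, path, and token; replace only the host:port part.
    return urlunsplit(parts._replace(netloc="%s:%d" % (host, port)))
```

Reading jupyter.log into a string and passing it to notebook_url with the container's address yields a URL that can be pasted into a browser.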

Plans and RoadMap

The vision is to develop fast GPU algorithms to complement the CPU algorithms in scikit-learn while keeping full scikit-learn API compatibility and scikit-learn CPU algorithm capability. The h2o4gpu Python module is to be used as a drop-in replacement for scikit-learn that has the full functionality of scikit-learn's CPU algorithms.

Functions and classes will be gradually overridden by GPU-enabled algorithms (unless n_gpu=0 is set and we have no CPU algorithm except scikit-learn's). The CPU algorithms and code initially will be sklearn, but gradually those may be replaced by faster open-source codes like those in Intel DAAL.

This vision is currently accomplished by using the open-source scikit-learn and xgboost and overriding scikit-learn calls with our own GPU versions. In cases when our GPU class is currently incapable of an important scikit-learn feature, we revert to the scikit-learn class.
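The override-with-fallback idea described above can be illustrated with a toy dispatcher. This is not h2o4gpu's implementation: the classes CpuKMeans and GpuKMeans, the function make_kmeans, and the set of GPU-supported options are all hypothetical stand-ins chosen for this sketch.

```python
class CpuKMeans:
    """Stand-in for the scikit-learn class (hypothetical)."""
    backend = "cpu"

    def __init__(self, **params):
        self.params = params


class GpuKMeans:
    """Stand-in for a GPU class with fewer supported options (hypothetical)."""
    backend = "gpu"
    SUPPORTED = {"n_clusters", "random_state"}  # made-up list for illustration

    def __init__(self, **params):
        self.params = params


def make_kmeans(n_gpu=1, **params):
    """Pick the GPU class unless GPUs are disabled or an option is unsupported."""
    unsupported = set(params) - GpuKMeans.SUPPORTED
    if n_gpu == 0 or unsupported:
        return CpuKMeans(**params)
    return GpuKMeans(**params)
```

In this sketch, make_kmeans(n_clusters=2) selects the GPU stand-in, while passing an option outside SUPPORTED or setting n_gpu=0 selects the CPU stand-in.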

As noted above, there is an R API in development, which will be released as a stand-alone R package. All algorithms supported by H2O4GPU will be exposed in both Python and R in the future.

Another primary goal is to support all operations on the GPU via the GOAI initiative. This involves ensuring the GPU algorithms can take and return GPU pointers to data instead of going back to the host. In scikit-learn API language these are called fit_ptr, predict_ptr, transform_ptr, etc., where ptr stands for memory pointer.

Solver Classes

Among others, the solver can be used for the following classes of problems:

GLM: Lasso, Ridge Regression, Logistic Regression, Elastic Net Regularization
KMeans
Gradient Boosting Machine (GBM) via XGBoost

Planned:

GLM: Linear SVM, Huber Fitting, Total Variation Denoising, Optimal Control, Linear Programs and Quadratic Programs.
SVD, PCA

Benchmarks

Our benchmarking plan is to clearly highlight when modeling benefits from the GPU (usually complex models) or does not (e.g. one-shot simple models dominated by data transfer).
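The trade-off between compute and data transfer can be written as a one-line cost model. This is a simplified sketch under the assumption that GPU wall time is transfer time plus compute time. The function name gpu_speedup is hypothetical, and the numbers in the usage note are made up for illustration; they are not measurements.

```python
def gpu_speedup(cpu_seconds, gpu_compute_seconds, transfer_seconds):
    """Speedup of GPU over CPU when GPU time = compute + host-to-device transfer.

    A value above 1 means the GPU run is faster overall.
    """
    return cpu_seconds / (gpu_compute_seconds + transfer_seconds)
```

With illustrative inputs, gpu_speedup(10.0, 1.0, 1.0) is 5.0 (a complex model where compute dominates), whereas gpu_speedup(0.5, 0.05, 1.0) is below 1 (a one-shot simple model dominated by transfer).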

We have benchmarked h2o4gpu, scikit-learn, and h2o-3 on a variety of solvers. Some benchmarks have been performed for a few selected cases that highlight the GPU capabilities (i.e. compute or on-GPU memory operations dominate data transfer to GPU from host):

Benchmarks for GLM, KMeans, and XGBoost for CPU vs. GPU.

A suite of benchmarks is computed when running "make testperf" from a build directory. It takes all of our tests and benchmarks h2o4gpu against h2o-3. The results will soon be presented as live commit-by-commit streaming plots on a website.

Contributing

Please refer to our CONTRIBUTING.md and DEVEL.md for instructions on how to build and test the project and how to contribute. The h2o4gpu Gitter chatroom can be used for discussion related to open source development.

GitHub issues are used for tracking and discussing bugs, features, and enhancements.

Questions

Please ask all code-related questions on StackOverflow using the "h2o4gpu" tag.
Questions related to the roadmap can be directed to the developers on Gitter.
Troubleshooting
FAQ

References

Copyright

Copyright (c) 2017, H2O.ai, Inc., Mountain View, CA
Apache License Version 2.0 (see LICENSE file)


This software is based on original work under BSD-3 license by:

Copyright (c) 2015, Christopher Fougner, Stephen Boyd, Stanford University
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of the <organization> nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
