sklearn-gbmi: scikit-learn gradient-boosting-model interactions

This distribution provides a Python module for computing Friedman and Popescu's H statistics, in order to look for interactions among variables in scikit-learn gradient-boosting models (http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting).

See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", Ann. Appl. Stat. 2:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.

Installation

pip install sklearn-gbmi

Usage

Given a scikit-learn gradient-boosting model gbm that has been fitted to a NumPy array or pandas data frame array_or_frame and a list of indices of columns of the array or columns of the data frame indices_or_columns, the H statistic of the variables represented by the elements of array_or_frame and specified by indices_or_columns can be computed via

from sklearn_gbmi import *

h(gbm, array_or_frame, indices_or_columns)

Alternatively, the two-variable H statistic of each pair of variables represented by the elements of array_or_frame and specified by indices_or_columns can be computed via

from sklearn_gbmi import *

h_all_pairs(gbm, array_or_frame, indices_or_columns)

(Compared to iteratively calling h, calling h_all_pairs avoids redundant computations.)

indices_or_columns is optional, with default value 'all'. If it is 'all', then all columns of array_or_frame are used.

NaN is returned if a computation is spoiled by weak main effects and rounding errors.

H varies from 0 to 1. The larger H, the stronger the evidence for an interaction among the variables.

Example

See the Jupyter notebook example.ipynb (https://github.com/ralphhaygood/sklearn-gbmi/blob/master/example.ipynb) for a complete example of how to use the module.

Notes

Per Friedman and Popescu, only variables with strong main effects should be examined for interactions. Strengths of main effects are available as gbm.feature_importances_ once gbm has been fitted.
Per Friedman and Popescu, collinearity among variables can lead to interactions in gbm that are not present in the target function. To forestall such spurious interactions, check for strong correlations among variables before fitting gbm.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
sklearn_gbmi		sklearn_gbmi
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
MANIFEST.in		MANIFEST.in
README.md		README.md
example.ipynb		example.ipynb
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sklearn_gbmi

sklearn_gbmi

.gitignore

.gitignore

LICENSE.txt

LICENSE.txt

MANIFEST.in

MANIFEST.in

README.md

README.md

example.ipynb

example.ipynb

requirements.txt

requirements.txt

setup.cfg

setup.cfg

setup.py

setup.py

Repository files navigation

sklearn-gbmi: scikit-learn gradient-boosting-model interactions

Installation

Usage

Example

Notes

About

Releases

Packages

Languages

License

jshea1820/sklearn-gbmi

Folders and files

Latest commit

History

Repository files navigation

sklearn-gbmi: scikit-learn gradient-boosting-model interactions

Installation

Usage

Example

Notes

About

Resources

License

Stars

Watchers

Forks

Languages