RuleFit

Implementation of a rule-based prediction algorithm based on the RuleFit algorithm by Friedman and Popescu.

The algorithm can be used to predict an output vector y from an input matrix X. In the first step, a tree ensemble is generated with gradient boosting. The trees are then used to form rules: the path from the root to each node in each tree forms one rule. A rule is a binary decision indicating whether an observation falls into a given node, which depends on the input features used in the splits along that path. The rules, together with the original input features, are then fed into an L1-regularized linear model, also called the Lasso, which estimates the effect of each rule on the output target while setting many of those effects exactly to zero.
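
The rule-extraction step described above can be sketched in a few lines. This is an illustrative sketch, not the package's code: the nested-dict tree format and the helper names extract_rules and rule_matrix are hypothetical. Each non-root node contributes one rule (the conjunction of split conditions on the path from the root to that node), and each rule becomes a 0/1 column placed next to the original features before the Lasso is fitted.

```python
import numpy as np

# Hypothetical tree format (NOT scikit-learn's or the package's): the root
# splits on feature 0 at 2.5, and its right child splits on feature 1 at 1.0.
TREE = {
    "feature": 0, "threshold": 2.5,
    "left": None,  # leaf
    "right": {"feature": 1, "threshold": 1.0, "left": None, "right": None},
}


def extract_rules(node, path=()):
    """Return one rule (a tuple of conditions) per non-root node.

    Each condition is (feature_index, operator, threshold), with "<=" for
    the left branch and ">" for the right branch.
    """
    rules = []
    if node is None:
        return rules
    f, t = node["feature"], node["threshold"]
    for op, child in (("<=", node["left"]), (">", node["right"])):
        child_path = path + ((f, op, t),)
        rules.append(child_path)  # the path to this child is one rule
        rules.extend(extract_rules(child, child_path))
    return rules


def rule_matrix(X, rules):
    """Binary matrix: entry [i, k] is 1 if observation i satisfies rule k."""
    X = np.asarray(X, dtype=float)
    R = np.ones((X.shape[0], len(rules)))
    for k, rule in enumerate(rules):
        for f, op, t in rule:
            hit = X[:, f] <= t if op == "<=" else X[:, f] > t
            R[:, k] *= hit  # all conditions on the path must hold
    return R


rules = extract_rules(TREE)
X = np.array([[1.0, 0.0], [3.0, 0.5], [3.0, 2.0]])
R = rule_matrix(X, rules)
design = np.hstack([R, X])  # rule columns followed by the original features
```

The design matrix is what the Lasso stage would receive; in practice the rules come from every tree of the boosted ensemble, not from a single hand-written tree.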

You can use RuleFit to predict a numeric response (categorical responses are not yet implemented). The input has to be a NumPy array (matrix) containing only numeric values.
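
Since the input must be a purely numeric matrix, it can help to validate it before calling fit. The helper below is a hypothetical convenience, not part of the package; it converts array-like input (for example a DataFrame's values) to a 2-D float array and rejects anything non-numeric.

```python
import numpy as np


def to_numeric_matrix(X):
    """Convert X to a 2-D float array, rejecting non-numeric input."""
    arr = np.asarray(X)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D matrix, got {arr.ndim} dimension(s)")
    # Booleans are accepted and mapped to 0.0 / 1.0; strings and mixed
    # object columns are rejected.
    if not (np.issubdtype(arr.dtype, np.number) or arr.dtype == bool):
        raise TypeError(f"expected only numeric values, got dtype {arr.dtype}")
    return arr.astype(float)
```

A DataFrame with a text column, for instance, becomes an array of strings or objects and raises a TypeError here instead of failing somewhere inside the tree ensemble.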

Installation

The latest version can be installed from the master branch using pip:

pip install git+https://github.com/christophM/rulefit.git

Alternatively, clone the repository and run python setup.py install (or python setup.py develop for a development install).

Usage

Train your model:

import numpy as np
import pandas as pd

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.values  # DataFrame.as_matrix() was removed in pandas 1.0

rf = RuleFit()
rf.fit(X, y, feature_names=features)  # X is now a NumPy array without .columns
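
The second stage of fit, the L1-regularized linear model, can be illustrated with a small coordinate-descent Lasso written from scratch. This is our own sketch (the names lasso_cd and soft_threshold are hypothetical) and not necessarily how the package solves the problem; it presumably relies on an existing Lasso solver. The sketch minimizes (1/(2n)) * ||y - b0 - X w||^2 + alpha * ||w||_1; in RuleFit the columns of X would be the 0/1 rule columns stacked next to the original features.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z towards 0 by gamma; values within gamma of 0 become exactly 0."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent Lasso with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    resid = yc.copy()                        # yc - Xc @ w with w = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                     # constant column: keep w[j] = 0
            # Correlation of column j with the residual that excludes column j.
            rho = Xc[:, j] @ resid / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            resid -= Xc[:, j] * (new_wj - w[j])
            w[j] = new_wj
    intercept = y_mean - x_mean @ w
    return intercept, w


# Tiny demo: y depends only on the first column; the other two columns are
# orthogonal to it after centering, so the Lasso drops them.
X_demo = np.array([[0, 1, -1], [1, -1, 3], [2, -1, -3], [3, 1, 1]], float)
y_demo = 3 * X_demo[:, 0] + 1
b0, w = lasso_cd(X_demo, y_demo, alpha=0.1)
print(b0, w)  # w[1] and w[2] are exactly 0.0; w[0] is shrunk below 3 (about 2.92)
```

The soft-thresholding step is what produces exact zeros: any coefficient whose partial correlation with the residual is smaller than alpha is set to 0, which is how most rules are discarded.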

To control how the trees are generated, you can pass your own tree generator as an argument:

from sklearn.ensemble import GradientBoostingRegressor
gb = GradientBoostingRegressor(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(gb)

rf.fit(X, y, feature_names=features)
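
The generator settings directly determine how many candidate rules the Lasso has to choose from. Since every node except the root yields one rule, a quick upper bound follows: a tree of depth d has at most 2**d leaves, a binary tree with L leaves has 2L - 1 nodes (so at most 2L - 2 rules), and a tree cannot have more leaves than training observations. The helper below is hypothetical, not part of the package; actual trees are usually much smaller, and the package may cap or deduplicate rules.

```python
def max_rules(n_estimators, max_depth, n_samples=None):
    """Upper bound on the number of rules extracted from a boosted ensemble.

    Assumes one rule per non-root node, as described above.
    """
    leaves = 2 ** max_depth                # full binary tree of this depth
    if n_samples is not None:
        leaves = min(leaves, n_samples)    # each leaf holds >= 1 observation
    rules_per_tree = 2 * leaves - 2        # 2L - 1 nodes, minus the root
    return n_estimators * rules_per_tree


print(max_rules(500, 10))       # 1023000
print(max_rules(500, 10, 506))  # 505000
```

The second call assumes 506 training rows, the size of the classic Boston housing data: with max_depth=10 the depth limit is no longer the binding constraint, and the data size caps each tree instead.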

Predict:

rf.predict(X)
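
After fitting, a prediction is the intercept plus a weighted sum of the rule indicators and the original features: y_hat(x) = a0 + sum_k a_k r_k(x) + sum_j b_j x_j. The sketch below computes this from a precomputed 0/1 rule matrix; the function and argument names are hypothetical, and rf.predict(X) is assumed to do the equivalent internally after evaluating the rules on X.

```python
import numpy as np


def predict_linear(R, X, intercept, rule_coefs, linear_coefs):
    """Intercept + rule terms + linear terms, one prediction per row."""
    R = np.asarray(R, dtype=float)                # n x K rule indicators
    X = np.asarray(X, dtype=float)                # n x p original features
    rule_coefs = np.asarray(rule_coefs, dtype=float)
    linear_coefs = np.asarray(linear_coefs, dtype=float)
    return intercept + R @ rule_coefs + X @ linear_coefs
```

Rules whose coefficient is zero drop out of this sum, so only the rules that survive the Lasso influence predictions.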

Inspect rules:

rules = rf.get_rules()

rules = rules[rules.coef != 0].sort_values("support")  # DataFrame.sort() was removed from pandas

print(rules)
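
Following the original paper, the support of a rule is the fraction of observations for which the rule evaluates to 1. The sketch below reproduces the filter-and-sort above with plain NumPy, starting from a hypothetical 0/1 rule matrix R, coefficient vector and rule names, and assuming the package's support column has the same meaning.

```python
import numpy as np


def summarize_rules(R, coefs, names):
    """Return (name, coef, support) for rules with non-zero coefficients,
    sorted by ascending support."""
    R = np.asarray(R, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    support = R.mean(axis=0)                   # fraction of rows where rule == 1
    keep = np.flatnonzero(coefs != 0)          # drop rules the Lasso zeroed out
    order = keep[np.argsort(support[keep], kind="stable")]
    return [(names[k], coefs[k], support[k]) for k in order]
```

Rules with very low or very high support apply to almost no or almost all observations, so they tend to say little about differences between observations.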

Notes

In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation, the size of each tree (its number of terminal nodes) is drawn at random from an exponential distribution.
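
For comparison, the randomized tree sizes of the original method can be sketched as follows. As we recall the paper, the number of terminal nodes of each tree is 2 + floor(gamma), with gamma drawn from an exponential distribution with mean L - 2, where L is the desired average tree size; treat this formula, the helper name sample_tree_sizes and its defaults as assumptions rather than a faithful reproduction.

```python
import numpy as np


def sample_tree_sizes(n_trees, mean_leaves=4, seed=0):
    """Draw a terminal-node count for each tree (hypothetical helper).

    Assumed scheme: size = 2 + floor(gamma), gamma ~ Exponential(mean = mean_leaves - 2).
    """
    if mean_leaves < 2:
        raise ValueError("mean_leaves must be at least 2")
    rng = np.random.default_rng(seed)
    gamma = rng.exponential(scale=mean_leaves - 2, size=n_trees)
    return 2 + np.floor(gamma).astype(int)


sizes = sample_tree_sizes(10)
# Each size could be used to limit one tree, e.g. via
# sklearn.tree.DecisionTreeRegressor(max_leaf_nodes=int(size)).
```

Mixing small and large trees this way yields both short rules (few conditions) and long rules (interactions of many features), instead of rules that all stem from trees of the same depth.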
This implementation is a work in progress. If you find a bug, don't hesitate to contact me.
