sklearn-SeCo

An implementation of the Separate-and-Conquer (also called Covering) algorithm for scikit-learn. It is a classifier that learns a collection (a theory) of human-comprehensible rules.

sklearn_seco aims to be

computationally fast,
open to algorithm modification (think e.g. new heuristics),
yet still composed of understandable code.

It was developed as a master's thesis at the Knowledge Engineering Group at TU Darmstadt, under the supervision of Prof. Johannes Fürnkranz. See the thesis document (on the releases page) for additional documentation, including evaluation plots.
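To illustrate the separate-and-conquer idea named in the introduction, here is a minimal sketch in plain Python. It is not the code of sklearn_seco: all names (`covers`, `precision`, `find_best_rule`, `separate_and_conquer`) are hypothetical, it handles only categorical equality tests for a single target class, and it uses rule precision as the heuristic, which is an assumption made for brevity.

```python
from typing import List, Sequence, Tuple

Condition = Tuple[int, object]  # (feature index, required value)
Rule = List[Condition]          # a rule is a conjunction of conditions


def covers(rule: Rule, x: Sequence) -> bool:
    """A rule covers an example if all of its conditions hold."""
    return all(x[i] == v for i, v in rule)


def precision(rule: Rule, X, y, target) -> float:
    """Fraction of the covered examples that belong to the target class."""
    covered = [label for x, label in zip(X, y) if covers(rule, x)]
    if not covered:
        return 0.0
    return sum(label == target for label in covered) / len(covered)


def find_best_rule(X, y, target) -> Rule:
    """Conquer: greedily add the condition that raises precision the most."""
    rule: Rule = []
    best = precision(rule, X, y, target)
    while best < 1.0:
        # Candidate conditions are taken from examples the rule still covers.
        candidates = {
            (i, x[i]) for x in X if covers(rule, x) for i in range(len(x))
        } - set(rule)
        scored = [
            (precision(rule + [c], X, y, target), c) for c in sorted(candidates)
        ]
        if not scored:
            break
        score, cond = max(scored, key=lambda pair: pair[0])
        if score <= best:  # no refinement improves the rule
            break
        rule, best = rule + [cond], score
    return rule


def separate_and_conquer(X, y, target) -> List[Rule]:
    """Learn rules for `target` until no positive example is left."""
    X, y = list(X), list(y)
    theory: List[Rule] = []
    while any(label == target for label in y):
        rule = find_best_rule(X, y, target)
        if not rule:  # nothing improves on the empty (default) rule: stop
            break
        theory.append(rule)
        # Separate: drop every example covered by the new rule.
        remaining = [(x, l) for x, l in zip(X, y) if not covers(rule, x)]
        X = [x for x, _ in remaining]
        y = [l for _, l in remaining]
    return theory


X_demo = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
y_demo = ["yes", "yes", "no", "no"]
print(separate_and_conquer(X_demo, y_demo, "yes"))  # [[(0, 'sunny')]]
```

The learned theory contains a single rule, "feature 0 equals sunny", which covers both positive examples and none of the negative ones.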

Testing / Evaluation

For current test suite results, check Continuous Integration.

To run a comparison of sklearn_seco.RipperEstimator with weka.JRip, weka.J48, and sklearn.dtree on a selected collection of UCI datasets, run python3 evaluation.py.
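For a quick impression of what such a comparison involves, the following sketch cross-validates several scikit-learn-compatible classifiers on the iris dataset bundled with scikit-learn. It is not a substitute for evaluation.py: the helper `compare_estimators` is hypothetical, and the commented-out line that adds `sklearn_seco.RipperEstimator()` assumes the estimator can be constructed without arguments.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_estimators(estimators, X, y, folds=5):
    """Return the mean cross-validated accuracy for each named estimator."""
    return {
        name: cross_val_score(est, X, y, cv=folds, scoring="accuracy").mean()
        for name, est in estimators.items()
    }


X, y = load_iris(return_X_y=True)
estimators = {
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "decision tree": DecisionTreeClassifier(random_state=0),
    # Assumption: with sklearn_seco installed, a rule learner could be added:
    # "ripper": sklearn_seco.RipperEstimator(),
}
scores = compare_estimators(estimators, X, y)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Including a majority-class baseline makes it easier to judge whether any learner extracts useful structure from a dataset at all.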

Installation

Python >= 3.6 and the packages defined in setup.py are required. If needed, a known-working list of versioned dependencies is pinned in requirements.txt.

If you want to work on the code, check out the repository:
```
git clone https://github.com/azrdev/sklearn-seco
```
and run directly from the working tree, or make an editable install:
```
pip install --upgrade -e "/path/to/sklearn-seco[numba]"
```

If you want to use the project without modifying it, install the latest master with

pip install --upgrade "git+https://github.com/azrdev/sklearn-seco#egg=sklearn_seco[numba]"

TODO: publish on PyPI so that a stable version can be installed with pip install "sklearn-seco[numba]".

Note that the speed-optimized matching requires numba, and the coverage plots in extra.py and tests/test_extra.py require matplotlib. The commands above install numba by selecting the "numba" extra; the available "extra" dependency sets are "numba" and "tests".

Development status

The abstract SeCo algorithm is implemented and usable; the compatibility test sklearn.utils.estimator_checks.check_estimator() succeeds completely for SimpleSeCo and CN2.
CN2 has not been thoroughly compared to the original code or the Orange implementation, but should be complete.
Ripper lacks the original class binarization strategy and the global post-optimization; therefore its results are not identical to those of JRip (the only other freely available implementation).
The test suite has a few consistent failures (all in test_concrete.py):
- test_perfectly_correlated_categories_multiclass[CN2Estimator] fails because CN2 learns only one rule when it gets to x=[[… 2 0] [… 0 1]], y=[2 1], which is perfectly valid behavior.
- test_sklearn_check_estimator[IrepEstimator] fails sklearn.utils.estimator_checks.check_classifier_data_not_an_array (NOTE: later tests in check_estimator may fail, too). The cause: if the default rule is better than any refinement and survives rule_stopping_criterion, the learned theory is [init_rule()]. We consider this an error and raise an exception; the stopping criteria may be buggy in letting the default rule through.
- test_sklearn_check_estimator[RipperEstimator] fails sklearn.utils.estimator_checks.check_classifiers_train due to scikit-learn/scikit-learn#14124 (NOTE: later tests in check_estimator may fail, too).
- PyPy test runs time out on Travis-CI after 10 minutes.
Various TODOs throughout the code mark missing details and/or ideas for improvement (of functionality or runtime performance).
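The compatibility test mentioned in this section can be run against any estimator instance. The sketch below wraps check_estimator() in a hypothetical helper, `passes_sklearn_checks`, and demonstrates it on scikit-learn's own DecisionTreeClassifier; passing an estimator from sklearn_seco instead is an assumption about how that package is imported and constructed.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import check_estimator


def passes_sklearn_checks(estimator) -> bool:
    """Return True if `estimator` passes scikit-learn's API checks."""
    try:
        check_estimator(estimator)
    except Exception as exc:  # a failing check raises, e.g. AssertionError
        print(f"{type(estimator).__name__} failed: {exc!r}")
        return False
    return True


# Assumption: with sklearn_seco installed, one of its estimators could be
# checked the same way, e.g. passes_sklearn_checks(RipperEstimator()).
print(passes_sklearn_checks(DecisionTreeClassifier()))
```

Because check_estimator() stops at the first failing check, a False result here does not say how many of the later checks would also fail.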
