An Evolutionary Algorithm for Learning Interpretable Ensembles of Classifiers

This repository contains the source code of the algorithm presented in paper An Evolutionary Algorithm for Learning Interpretable Ensembles of Classifiers.

The code implements an Estimation of Distribution Algorithm (EDA), a kind of evolutionary algorithm, to optimize hyper-parameters of a set of base classifiers, namely J48, CART, PART, RIPPER, and Decision Table.

All base classifiers are Weka implementations. The EDA is implemented in Python.

If you find this paper useful, please cite in your work:

@incollection{cagnini2020pbil,
  title = {An Evolutionary Algorithm for Learning Interpretable Ensembles of Classifiers},
  author = {Henry E. L. Cagnini and Alex A. Freitas and Rodrigo C. Barros},
  year = {2020},	
  publisher = {Springer},	
  pages = {18--33},	
  booktitle = {Intelligent Systems},
  doi = {10.1007/978-3-030-61377-8_2},	
  url = {https://doi.org/10.1007%2F978-3-030-61377-8_2}
}

Installation

This repository uses Anaconda as the default Python. You can download Anaconda here.

Follow these steps to set up the development environment for the algorithm:

Create a new conda environment: conda create --name pbil
Activate it: conda activate pbil
Install libraries: conda install --file installation/conda_libraries.txt -c conda-forge
Install graphviz:

i. On Ubuntu 16.04: apt-get install graphviz libgraphviz-dev pkg-config

ii. On Windows: conda install -c alubbock pygraphviz=1.5 graphviz=2.41, then add path to graphviz installation to PATH variable: <folder_to_anaconda_installation>/Anaconda3/pkgs/graphviz-2.41-0/Scripts
Install JRE and JDK. The correct JDK version is jdk-8u221-linux-x64.tar.gz. Tutorial available here.
Install pip libraries: pip install -r installation/pip_libraries.txt (NOTE: this might require installing Visual Studio with Python tools on Windows)
Replace Weka from python-weka-wrapper library with provided Weka. This is needed since SimpleCart is not provided with default Weka. On Weka, simply installing it as an additional package makes it available in the GUI; however the wrapper still won't see it.

i. On Ubuntu: cp installation/weka.jar <folder_to_anaconda_installation>/anaconda3/envs/pbil/lib/python3.7/site-packages/weka/lib/ ii. On Windows: copy installation/weka.jar <folder_to_anaconda_installation>/Anaconda3/envs/pbil/Lib/site-packages/weka/lib/

Unpack datasets.

a. On Ubuntu:

 i. Install 7zip: `apt-get install p7zip-full`
 ii. Create folder for datasets: `mkdir keel_datasets_10fcv`  
 iii. Unpack it: `7z x installation/keel_datasets_10fcv.7z -o.`

b. On Windows:

 i. Install 7zip: https://www.7-zip.org/download.html
 ii. Add path to installation to PATH variable: `C:\Program Files\7-Zip`
 iii. Create folder for datasets: `mkdir keel_datasets_10fcv`
 iv. Unpack it: `7z x installation\keel_datasets_10fcv.7z -o.`

Create metadata path: mkdir metadata

Running

Running the experiments in the paper

Once installed, you can call the script main.py in root folder to run the tests that were used in the paper. A successful call to the algorithm in the command line looks as follows:

python main.py --datasets-path keel_datasets_10fcv --datasets-names german --metadata-path metadata 
--resources-path resources --n-jobs 1 --heap-size 4g --n-generations 2 --n-individuals 10 --n-samples 1 
--learning-rate 0.5 --selection-share 0.5

Just using the algorithm as-is

On the other hand, if you do not want to reproduce the experiments presented in the paper, you can use the algorithm as-is:

from pbil.model import PBIL
from utils import read_datasets
from weka.core import jvm

jvm.start(max_heap_size='4g')  # using 4GB of heap size for JVM

train_data = read_datasets('keel_datasets_10fcv\\sonar\\sonar-10-1tra.arff')
test_data = read_datasets('keel_datasets_10fcv\\sonar\\sonar-10-1tst.arff')

pbil = PBIL(resources_path='resources', train_data=train_data, n_generations=2, n_individuals=10)
overall, last = pbil.run(1)
print(overall.predict(test_data))
jvm.stop()

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
figures		figures
installation		installation
pbil		pbil
resources		resources
utils		utils
.gitattributes		.gitattributes
.gitignore		.gitignore
DATASETS.md		DATASETS.md
LICENSE		LICENSE
README.md		README.md
keel_datasets_10fcv.7z		keel_datasets_10fcv.7z
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

figures

figures

installation

installation

pbil

pbil

resources

resources

utils

utils

.gitattributes

.gitattributes

.gitignore

.gitignore

DATASETS.md

DATASETS.md

LICENSE

LICENSE

README.md

README.md

keel_datasets_10fcv.7z

keel_datasets_10fcv.7z

main.py

main.py

Repository files navigation

An Evolutionary Algorithm for Learning Interpretable Ensembles of Classifiers

Installation

Running

Running the experiments in the paper

Just using the algorithm as-is

About

Releases

Packages

Languages

License

henryzord/PBIL

Folders and files

Latest commit

History

Repository files navigation

An Evolutionary Algorithm for Learning Interpretable Ensembles of Classifiers

Installation

Running

Running the experiments in the paper

Just using the algorithm as-is

About

Resources

License

Stars

Watchers

Forks

Languages