
# Meta-Learning Base

## Installation

This project requires Python >= 3.6, Docker and docker-compose. It consists of the following Python projects:

- meta-learning-base
- sklearn-components
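
To verify the prerequisites, you can check the installed versions first (a minimal sketch; only Python >= 3.6 is an explicit requirement, the Docker versions are not pinned by this project):

```bash
python3 --version          # must report 3.6 or newer
docker --version
docker-compose --version
```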
Install the required build dependencies:

```bash
sudo apt-get install libatlas-base-dev libblas3 liblapack3 liblapack-dev libblas-dev gfortran
sudo apt install python3-pip build-essential
```

Download the other projects and install them:
```bash
cd ..
git clone https://gitlab.usu-research.ml/research/automl/sklearn-components.git
sudo pip3 install -r sklearn-components/requirements.txt
pip3 install -e sklearn-components
```

Install system packages:

```bash
sudo apt install libpq-dev
```

Clone the meta-learning-base project and install its requirements:

```bash
git clone http://gitlab.usu-research.ml/research/automl/meta-learning-base.git
pip3 install -r meta-learning-base/requirements.txt
```

Configure virtual memory to prevent the OOM killer:

```bash
sudo vim /etc/sysctl.conf
```

Add the following settings:

```
vm.overcommit_memory=2
vm.overcommit_ratio=100
```

and reload them:

```bash
sudo sysctl -p
```

Add the limbo configuration to `assets/`:

Copy the file to the VM with scp, then move it into `assets/`.
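
For example (a hedged sketch; the file name, user, host, and target path are placeholders, assuming the repository was cloned into the home directory of the VM):

```bash
# Copy the configuration file from the local machine to the VM
scp <LIMBO_CONFIG_FILE> <USER>@<VM_HOST>:~/
# On the VM: move it into the assets/ directory of the project
mv ~/<LIMBO_CONFIG_FILE> ~/meta-learning-base/assets/
```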

## Running the application

Pass the configuration files:

```bash
python3 cli.py <COMMAND> --s3-config assets/s3.yaml --sql-config assets/sql.yaml
```

where `<COMMAND>` can be either `enter_data` or `worker`. For additional configuration options run

```bash
python3 cli.py <COMMAND> -h
```

or take a look at the configuration. Example execution:

```bash
mkdir data
mkdir logfiles

screen
python3 cli.py worker --work-dir ./data --sql-config assets/sql.yaml --s3-config assets/s3.yaml --logfile ./logfiles/log1
```
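
The `enter_data` command is invoked the same way (a hedged sketch; any command-specific options are omitted here, see `python3 cli.py enter_data -h`):

```bash
python3 cli.py enter_data --sql-config assets/sql.yaml --s3-config assets/s3.yaml
```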

## Using external storage and database

You can either use the provided docker-compose file to set up an external database and S3 storage:

```bash
docker-compose up
```

or you can configure an existing database. If you want to use S3 storage, you will have to provide a Google service account file.
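
For long-running evaluations it can be convenient to run the services in the background; this is plain docker-compose usage, not specific to this project:

```bash
# Start database and S3 storage in the background
docker-compose up -d
# Check that both services are up
docker-compose ps
```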

## Stopping a worker

Get the PID of the worker via `ps aux | grep cli.py` and terminate the worker with SIGUSR1. On Ubuntu this equals `kill -10 <PID>`. This performs a graceful shutdown after the evaluation of the current algorithm has finished.
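
A minimal sketch (`<PID>` is a placeholder for the process id reported by `ps`):

```bash
# Find the PID of the running worker
ps aux | grep cli.py
# Request a graceful shutdown via SIGUSR1 (signal 10)
kill -10 <PID>
```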

## Exporting Regression Models

There are two methods to export the results of the meta-learning-base. To export the meta-features of all datasets, use

```bash
python3 cli.py export_datasets
```

This command creates files `export_datasets{CHUNK}.pkl`. For performance reasons, exports are chunked to 500,000 datasets per file.

To export all pipelines, use

```bash
python3 cli.py export_pipelines
```

This command recursively combines the two tables `dataset` and `algorithm` to reconstruct all evaluated pipelines. Please note that, depending on the number of algorithms and datasets, this action requires a significant amount of time.

Using the `train_scaler.py` script, all exported files are combined to train regression models predicting the expected pipeline performance.

## Pretrained data

For simplicity, we directly provide a ready-to-use random forest regression model trained on all available data. Additionally, we provide database dumps of the evaluation of 30 datasets in the `assets/defaults/` directory. We recommend using a distinct schema for each dump. Each dump creates filled `algorithm` and `dataset` tables in the `public` schema. After the import, you should rename the `public` schema to a new schema:

```bash
psql -f 1461_bank-marketing.sql
psql -c "ALTER SCHEMA public RENAME TO d1461"
```
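
To import all provided dumps into distinct schemas, a loop along these lines could be used (a sketch, assuming the files in `assets/defaults/` follow the `<OPENML_ID>_<NAME>.sql` pattern shown above and that `psql` already points at the target database):

```bash
for dump in assets/defaults/*.sql; do
    # Recreate the public schema, since the previous iteration renamed it away
    psql -c "CREATE SCHEMA IF NOT EXISTS public"
    # Import the dump into the public schema
    psql -f "$dump"
    # Move the imported tables into a schema named after the dataset id, e.g. d1461
    id=$(basename "$dump" | cut -d'_' -f1)
    psql -c "ALTER SCHEMA public RENAME TO d${id}"
done
```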
