scikit-learn-speed

Continuous benchmark suite for the scikit-learn project.

Usage

In order to run the benchmarks on your own machine, please follow these steps.

Clone the repository somewhere, for example ~/code/scikit-learn-speed

Extract the datasets:

cd ~/code/scikit-learn-speed/benchmarks
tar jxvf data.tar.bz2
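
If the `tar` command is not available on your system, the same extraction can be sketched with Python's standard `tarfile` module. The helper name `extract_datasets` is hypothetical and not part of this repository; it only assumes that `data.tar.bz2` is a regular bzip2-compressed tar archive.

```python
import tarfile


def extract_datasets(archive="data.tar.bz2", dest="."):
    """Extract a bzip2-compressed tar archive into dest, like `tar jxf`.

    Hypothetical helper for illustration; not part of scikit-learn-speed.
    Returns the list of member names found in the archive.
    """
    with tarfile.open(archive, mode="r:bz2") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names


# Usage, from the benchmarks folder:
#     extract_datasets("data.tar.bz2", ".")
```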

Create the configuration file ~/.vbench-skl. For example:

[setup]
repo_path = /Users/vene/code/scikit-learn
repo_url = git@github.com:scikit-learn/scikit-learn.git
db_path = /Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db
tmp_dir = /tmp/vb_sklearn

The values shown above are the hardcoded defaults. They are used whenever a configuration value is missing, whether the whole file is absent or a single option was left out. In particular, you do not need to set repo_url or tmp_dir.
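
The fallback behaviour described above can be sketched with the standard `configparser` module. This is only an illustration of the idea, not the suite's actual loading code: the `DEFAULTS` dictionary is copied from the example, and the function name `load_settings` is hypothetical.

```python
import configparser
from pathlib import Path

# Defaults copied from the example above; the real suite may define
# them differently (assumption).
DEFAULTS = {
    "repo_path": "/Users/vene/code/scikit-learn",
    "repo_url": "git@github.com:scikit-learn/scikit-learn.git",
    "db_path": "/Users/vene/code/scikit-learn-speed/benchmarks/benchmarks.db",
    "tmp_dir": "/tmp/vb_sklearn",
}


def load_settings(config_file="~/.vbench-skl"):
    """Return the [setup] options, using DEFAULTS for any missing key.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist.
    parser.read(Path(config_file).expanduser())
    settings = dict(DEFAULTS)
    if parser.has_section("setup"):
        # Values present in the file take precedence over the defaults.
        settings.update(parser["setup"])
    return settings
```

With a file that sets only repo_path and db_path, `load_settings()` would return those two values from the file and the default repo_url and tmp_dir.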

From the scikit-learn-speed/benchmarks folder, run:

python run_suite.py  # This runs the entire suite, ~10min on my machine
python generate_rst_files.py  # This prepares the rst documentation

To generate the HTML files, change to the scikit-learn-speed folder and run:
```
python make.py
```
You can view the results by opening scikit-learn-speed/benchmarks/build/html/index.html.

Datasets

The following datasets are available:

arcene: train: (100, 10000), test: (100, 10000)
madelon: train: (2000, 500), test: (600, 500)
minimadelon: train: (30, 500), test: (20, 500), 10 outputs
blobs: train: (300, 50), test: (200, 50), 10 tight centers
newsgroups: sparse, train: (11214, 130088), test: (7432, 130088)

In addition, you can append the following options to any dataset's name:

-oney: Only keeps the first output, i.e. y = y[:, 0]. Necessary for estimators that don't support multidimensional output arrays.
-semi: Unlabels samples at random, by setting the corresponding output to -1. Useful for semi-supervised algorithms.
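
To make the two options concrete, here is a minimal NumPy sketch of what they do to a target array. The function names, the `unlabeled_fraction` parameter and the fixed seed are assumptions made for illustration; this README does not say how many samples the suite unlabels or how it picks them.

```python
import numpy as np


def apply_oney(y):
    """Keep only the first output column, i.e. y = y[:, 0]."""
    y = np.asarray(y)
    return y[:, 0] if y.ndim > 1 else y


def apply_semi(y, unlabeled_fraction=0.5, seed=0):
    """Return a copy of y with a random subset of samples set to -1.

    unlabeled_fraction and seed are hypothetical parameters; the
    fraction actually used by the benchmark suite is not documented here.
    """
    rng = np.random.RandomState(seed)
    y = np.array(y, copy=True)
    n_unlabeled = int(unlabeled_fraction * y.shape[0])
    # Pick n_unlabeled distinct sample indices at random.
    idx = rng.permutation(y.shape[0])[:n_unlabeled]
    y[idx] = -1
    return y
```

For example, `apply_semi(apply_oney(y))` would mimic combining both suffixes on a dataset with a two-dimensional output array.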
