AnnData Benchmarks

This repo contains some work-in-progress benchmarks for AnnData using asv (airspeed velocity).

Setup

I definitely recommend reading through the asv docs. Currently, this assumes the benchmark suite can reach the anndata repo via the path ../anndata. Beyond that, all you'll need to do is create a machine file for your system and make sure anndata's dependencies are installable via conda.

Data

Data will need to be retrieved for these benchmarks. Currently this can be done via the script fetch_datasets.py, which downloads and caches some files for each dataset. The set of datasets retrieved can be limited via the --pattern argument.

Note that the h5ad format has changed since its inception. While the anndata package maintains backwards compatibility, older versions of anndata will not be able to read files written by more recent versions. To get around this for the benchmarks, datasets have to be readable by all versions, which can require a setup function that creates the anndata object.
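As a rough illustration of the setup-function approach, here is a minimal sketch of an asv benchmark class that builds its AnnData object in memory instead of reading an h5ad file. The class name, parameter values, and array shape are hypothetical and not taken from this repo's benchmarks. Only the asv conventions (params, param_names, setup, and the time_ prefix) and the AnnData constructor are real.

```python
class InMemorySuite:
    """Hypothetical asv suite; names and sizes are illustrative only."""

    # asv calls setup() and each benchmark once per parameter value.
    params = [100, 1000]
    param_names = ["n_obs"]

    def setup(self, n_obs):
        # Imported here so the object is built by whichever anndata
        # version is installed in the environment being benchmarked.
        import numpy as np
        import anndata

        # Construct the object directly instead of reading an h5ad file
        # that an older anndata release might not be able to parse.
        self.adata = anndata.AnnData(X=np.ones((n_obs, 50), dtype="float32"))

    def time_copy(self, n_obs):
        # asv times the body of any method whose name starts with "time_".
        self.adata.copy()
```

Because the object is created by the anndata version under test, the same benchmark can run across commits regardless of which on-disk format that version writes.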

Usage

Running the benchmarks:

To run benchmarks for a particular commit: asv run {commit} --steps 1 -b {pattern}

To run benchmarks for a range of commits: asv run {commit1}..{commit2}

You can restrict which benchmarks are run with the -b {pattern} flag, where {pattern} is a regular expression. Only benchmarks whose names match the pattern are run.
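To get a feel for what a given pattern would select, here is a small sketch that mimics the filtering in plain Python. It assumes the pattern is applied as an unanchored regular expression search against the full benchmark name, which is my understanding of asv's behavior, not something this repo guarantees. The first benchmark name comes from the output shown in this README; the other two are made up.

```python
import re

# The first name appears in this README; the other two are hypothetical.
benchmark_names = [
    "views.SubsetMemorySuite.track_repeated_subset_memratio",
    "readwrite.ReadSuite.time_read_full",      # hypothetical
    "sparse_dataset.SliceSuite.time_getitem",  # hypothetical
]


def select(pattern, names):
    """Return the names in which the regular expression is found anywhere."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


selected = select("views", benchmark_names)
print(selected)
```

With these names, the pattern "views" selects only the first entry, while "time_" would select the two hypothetical timing benchmarks.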

Accessing the benchmarks

You can see what benchmarks you've already run using asv show. If you don't specify a commit, it lists the commits that have results. If you specify a commit, it shows the results for that commit. For example:

$ asv show -b "views"
Commits with results:

Machine    : mimir.mobility.unimelb.net.au
Environment: conda-py3.7-h5py-memory_profiler-natsort-numpy-pandas-scipy

    61eb5bb7
    e9ccfc33
    22f12994
    0ebe187e

$ asv show -b "views" 0ebe187e
Commit: 0ebe187e <views-of-views>

views.SubsetMemorySuite.track_repeated_subset_memratio [mimir.mobility.unimelb.net.au/conda-py3.7-h5py-memory_profiler-natsort-numpy-pandas-scipy]
  ok
  ======= ======= ========== ============ ===================== ====================== ======================
  --                                                                   index_kind                            
  --------------------------------------- -------------------------------------------------------------------
   n_obs   n_var   attr_set   subset_dim         intarray             boolarray                slice         
  ======= ======= ========== ============ ===================== ====================== ======================
    100     100     X-csr        obs               2.84           1.7916666666666667            0.5          
    100     100     X-csr        var        2.5357142857142856    1.8695652173913044     0.5652173913043478  
    100     100    X-dense       obs        3.1739130434782608    1.6538461538461537            0.6          
...

You can compare two commits with asv compare:

$ asv compare e9ccfc 0ebe187e
All benchmarks:

       before           after         ratio
     [e9ccfc33]       [0ebe187e]
     <master>         <views-of-views>
-            2.16  1.7916666666666667     0.83  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'obs', 'boolarray')
+ 2.533333333333333             2.84     1.12  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'obs', 'intarray')
- 1.1923076923076923              0.5     0.42  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'obs', 'slice')
  1.9615384615384615  1.8695652173913044     0.95  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'var', 'boolarray')
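To read the table above: the ratio column is the after value divided by the before value, and the leading + or - flags rows where the value changed by more than a threshold factor. The sketch below reproduces those two columns from the numbers shown. It assumes the mark depends only on the ratio and on a factor of 1.1 (asv's documented default for compare); the function name is mine, and asv may apply further checks for other benchmark types.

```python
def compare(before, after, factor=1.1):
    """Return (mark, ratio) in the style of the asv compare table.

    Assumes a symmetric threshold: "+" if the value grew by more than
    `factor`, "-" if it shrank by more than `factor`, blank otherwise.
    """
    ratio = after / before
    if ratio > factor:
        mark = "+"
    elif ratio < 1 / factor:
        mark = "-"
    else:
        mark = " "
    return mark, round(ratio, 2)


# (before, after) pairs copied from the four rows of the table above.
rows = [
    (2.16, 1.7916666666666667),
    (2.533333333333333, 2.84),
    (1.1923076923076923, 0.5),
    (1.9615384615384615, 1.8695652173913044),
]

results = [compare(before, after) for before, after in rows]
for mark, ratio in results:
    print(f"{mark} {ratio:.2f}")
```

For a memory-ratio benchmark like this one, a - row means the tracked value went down between the two commits.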

View in the browser:

You can view the benchmarks in the browser with asv publish followed by asv preview. If you want to include benchmarks of a local branch, I think you'll have to add that branch to the "branches" list in asv.conf.json.

TODO:

What's the right way to measure memory usage?
Choose datasets to use for benchmarks
Write script which downloads and prepares datasets to benchmark on
Add script to select which commits to run benchmarks at
More benchmarks
