ivirshup/anndata-benchmarks

AnnData Benchmarks

This repo contains some work-in-progress benchmarks for AnnData using asv.

Setup

I definitely recommend reading through the asv docs. Currently, this assumes the benchmark suite can reach the anndata repo via the path ../anndata. Beyond that, all you'll need to do is create a machine file for your system and make sure anndata's dependencies are installable via conda.

Data

Data will need to be retrieved for these benchmarks. Currently, this can be done via the script fetch_datasets.py, which downloads and caches some files for each dataset. The set of datasets retrieved can be limited via the --pattern argument.

Note that the h5ad format has changed since its inception. While the anndata package maintains backwards compatibility, older versions of anndata will not be able to read files written by more recent versions. To get around this for the benchmarks, datasets have to be readable by every version being benchmarked, which can require a setup function that creates the anndata object.
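
As a rough illustration of such a setup function, the sketch below builds the AnnData object in memory instead of reading an h5ad file, so it does not depend on which on-disk format a given anndata version understands. The suite name, the parameter values, and the timed method are hypothetical and not taken from this repo; only the asv conventions (params, param_names, setup, and the time_ prefix) and the AnnData constructor are real.

```python
class InMemorySuite:
    """Hypothetical asv suite; the name and sizes are illustrative only."""

    # asv runs each benchmark once per combination of these parameters.
    params = ([100, 1000], [100, 1000])
    param_names = ["n_obs", "n_var"]

    def setup(self, n_obs, n_var):
        # Local imports keep this sketch importable on its own; in a real
        # suite they would normally sit at module level.
        import numpy as np
        import anndata

        # Construct the object from a plain array rather than from an h5ad
        # file, so no particular file format version is involved.
        self.adata = anndata.AnnData(X=np.ones((n_obs, n_var), dtype="float32"))

    def time_copy(self, n_obs, n_var):
        # asv times methods whose names start with "time_".
        self.adata.copy()
```

A dataset downloaded in a neutral format (for example a plain matrix plus annotation tables) could be turned into an AnnData object in setup in the same way.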

Usage

Running the benchmarks:

To run benchmarks for a particular commit: asv run {commit} --steps 1 -b

To run benchmarks for a range of commits: asv run {commit1}..{commit2}

You can restrict which benchmarks are run with the -b {pattern} flag.
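
asv documents the -b argument as a regular expression. The sketch below shows how such a pattern would select benchmark names, assuming it is applied with search semantics (a match anywhere in the full name). The helper select_benchmarks and the second benchmark name are hypothetical and only serve the illustration.

```python
import re


def select_benchmarks(names, pattern):
    """Return the names in which the regular expression matches anywhere."""
    return [name for name in names if re.search(pattern, name)]


names = [
    "views.SubsetMemorySuite.track_repeated_subset_memratio",
    "readwrite.ReadSuite.time_read",  # hypothetical benchmark name
]

# Roughly what `-b "views"` would select under the assumption above.
print(select_benchmarks(names, "views"))
```

Under that assumption, a pattern such as "memratio" would also pick up the first benchmark, since it only has to match part of the name.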

Accessing the benchmarks

You can see which benchmarks you've already run using asv show. If you don't specify a commit, it lists the commits that have results. If you specify a commit, it shows the results for that commit. For example:

$ asv show -b "views"
Commits with results:

Machine    : mimir.mobility.unimelb.net.au
Environment: conda-py3.7-h5py-memory_profiler-natsort-numpy-pandas-scipy

    61eb5bb7
    e9ccfc33
    22f12994
    0ebe187e
$ asv show -b "views" 0ebe187e
Commit: 0ebe187e <views-of-views>

views.SubsetMemorySuite.track_repeated_subset_memratio [mimir.mobility.unimelb.net.au/conda-py3.7-h5py-memory_profiler-natsort-numpy-pandas-scipy]
  ok
  ======= ======= ========== ============ ===================== ====================== ======================
  --                                                                   index_kind                            
  --------------------------------------- -------------------------------------------------------------------
   n_obs   n_var   attr_set   subset_dim         intarray             boolarray                slice         
  ======= ======= ========== ============ ===================== ====================== ======================
    100     100     X-csr        obs               2.84           1.7916666666666667            0.5          
    100     100     X-csr        var        2.5357142857142856    1.8695652173913044     0.5652173913043478  
    100     100    X-dense       obs        3.1739130434782608    1.6538461538461537            0.6          
...

You can compare two commits with asv compare:

$ asv compare e9ccfc 0ebe187e
All benchmarks:

       before           after         ratio
     [e9ccfc33]       [0ebe187e]
     <master>         <views-of-views>
-            2.16  1.7916666666666667     0.83  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'obs', 'boolarray')
+ 2.533333333333333             2.84     1.12  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'obs', 'intarray')
- 1.1923076923076923              0.5     0.42  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'obs', 'slice')
  1.9615384615384615  1.8695652173913044     0.95  views.SubsetMemorySuite.track_repeated_subset_memratio(100, 100, 'X-csr', 'var', 'boolarray')
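
The ratio column above is the after value divided by the before value, and the leading + or - flags rows whose ratio moved by more than a threshold. The sketch below reproduces the four rows shown, assuming a threshold factor of 1.1 (which I believe is the default of asv compare's --factor option). It is a simplification: the real command can also take statistical significance into account, and the function name here is made up.

```python
def compare(before, after, factor=1.1):
    """Return (mark, ratio) for one benchmark, loosely mimicking asv compare."""
    ratio = after / before
    if ratio > factor:
        mark = "+"  # value went up by more than the factor
    elif ratio < 1 / factor:
        mark = "-"  # value went down by more than the factor
    else:
        mark = " "  # change is within the threshold
    return mark, round(ratio, 2)


# (before, after) pairs copied from the output above.
rows = [
    (2.16, 1.7916666666666667),
    (2.533333333333333, 2.84),
    (1.1923076923076923, 0.5),
    (1.9615384615384615, 1.8695652173913044),
]
for before, after in rows:
    print(compare(before, after))
```

For these memory ratio benchmarks a lower value is better, so the rows marked - are improvements on the views-of-views branch.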

View in the browser:

You can view the benchmarks in the browser with asv publish followed by asv preview. If you want to include benchmarks of a local branch, I think you'll have to add that branch to the "branches" list in asv.conf.json.

TODO:

  • What's the right way to measure memory usage?
  • Choose datasets to use for benchmarks
  • Write script which downloads and prepares datasets to benchmark on
  • Add script to select which commits to run benchmarks at
  • More benchmarks
