hdf5plugin

This module provides HDF5 compression filters (namely: blosc, bitshuffle and lz4) and registers them to the HDF5 library used by h5py.

  • Supported operating systems: Linux, Windows, macOS.
  • Supported versions of Python: 2.7 and >= 3.4

hdf5plugin provides a generic way to enable the provided HDF5 compression filters with h5py. HDF5 compression filters can also be installed system-wide on Linux or through Anaconda (blosc-hdf5-plugin, hdf5-lz4).

The HDF5 plugin sources were obtained from:

Installation

To install, just run:

pip install hdf5plugin

To install for the current user only, run:

pip install hdf5plugin --user

Documentation

To use it, import hdf5plugin; the supported compression filters are then available from h5py.

Sample code:

import numpy
import h5py
import hdf5plugin

# Compression
f = h5py.File('test.h5', 'w')
f.create_dataset('data', data=numpy.arange(100), compression=hdf5plugin.LZ4_ID)
f.close()

# Decompression
f = h5py.File('test.h5', 'r')
data = f['data'][()]
f.close()

hdf5plugin provides:

Bitshuffle(nelems=0, lz4=True)

This class takes the following arguments and returns the compression options to feed into h5py.Group.create_dataset for using the bitshuffle filter:

  • nelems the number of elements per block, needs to be divisible by eight (default is 0, about 8kB per block)
  • lz4 if True the elements get compressed using lz4 (default is True)
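The divisibility constraint on nelems can be checked before a file is written. The following is a minimal sketch; check_nelems and round_down_to_multiple_of_eight are hypothetical helpers and are not part of hdf5plugin.

```python
def check_nelems(nelems):
    """Return nelems if it is 0 or a positive multiple of 8, else raise ValueError.

    Hypothetical helper, not part of hdf5plugin.
    """
    if nelems < 0 or nelems % 8 != 0:
        raise ValueError(
            "nelems must be 0 or a positive multiple of 8, got %d" % nelems)
    return nelems


def round_down_to_multiple_of_eight(n):
    """Return the largest multiple of 8 that is <= n, for non-negative n."""
    return n - (n % 8)


print(check_nelems(0))     # 0 keeps the default block size
print(check_nelems(1024))  # a multiple of 8 is accepted unchanged
print(round_down_to_multiple_of_eight(1003))  # 1000
```

Rounding down, not up, keeps the block no larger than the size that was asked for.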

An instance can be unpacked into keyword arguments with **, as in the sample code below.

Sample code:

f = h5py.File('test.h5', 'w')
f.create_dataset('bitshuffle_with_lz4', data=numpy.arange(100),
  **hdf5plugin.Bitshuffle(nelems=0, lz4=True))
f.close()
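The ** unpacking in the sample above works with any object that implements Python's mapping protocol. Here is a minimal sketch of that mechanism, written for Python 3. FilterOptions and fake_create_dataset are hypothetical stand-ins (they are not hdf5plugin or h5py code), and the option values are placeholders. The only real-world value is 32008, the HDF5 filter ID registered for bitshuffle.

```python
from collections.abc import Mapping


class FilterOptions(Mapping):
    """Hypothetical stand-in for a helper such as hdf5plugin.Bitshuffle."""

    def __init__(self, filter_id, filter_options=()):
        # Keys match the keyword arguments accepted by create_dataset.
        self._kwargs = {
            "compression": filter_id,
            "compression_opts": tuple(filter_options),
        }

    def __getitem__(self, key):
        return self._kwargs[key]

    def __iter__(self):
        return iter(self._kwargs)

    def __len__(self):
        return len(self._kwargs)


def fake_create_dataset(name, data=None, compression=None, compression_opts=None):
    """Hypothetical stand-in for h5py.Group.create_dataset; records its arguments."""
    return {
        "name": name,
        "compression": compression,
        "compression_opts": compression_opts,
    }


BITSHUFFLE_ID = 32008  # registered HDF5 filter ID of bitshuffle
options = FilterOptions(BITSHUFFLE_ID, (0, 1))  # placeholder option values

# ** expands the mapping into compression=... and compression_opts=...
call = fake_create_dataset("bitshuffle_with_lz4", data=range(100), **options)
print(call["compression"])  # 32008
```

Because the helper is a mapping and not a plain dict, it can validate its arguments at construction time and still be passed directly to create_dataset.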

Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE)

This class takes the following arguments and returns the compression options to feed into h5py.Group.create_dataset for using the blosc filter:

  • cname the compression algorithm, one of:
    • 'blosclz'
    • 'lz4' (default)
    • 'lz4hc'
    • 'zlib'
    • 'zstd'
  • clevel the compression level, from 0 to 9 (default is 5)
  • shuffle the shuffling mode, in:
    • Blosc.NOSHUFFLE (0): No shuffle
    • Blosc.SHUFFLE (1): byte-wise shuffle (default)
    • Blosc.BITSHUFFLE (2): bit-wise shuffle
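The accepted values listed above can be checked up front. This is a minimal sketch using only the values documented here; check_blosc_args and the module-level constants are hypothetical and do not reproduce hdf5plugin's own validation.

```python
# Values taken from the argument list above.
BLOSC_CNAMES = ("blosclz", "lz4", "lz4hc", "zlib", "zstd")
NOSHUFFLE, SHUFFLE, BITSHUFFLE = 0, 1, 2


def check_blosc_args(cname="lz4", clevel=5, shuffle=SHUFFLE):
    """Validate Blosc arguments and return them as a dict.

    Hypothetical helper, not part of hdf5plugin.
    """
    if cname not in BLOSC_CNAMES:
        raise ValueError("unknown cname %r, expected one of %r"
                         % (cname, BLOSC_CNAMES))
    if not 0 <= clevel <= 9:
        raise ValueError("clevel must be in 0..9, got %r" % (clevel,))
    if shuffle not in (NOSHUFFLE, SHUFFLE, BITSHUFFLE):
        raise ValueError("shuffle must be 0, 1 or 2, got %r" % (shuffle,))
    return {"cname": cname, "clevel": clevel, "shuffle": shuffle}


print(check_blosc_args())  # the documented defaults
print(check_blosc_args(cname="zstd", clevel=9, shuffle=BITSHUFFLE))
```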

An instance can be unpacked into keyword arguments with **, as in the sample code below.

Sample code:

f = h5py.File('test.h5', 'w')
f.create_dataset('blosc_byte_shuffle_blosclz', data=numpy.arange(100),
    **hdf5plugin.Blosc(cname='blosclz', clevel=9, shuffle=hdf5plugin.Blosc.SHUFFLE))
f.close()

LZ4(nbytes=0)

This class takes the number of bytes per block as argument and returns the compression options to feed into h5py.Group.create_dataset for using the lz4 filter:

  • nbytes the number of bytes per block. It must be either 0 or in the range 0 < nbytes < 2113929216 (about 1.97 GiB). The default value is 0, which selects a block size of 1GB.
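The handling of nbytes described above can be sketched as follows. resolve_lz4_block_size is a hypothetical helper, not part of hdf5plugin, and the sketch assumes that the "1GB" default means 2**30 bytes.

```python
LZ4_MAX_NBYTES = 2113929216       # exclusive upper bound quoted above
ASSUMED_DEFAULT_NBYTES = 1 << 30  # assumption: "1GB" means 2**30 bytes


def resolve_lz4_block_size(nbytes=0):
    """Return the block size in bytes that a given nbytes stands for.

    Hypothetical helper, not part of hdf5plugin.
    """
    if nbytes == 0:
        # 0 is the special value that selects the default block size.
        return ASSUMED_DEFAULT_NBYTES
    if not 0 < nbytes < LZ4_MAX_NBYTES:
        raise ValueError("nbytes must be 0 or in (0, %d), got %r"
                         % (LZ4_MAX_NBYTES, nbytes))
    return nbytes


print(resolve_lz4_block_size())         # assumed default block size
print(resolve_lz4_block_size(1 << 20))  # 1048576
```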

An instance can be unpacked into keyword arguments with **, as in the sample code below.

Sample code:

f = h5py.File('test.h5', 'w')
f.create_dataset('lz4', data=numpy.arange(100),
    **hdf5plugin.LZ4(nbytes=0))
f.close()

Dependencies

Testing

To run self-contained tests, from Python:

import hdf5plugin.test
hdf5plugin.test.run_tests()

Or, from the command line:

python -m hdf5plugin.test

To also run tests relying on actual HDF5 files, run from the source directory:

python test/test.py

This tests the installed version of hdf5plugin.

License

The source code of hdf5plugin itself is licensed under the MIT license. Use it at your own risk. See LICENSE.

The source code of the embedded HDF5 filter plugin libraries is licensed under different open-source licenses. Please read the different licenses:

The HDF5 v1.10.5 headers (and Windows .lib file) used to build the filters are stored for convenience in the repository. The license is available here: src/hdf5/COPYING.
