jsheunis/datalad-metalad

[Unstable] DataLad extension for semantic metadata handling

NB! This software is currently being heavily rewritten. This includes the master branch, which should be considered unstable.

This software is a DataLad extension that equips DataLad with an alternative command suite for metadata handling (extraction, aggregation, reporting).

It is backward-compatible with the metadata storage format in DataLad proper, while being substantially faster (especially on large dataset hierarchies). Additionally, it provides new metadata extractors, as well as improved variants of DataLad's own extractors that are tuned for better performance and richer, JSON-LD compliant metadata reports.

Commands currently provided by this extension

  • meta-extract -- new dedicated command to run any of DataLad's metadata extractors.
  • meta-aggregate -- complete reimplementation of metadata aggregation, with substantial performance benefits, in particular on large dataset hierarchies.
  • meta-dump -- new command to access the aggregated metadata present in a dataset; much faster and more predictable than the metadata command in datalad-core.
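
Because the extension is being rewritten, the options of these commands may differ between versions. A minimal sketch for inspecting the current interface, assuming datalad and this extension are installed and that each command accepts DataLad's standard --help option (the variable name metalad_cmds is only illustrative):

```shell
# Names of the three commands listed above.
metalad_cmds="meta-extract meta-aggregate meta-dump"

for cmd in $metalad_cmds; do
    if command -v datalad >/dev/null 2>&1; then
        # Print the usage text of the command as shipped in the installed version.
        datalad "$cmd" --help || echo "no help available for $cmd (is the extension installed?)"
    else
        echo "datalad not found on PATH; cannot show help for $cmd"
    fi
done
```

The help output of the installed version, not this README, is the authoritative description of the available arguments.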

Additional metadata extractor implementations

  • metalad_core -- enriched variant of the datalad_core extractor that yields valid JSON-LD
  • metalad_annex -- refurbished variant of the annex extractor using the metalad extractor API
  • metalad_custom -- read pre-crafted metadata from shadow/sidecar files for a dataset and/or any file in a dataset.
  • metalad_runprov -- report provenance metadata for datalad run records following the W3C PROV model

Installation

Before installing this package, make sure that a recent version of git-annex is installed. Afterwards, install the latest version of datalad-metalad from PyPI. It is recommended to use a dedicated virtualenv:

# create and enter a new virtual environment (optional)
virtualenv --system-site-packages --python=python3 ~/env/datalad
. ~/env/datalad/bin/activate

# install from PyPI
pip install datalad_metalad
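
The prerequisite mentioned above can be checked before running the install. The following is a minimal sketch; check_tool is a hypothetical helper written for this example and is not part of DataLad or this extension. It only tests whether a tool is on PATH, not whether its version is recent enough.

```shell
# Hypothetical helper (not provided by datalad-metalad): report whether a
# command-line tool can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# git-annex has to be present before installing the extension;
# pip is needed for the installation step itself.
check_tool git-annex || echo "install git-annex before continuing"
check_tool pip || echo "pip is not available in this environment"
```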

Support

For general information on how to use or contribute to DataLad (and this extension), please see the DataLad website or the main GitHub project page. The documentation is available at: http://docs.datalad.org/projects/metalad

All bugs, concerns and enhancement requests for this software can be submitted here: https://github.com/datalad/datalad-metalad/issues

If you have a problem or would like to ask a question about how to use DataLad, please submit a question to NeuroStars.org with a datalad tag. NeuroStars.org is a platform similar to StackOverflow but dedicated to neuroinformatics.

All previous DataLad questions are available here: http://neurostars.org/tags/datalad/

Acknowledgements

This DataLad extension was developed with support from the German Federal Ministry of Education and Research (BMBF 01GQ1905), and the US National Science Foundation (NSF 1912266).
