This repository has been archived by the owner on Nov 22, 2017. It is now read-only.

The OAI Harvest module handles metadata gathering between OAI-PMH v.2.0 compliant repositories.


inspirehep/invenio-oaiharvester

 
 


Invenio-OAIHarvester

Invenio module for OAI-PMH metadata harvesting between repositories.

This is an experimental development preview release.

Features

This module lets you harvest OAI-PMH repositories using the Sickle library. Signals let you hook the harvested output into your application, or you can simply write it to files.

Configurations for your OAI-PMH sources are stored via SQLAlchemy models. You can run harvesting jobs immediately from the command line, or schedule them to run regularly with Celery beat.

Harvesting is simple

inveniomanage oaiharvester get -u http://export.arxiv.org/oai2 -i oai:arXiv.org:1507.07286 > my_record.xml

This harvests a single record from the repository and prints it to stdout, which in this case is redirected to a file called my_record.xml.

If you want harvested records saved to a directory automatically, it's easy:
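At the protocol level, fetching one record by identifier corresponds to an OAI-PMH GetRecord request. The sketch below only builds that request URL; the helper name `build_get_record_url` is hypothetical, and the `oai_dc` default is an assumption (it is the metadata prefix every OAI-PMH repository must support, not necessarily the one this module uses by default).

```python
from urllib.parse import urlencode


def build_get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Return the OAI-PMH GetRecord request URL for a single record.

    Hypothetical helper for illustration; not part of invenio-oaiharvester.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,  # assumed default, see lead-in
    })
    return base_url + "?" + query


url = build_get_record_url("http://export.arxiv.org/oai2",
                           "oai:arXiv.org:1507.07286")
print(url)
```

The colons in the identifier are percent-encoded in the query string, which is why the printed URL differs from the identifier as typed on the command line.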

inveniomanage oaiharvester get -u http://export.arxiv.org/oai2 -i oai:arXiv.org:1507.07286 -d /tmp

Note the -d parameter, which specifies the directory in which to save the harvested XML files.
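If you would rather write harvested XML to a directory from your own code, a minimal sketch could look like the following. The function `save_records` and the `record_<n>.xml` naming scheme are assumptions made for illustration; the file names the harvester itself produces may differ.

```python
import os
import tempfile


def save_records(records_xml, directory):
    """Write each XML string to its own file and return the file paths.

    Hypothetical helper; the naming scheme is an assumption.
    """
    paths = []
    for index, xml in enumerate(records_xml):
        path = os.path.join(directory, "record_{0}.xml".format(index))
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(xml)
        paths.append(path)
    return paths


# Demonstrate with two placeholder records in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    written = save_records(["<record>a</record>", "<record>b</record>"], tmp)
    names = [os.path.basename(path) for path in written]

print(names)
```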

Integration with your application

If you want to integrate invenio-oaiharvester into your application, you should hook into the signals sent by the harvester upon completed harvesting.

See invenio_oaiharvester.signals:oaiharvest_finished.

Check also the defined Celery tasks under invenio_oaiharvester.tasks.
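The receiver pattern for such a signal can be sketched as below. To keep the example runnable without Invenio installed, `StandInSignal` is a tiny stand-in that imitates the `connect`/`send` interface of a blinker signal. In a real application you would import `oaiharvest_finished` from `invenio_oaiharvester.signals` instead. The `records` keyword argument is an assumption about what the signal sends.

```python
class StandInSignal:
    """Minimal stand-in for a blinker signal (illustration only)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        # Register a callable to be invoked on every send().
        self._receivers.append(receiver)
        return receiver

    def send(self, sender, **kwargs):
        # Call every receiver with the sender and keyword arguments.
        return [(receiver, receiver(sender, **kwargs))
                for receiver in self._receivers]


# Stand-in; in a real app: from invenio_oaiharvester.signals import oaiharvest_finished
oaiharvest_finished = StandInSignal()

harvested = []


def on_harvest_finished(sender, records=None, **kwargs):
    """Collect harvested records (the `records` keyword is assumed)."""
    harvested.extend(records or [])


oaiharvest_finished.connect(on_harvest_finished)

# Simulate the harvester announcing a completed harvest.
oaiharvest_finished.send("request", records=["<record/>"], name="somerepo")
print(harvested)
```

Accepting `**kwargs` in the receiver keeps it tolerant of any extra arguments the signal may carry.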

Managing OAI-PMH sources

If you want to store configuration for an OAI repository, you can use the SQLAlchemy model invenio_oaiharvester.models:OaiHARVEST.

This is useful if you regularly need to query a server.

Here you can add information about the server URL, the metadataPrefix to use, and so on. This information is then available when scheduling and running tasks:

inveniomanage oaiharvester get -n somerepo -i oai:example.org:1234

Here the -n (--name) parameter specifies which configured OAI-PMH source to query, matched on its name property.
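Conceptually, a named source replaces the URL and metadata prefix you would otherwise pass by hand. The sketch below illustrates that lookup with a plain in-memory dictionary. `SourceConfig`, `SOURCES`, `resolve_source`, the field names and the example.org URL are all hypothetical stand-ins, not the actual SQLAlchemy model or its columns.

```python
from dataclasses import dataclass


@dataclass
class SourceConfig:
    """Hypothetical stand-in for a stored OAI-PMH source configuration."""
    name: str
    baseurl: str
    metadataprefix: str


# In the real module this would live in the database, not in a dict.
SOURCES = {
    "somerepo": SourceConfig("somerepo", "http://example.org/oai2", "oai_dc"),
}


def resolve_source(name):
    """Return the configuration stored under `name`, or raise LookupError."""
    try:
        return SOURCES[name]
    except KeyError:
        raise LookupError(
            "No OAI-PMH source configured under {0!r}".format(name))


config = resolve_source("somerepo")
print(config.baseurl, config.metadataprefix)
```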

API

If you need to schedule or run harvests via Python, you can use our API:

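from invenio_oaiharvester.api import get_records

# Harvest one record by identifier from the given OAI-PMH endpoint.
request, records = get_records(identifiers=["oai:arXiv.org:1207.7214"],
                               url="http://export.arxiv.org/oai2")
for record in records:
    # record.raw holds the harvested XML for this record.
    print(record.raw)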
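Once you have the raw XML of a record, you will often want its header fields. The sketch below parses an OAI-PMH record with the standard library, assuming the raw value is an XML string rooted at an OAI-PMH `record` element. `SAMPLE_RAW` is an invented sample for illustration (its datestamp is made up) and `header_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented sample record; real harvested XML also carries a <metadata> part.
SAMPLE_RAW = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:arXiv.org:1207.7214</identifier>
    <datestamp>2012-07-31</datestamp>
  </header>
</record>"""


def header_fields(raw_xml):
    """Extract identifier and datestamp from a raw OAI-PMH record."""
    root = ET.fromstring(raw_xml)
    header = root.find("oai:header", OAI_NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
    }


fields = header_fields(SAMPLE_RAW)
print(fields["identifier"])
```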
