david-caro/inspire-crawler

 
 

inspire-crawler

Crawler integration with INSPIRE-HEP using the Scrapy project HEPCrawl.

This module schedules crawler jobs on a Scrapyd instance that serves a Scrapy project. The default Scrapy project is HEPCrawl.
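
To illustrate what scheduling a job on Scrapyd involves, here is a minimal sketch that talks to Scrapyd's `schedule.json` HTTP endpoint directly. It is not this module's own API. The Scrapyd URL, the project name `hepcrawl`, the spider name `example_spider`, and the `from_date` argument are assumptions chosen for illustration, as are the helper names `build_schedule_request` and `schedule_job`.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen


def build_schedule_request(scrapyd_url, project, spider, **spider_args):
    """Build a POST request for Scrapyd's schedule.json endpoint.

    Hypothetical helper: the names here are illustrative and not part
    of inspire-crawler.
    """
    payload = {"project": project, "spider": spider}
    # Scrapyd forwards any extra form fields to the spider as arguments.
    payload.update(spider_args)
    return Request(
        urljoin(scrapyd_url, "schedule.json"),
        data=urlencode(payload).encode("utf-8"),
        method="POST",
    )


def schedule_job(scrapyd_url, project, spider, **spider_args):
    """Send the request and return the job id reported by Scrapyd."""
    request = build_schedule_request(scrapyd_url, project, spider, **spider_args)
    with urlopen(request, timeout=10) as response:
        reply = json.load(response)
    if reply.get("status") != "ok":
        raise RuntimeError("Scrapyd rejected the job: %r" % (reply,))
    return reply["jobid"]
```

With a Scrapyd instance running locally on its default port, a call such as `schedule_job("http://localhost:6800/", "hepcrawl", "example_spider", from_date="2020-01-01")` would return the id of the scheduled job. Building the request separately from sending it keeps the payload easy to inspect without a running server.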

It integrates directly with the invenio-workflows module to create a workflow for every record harvested by the crawler.

This module is meant to be used only with the INSPIRE-HEP overlay. Use at your own risk.

Full documentation is hosted here: http://pythonhosted.org/inspire-crawler/

See also the HEPCrawl documentation: http://pythonhosted.org/hepcrawl/

