scrapyrwiki

A collection of helpers for running scrapers built with Scrapy in ScraperWiki

Launch a scraper without the Scrapy CLI

Example:

from scrapy.conf import settings
from scrapyrwiki import run_spider

def main():
    # MySpider is your spider class; a minimal sketch is shown below.
    run_spider(MySpider(), settings)

if __name__ == '__main__':
    main()
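
The examples in this README use a MySpider class that you define yourself; the name is a placeholder. A minimal sketch, assuming the old-style BaseSpider API that matches the scrapy.conf import above:

from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    # Hypothetical placeholder spider; replace with your own crawling logic.
    name = 'myspider'
    start_urls = ['http://example.com/']

    def parse(self, response):
        # Extract and yield items here.
        pass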

Save produced data to ScraperWiki

Add "scrapyrwiki.pipelines.ScraperWikiPipeline" to ITEM_PIPELINES.

Example:

from scrapy.conf import settings
from scrapyrwiki import run_spider

def scraperwiki():
    options = {
        'SW_SAVE_BUFFER': 5,  # number of items to buffer before saving
        'SW_UNIQUE_KEYS': {"MyItem": ['url']},  # unique key fields per item class
        'ITEM_PIPELINES': ['scrapyrwiki.pipelines.ScraperWikiPipeline'],
    }
    settings.overrides.update(options)
    run_spider(MySpider(), settings)

if __name__ == 'scraper':
    scraperwiki()
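
SW_UNIQUE_KEYS maps an item class name to the fields that uniquely identify a row in ScraperWiki, so records sharing those values update a single row. A minimal sketch of an item matching the mapping above (the field names are illustrative):

from scrapy.item import Item, Field

class MyItem(Item):
    # 'url' matches the unique key declared in SW_UNIQUE_KEYS above.
    url = Field()
    title = Field()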

Check spider contracts in CI

Launch the spider with run_tests.

Example:

from scrapyrwiki import run_tests
from scrapy.conf import settings

# Runs the spider's contracts and writes an xUnit report to output.xml.
run_tests(MySpider(), "output.xml", settings)

Note: testing uses Scrapy's HTTP cache. The directory from which the script is launched must contain a scrapy.cfg file (Scrapy needs it to recognize the directory as a scraper project) and a .scrapy directory holding the HTTP cache database.
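
One way to produce that cache is to run the spider once with Scrapy's built-in HTTP cache enabled, so responses are recorded under .scrapy/ and can be replayed offline in CI. A sketch, reusing the run_spider helper from above:

from scrapy.conf import settings
from scrapyrwiki import run_spider

# Record responses in the project data dir (.scrapy) for later replay.
settings.overrides.update({'HTTPCACHE_ENABLED': True})
run_spider(MySpider(), settings)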

The output is in xUnit format and has been tested with Jenkins.

Log scraper errors to Sentry

Install scrapy-sentry and set the SENTRY_DSN environment variable to your Sentry key; scrapyrwiki handles the rest.
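
For example, the variable can be exported in the shell before launching the scraper, or set from Python before Scrapy starts (the DSN below is a placeholder):

import os

# scrapy-sentry reads the DSN from the environment; setting it here is
# equivalent to exporting SENTRY_DSN in the shell before the run.
os.environ['SENTRY_DSN'] = 'https://public:secret@sentry.example.com/1'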
