Skip to content

techdragon/scrapy-rethinkdb

 
 

Repository files navigation

#scrapy-rethinkdb

Build Status

Pipeline to insert documents to a RethinkDB table.

The minimal configuration requires defining the pipeline in the item pipelines and a table name.

ITEM_PIPELINES = [
  'scrapy_rethinkdb.RethinkDBPipeline',
]

RETHINKDB_TABLE = 'items'

The connection will be created with the defaults of RethinkDB's python driver (see http://www.rethinkdb.com/api/python/connect/) unless is overwritten as in the following example to use the database called scrapydb:

RETHINKDB_CONNECTION = {
    'db': 'scrapydb'
}

The insertion will be done with the default options of RethinkDB's python driver (see http://www.rethinkdb.com/api/python/insert/) unless is overwritten as in the following example to insert or update the item:

RETHINKDB_INSERT_OPTIONS = {
    'upsert': False
}

The document inserted by default is the dictionary from the item. This can be changed by extending the pipeline and overriding the method get_document.

If any additional behaviour is needed before/after the insertion, such as changing the item, dropping it, etc, this can be done by extending the pipeline and overriding the method(s) before_insert/after_insert.

About

Scrapy pipeline for RethinkDB.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published