ScrapyDD (Scrapy Distributed Daemon)


Scrapydd is a distributed running and scheduling system for Scrapy spiders, consisting of a server and client agents.

Advantages:

  • Distributed: easily add runners (agents) to scale out.
  • Project requirements are installed automatically on demand.
  • Cron-expression triggers run your spiders on schedule.
  • Webhooks loosely couple data crawling from data processing.
  • Spider status insight: the system inspects run logs to determine each spider run's status.
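To illustrate the webhook point above, here is a minimal receiver sketch using only the Python standard library. It assumes scrapydd delivers crawled items as HTTP POST requests to the configured webhook URL; the exact payload format depends on your spider's output, so treat the JSON handling here as an assumption, not the system's documented contract.

```python
# Minimal webhook receiver sketch.
# Assumption: scrapydd POSTs crawled items as JSON to the webhook URL.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_items = []  # stand-in for your real data-processing pipeline


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body according to Content-Length.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            item = json.loads(body)
        except ValueError:
            self.send_response(400)  # reject malformed payloads
            self.end_headers()
            return
        received_items.append(item)  # hand the item off for processing
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def serve(port=0):
    """Create the receiver; port=0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Running this receiver and pointing the spider's webhook at it keeps crawling and processing decoupled: the processing side can be redeployed or scaled without touching the spiders.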

Installing Scrapydd

By pip:

pip install scrapydd

You can also install scrapydd manually:

  1. Download the compressed package from the GitHub releases page.
  2. Decompress the package.
  3. Run python setup.py install

Run Scrapydd Server

scrapydd server

By default the server listens on 0.0.0.0:6800, serving both the API and the web UI. Add the --daemon parameter on the command line to run it in the background.

Run Scrapydd Agent

scrapydd agent

Add the --daemon parameter on the command line to run it in the background.

Docs

The docs are hosted here

Docker-Compose

version: '3'
services:
  scrapydd-server:
    image: "kevenli/scrapydd"
    ports:
      - "6800:6800"
    volumes:
      - "/scrapydd/server:/scrapydd"
      - "/var/run/docker.sock:/var/run/docker.sock"
    command: scrapydd server

  scrapydd-agent:
    image: "kevenli/scrapydd"
    volumes:
      - "/scrapydd/server:/scrapydd"
      - "/var/run/docker.sock:/var/run/docker.sock"
    links:
      - scrapydd-server
    environment:
      - SCRAPYDD_SERVER=scrapydd-server
    command: scrapydd agent
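To try the stack above, save the file as docker-compose.yml and bring it up (this assumes Docker and docker-compose are installed; the agent finds the server through the SCRAPYDD_SERVER environment variable set in the compose file):

```shell
# Start the server and agent in the background
docker-compose up -d

# Follow the server logs to confirm it came up
docker-compose logs -f scrapydd-server

# The web UI is available on the published port: http://localhost:6800
```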
