
flyingnn/pyspider


pyspider

A web spider (crawler) system in Python.

  • Write scripts in Python
  • Web-based script editor, debugger, task monitor, project manager and result viewer
  • Distributed architecture
  • MySQL, MongoDB and SQLite as database backends
  • Full control of the crawl process through a powerful API
  • JavaScript page support (with the PhantomJS fetcher)
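The first feature, writing the spider itself as a Python script, can be illustrated with the toy sketch below, assuming a callback-driven style in which one method seeds the crawl and other methods handle the pages it finds. It does not use pyspider: `ToyHandler`, its `crawl` and `run` methods, the simplified callback signatures, and the in-memory `PAGES` dictionary that replaces real HTTP fetching are all hypothetical stand-ins. It only shows the general pattern of an entry point that queues URLs, an index callback that discovers links, and a detail callback that returns a result record.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page title, outgoing links).
# Stands in for real HTTP fetching so the sketch runs offline.
PAGES = {
    "http://example.com/": ("Home", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("Page A", ["http://example.com/"]),
    "http://example.com/b": ("Page B", []),
}


class ToyHandler:
    """Hypothetical stand-in for a callback-style spider script.

    crawl() only queues (url, callback) pairs; run() plays the role of the
    scheduler/fetcher loop and hands each "fetched" page to its callback.
    """

    def __init__(self):
        self.queue = deque()
        self.seen = set()
        self.results = []

    def crawl(self, url, callback):
        # Skip URLs that were already queued, as a scheduler would.
        if url not in self.seen:
            self.seen.add(url)
            self.queue.append((url, callback))

    def run(self):
        self.on_start()
        while self.queue:
            url, callback = self.queue.popleft()
            title, links = PAGES[url]  # "fetch" the page from the dict
            result = callback(url, title, links)
            if result is not None:
                self.results.append(result)
        return self.results

    # --- the part a user would write ---
    def on_start(self):
        self.crawl("http://example.com/", callback=self.index_page)

    def index_page(self, url, title, links):
        # Follow every outgoing link and parse it as a detail page.
        for link in links:
            self.crawl(link, callback=self.detail_page)

    def detail_page(self, url, title, links):
        # Returning a dict marks it as a result record.
        return {"url": url, "title": title}


results = ToyHandler().run()
print(results)
```

In this sketch `crawl` never fetches anything itself; it only records what to fetch next. Keeping that separation is what would allow the `run` loop to be replaced by independent scheduler and fetcher processes.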

Demo code: gist:9424801

Installation

Docker

# mysql
docker run -it -d --name mysql dockerfile/mysql
# rabbitmq
docker run -it -d --name rabbitmq dockerfile/rabbitmq
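# phantomjs, to be linked to the fetcher and webui; run this from the pyspider source directory so that `pwd` contains fetcher/phantomjs_fetcher.js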
docker run --name phantomjs -it -d -v `pwd`:/mnt/test --expose 25555 cmfatih/phantomjs /usr/bin/phantomjs /mnt/test/fetcher/phantomjs_fetcher.js 25555

# scheduler
docker run -it -d --name scheduler --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider scheduler
# fetcher; run multiple instances if needed.
docker run -it -d -m 64m --link rabbitmq:rabbitmq binux/pyspider fetcher
# processor; run multiple instances if needed.
docker run -it -d -m 128m --link mysql:mysql --link rabbitmq:rabbitmq binux/pyspider processor
# webui
docker run -it -d -p 5000:5000 --link mysql:mysql --link rabbitmq:rabbitmq --link scheduler:scheduler binux/pyspider webui
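The commands above start one container per component (scheduler, fetcher, processor, webui). Judging from the `--link` flags, RabbitMQ presumably carries messages between the components and MySQL holds the stored data. The single-process sketch below is a toy model of that message flow, not pyspider's code: Python's `queue.Queue` stands in for RabbitMQ, an in-memory SQLite table stands in for the MySQL result store, and the page graph, queue names, table layout and `*_step` functions are all hypothetical. Which component writes results in the real system is an assumption here.

```python
import queue
import sqlite3

# Hypothetical link graph standing in for real HTTP responses.
PAGES = {
    "/": ["/a", "/b"],
    "/a": ["/b"],
    "/b": [],
}

# One queue per hop. In a containerized deployment these would live in the
# message broker so each component could run in its own container.
to_scheduler = queue.Queue()   # URLs discovered by the processor
to_fetcher = queue.Queue()     # tasks the scheduler has accepted
to_processor = queue.Queue()   # fetched pages waiting to be parsed

# In-memory SQLite stands in for the result database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (url TEXT PRIMARY KEY, n_links INTEGER)")

seen = set()


def scheduler_step():
    url = to_scheduler.get()
    if url not in seen:           # drop duplicate URLs
        seen.add(url)
        to_fetcher.put(url)


def fetcher_step():
    url = to_fetcher.get()
    to_processor.put((url, PAGES[url]))   # "download" the page


def processor_step():
    url, links = to_processor.get()
    db.execute("INSERT INTO results VALUES (?, ?)", (url, len(links)))
    for link in links:
        to_scheduler.put(link)    # feed discovered links back to the scheduler


to_scheduler.put("/")
# Drive the pipeline until every queue is drained.
while not (to_scheduler.empty() and to_fetcher.empty() and to_processor.empty()):
    if not to_scheduler.empty():
        scheduler_step()
    if not to_fetcher.empty():
        fetcher_step()
    if not to_processor.empty():
        processor_step()

rows = db.execute("SELECT url, n_links FROM results ORDER BY url").fetchall()
print(rows)  # → [('/', 2), ('/a', 1), ('/b', 0)]
```

Because each step only reads from one queue and writes to the next, the fetcher and processor steps hold no shared state beyond the queues, which is consistent with the comments above about running multiple fetcher and processor instances.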

Documents

Contribute

License

Licensed under the Apache License, Version 2.0