sihai90/ioweb

IOWeb Framework

A Python framework for building web crawlers.

Good things:

  • designed to run a large number of network threads (for example, 100 or 500) on a single CPU core
  • support for collecting items into chunks and then processing each chunk in one operation (for example, a MongoDB bulk write)
  • asynchronous network operations are powered by gevent
  • network requests are handled with urllib3
  • HTML is parsed with lxml
  • ability to run CSS/XPath queries against the DOM tree of a downloaded HTML document
  • ability to extract certificate details
  • ability to resolve a particular domain to a custom IP address
  • stat module to count events
  • logging statistics to InfluxDB
  • retrying on network errors
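
The chunking idea from the list above can be illustrated with a minimal sketch. This is not ioweb's actual API: the `ChunkBuffer` class and its `add` and `flush` methods are hypothetical names invented for illustration, and a plain list stands in for a real bulk operation such as a MongoDB bulk write.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ChunkBuffer(Generic[T]):
    """Hypothetical helper (not part of ioweb): collect items and pass
    them to a callback in chunks of at most `size` items."""

    def __init__(self, size: int, flush: Callable[[List[T]], None]) -> None:
        if size < 1:
            raise ValueError("size must be >= 1")
        self.size = size
        self.flush_callback = flush
        self.items: List[T] = []

    def add(self, item: T) -> None:
        # Buffer the item; hand over the chunk once it is full.
        self.items.append(item)
        if len(self.items) >= self.size:
            self.flush()

    def flush(self) -> None:
        # Hand over whatever is buffered, even a partial chunk.
        if self.items:
            chunk, self.items = self.items, []
            self.flush_callback(chunk)


# Stand-in for a bulk write: each received chunk is appended to a list.
written: List[List[int]] = []
buffer = ChunkBuffer(size=3, flush=written.append)
for n in range(7):
    buffer.add(n)
buffer.flush()  # flush the final, partial chunk
```

In a crawler, the callback would typically perform a single bulk operation per chunk instead of one operation per item, and the final `flush()` call ensures the leftover items are not lost at shutdown.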

Bad things:

  • not fully covered with tests
  • no documentation

Feedback
