Pomp is a screen scraping and web crawling framework.
It is inspired by and similar to Scrapy, but has a simpler implementation and no hard dependency on Twisted.
Features:
- pure Python
- only one dependency, and only on Python 2.x: concurrent.futures (a backport package for Python 2.x)
- supports one-file applications, with no required project layout or other restrictions
- a meta framework like Paste (a framework for building your own scraping frameworks)
- extensible networking: any sync or async method may be used
- no parsing libraries in the core; use your favorite
- can be distributed, designed to use an external queue
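The design these features describe can be pictured with a toy crawl loop. This is only a rough sketch and does not use Pomp's real API: `FakeDownloader`, `LinkCrawler`, `run` and the method names are hypothetical, chosen for illustration. The networking layer and the parsing code are both plain user objects, and the in-memory `deque` stands in for what could be an external queue.

```python
import re
from collections import deque


class FakeDownloader:
    """Hypothetical stand-in for a network layer: maps a URL to page text.

    A real downloader would perform sync or async I/O here.
    """

    def __init__(self, pages):
        self.pages = pages

    def fetch(self, url):
        return self.pages.get(url, "")


class LinkCrawler:
    """Parsing lives in user code; a regex here, but any library would do."""

    entry_url = "http://example.com/"

    def extract_items(self, url, body):
        # One item per <title> element found on the page.
        titles = re.findall(r"<title>(.*?)</title>", body)
        return [{"url": url, "title": title} for title in titles]

    def next_urls(self, url, body):
        # Every href on the page is a candidate for the next request.
        return re.findall(r'href="(.*?)"', body)


def run(downloader, crawler):
    """Breadth-first crawl: fetch, extract items, enqueue unseen links."""
    queue = deque([crawler.entry_url])  # could be an external queue instead
    seen = {crawler.entry_url}
    items = []
    while queue:
        url = queue.popleft()
        body = downloader.fetch(url)
        items.extend(crawler.extract_items(url, body))
        for link in crawler.next_urls(url, body):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return items


PAGES = {
    "http://example.com/": '<title>Home</title><a href="http://example.com/a">a</a>',
    "http://example.com/a": '<title>A</title><a href="http://example.com/">home</a>',
}
ITEMS = run(FakeDownloader(PAGES), LinkCrawler())
```

Swapping `FakeDownloader` for an object that performs real HTTP requests would leave `run` and `LinkCrawler` unchanged, which is the point of keeping networking and parsing outside the core.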
Pomp does not take care of:
- redirects
- proxies
- caching
- database integration
- cookies
- authentication
- etc.
If you need proxies, redirects, or similar features, implement them yourself or use the excellent requests library as the Pomp downloader.
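As a rough illustration of delegating these concerns to an HTTP client, the sketch below wraps a session object that behaves like `requests.Session`. It is not Pomp's actual downloader interface: `SessionDownloader` and its `fetch` method are hypothetical names, and `FakeSession` exists only so the example runs offline. The keyword arguments `proxies`, `timeout` and `allow_redirects` are the ones `requests` accepts.

```python
class SessionDownloader:
    """Hypothetical downloader that hands HTTP details to a session object.

    `session` is expected to behave like `requests.Session`: a `get(url, ...)`
    method returning an object with `url`, `status_code` and `text`.
    """

    def __init__(self, session, proxies=None, timeout=10):
        self.session = session
        self.proxies = proxies
        self.timeout = timeout

    def fetch(self, url):
        # Redirects, cookies and proxies are the session's job, not ours.
        response = self.session.get(
            url,
            proxies=self.proxies,
            timeout=self.timeout,
            allow_redirects=True,
        )
        return {
            "url": response.url,
            "status": response.status_code,
            "body": response.text,
        }


class FakeResponse:
    """Offline stand-in for a response object."""

    def __init__(self, url, text):
        self.url = url
        self.status_code = 200
        self.text = text


class FakeSession:
    """Offline stand-in for a session; records calls for inspection."""

    def __init__(self):
        self.calls = []

    def get(self, url, **kwargs):
        self.calls.append((url, kwargs))
        # Pretend the server redirected us to the https URL.
        return FakeResponse(url.replace("http://", "https://"), "<html>ok</html>")


# In real use, pass requests.Session() instead of FakeSession().
downloader = SessionDownloader(
    FakeSession(), proxies={"http": "http://127.0.0.1:8080"}
)
RESULT = downloader.fetch("http://example.com/")
```

Because the downloader only relies on the session's `get` method, authentication or cookie handling configured on a real session would apply without changes to the crawling code.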
Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.