IOWeb Framework

A Python framework for building web crawlers.

Good things:

designed to run a large number of network threads (e.g. 100 or 500) on a single CPU core
items can be collected into chunks and then processed one chunk at a time (e.g. a MongoDB bulk write)
asynchronous network operations are powered by gevent
network requests are handled with urllib3
HTML is parsed with lxml
ability to run CSS and XPath queries against the DOM tree of a downloaded HTML document
ability to extract certificate details
ability to resolve a particular domain to a custom IP address
stat module to count events
statistics can be logged to InfluxDB
automatic retries on network errors
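
The chunking feature in the list above can be illustrated without IOWeb itself. What follows is a minimal sketch in plain Python, not IOWeb's actual API (which is undocumented): `ChunkBuffer`, `add`, and `flush` are hypothetical names, and the `handler` callback stands in for something like a MongoDB bulk write.

```python
from typing import Any, Callable, List


class ChunkBuffer:
    """Collect items and hand them to a handler in fixed-size chunks.

    Hypothetical helper for illustration only; not part of IOWeb.
    """

    def __init__(self, size: int, handler: Callable[[List[Any]], None]) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self.handler = handler
        self._items: List[Any] = []

    def add(self, item: Any) -> None:
        # Buffer the item; once the buffer is full, process it as one chunk.
        self._items.append(item)
        if len(self._items) >= self.size:
            self.flush()

    def flush(self) -> None:
        # Pass whatever is buffered to the handler, then start a new chunk.
        if self._items:
            chunk, self._items = self._items, []
            self.handler(chunk)


# Usage: the handler here just records each chunk; in a real crawler it
# could perform a single bulk database write per chunk instead.
written: List[List[int]] = []
buf = ChunkBuffer(size=3, handler=written.append)
for n in range(7):
    buf.add(n)
buf.flush()  # process the final, partially filled chunk
```

Calling `flush()` at the end matters: without it, items left in a partially filled chunk would never reach the handler.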

Bad things:

not fully covered with tests
no documentation

Feedback

t.me/grablab - English chat about web scraping
t.me/grablab_ru - Russian chat about web scraping

sihai90/ioweb
