w3lib

Overview

This is a Python library of web-related functions, such as:

remove comments or tags from HTML snippets
extract base url from HTML snippets
translate entities in HTML strings
convert raw HTTP headers to dicts and vice-versa
construct HTTP auth header
convert HTML pages to unicode
RFC-compliant url joining
sanitize urls (like browsers do)
extract arguments from urls

Modules

The w3lib package consists of four modules:

w3lib.url - functions for working with URLs
w3lib.html - functions for working with HTML
w3lib.http - functions for working with HTTP
w3lib.encoding - functions for working with character encoding

Requirements

Python 2.6 or 2.7

Install

pip install w3lib

Release notes

See the NEWS file.

Documentation

For more information, see the code and tests. The functions are all documented with docstrings.

Tests

nose is the preferred way to run tests. Run nosetests from the root directory to execute the tests with the default Python interpreter.

tox can be used to run the tests against all supported Python versions. Install it (using 'pip install tox') and then run tox from the root directory; the tests will be executed with every available Python interpreter.

License

The w3lib library is licensed under the BSD license.

History

The code of w3lib was originally part of the Scrapy framework. It was later split out of Scrapy to make it more reusable and to provide a useful library of web functions that does not depend on Scrapy.
