urltools

Some functions to parse and normalize URLs.

Functions

Normalize

>>> urltools.normalize("Http://exAMPLE.com./foo")
http://example.com/foo
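The example above lowercases the scheme and host and drops the trailing dot after the host name. As a rough illustration of those steps only, here is a minimal sketch built on the standard library. It is not the urltools implementation, `normalize_sketch` is a hypothetical name, and the real function may apply further rules.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_sketch(url):
    """Lowercase scheme and host, and strip a trailing dot from the host.

    Illustrative only: user info (user:password@) is dropped and no
    path or query normalization is attempted.
    """
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    # .hostname is returned lowercased; remove the trailing root dot.
    host = (parts.hostname or "").rstrip(".")
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(normalize_sketch("Http://exAMPLE.com./foo"))
```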

Parse

>>> urltools.parse("http://www.example.co.uk/foo/bar?x=1#abc")
ParseResult(scheme='http', subdomain='www', domain='example', tld='co.uk', port='', path='/foo/bar', query='x=1', fragment='abc')
>>> urltools.parse("www.example.co.uk/abc")
ParseResult(scheme='', subdomain='', domain='', tld='', port='', path='www.example.co.uk/abc', query='', fragment='')

Extract

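The difference between extract and parse is how they handle input without a scheme. parse treats it as a relative URL, so the whole string ends up in path. extract always tries to extract as much information as possible and assumes the string starts with a host name.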

>>> urltools.extract("www.example.co.uk/abc")
ParseResult(scheme='', subdomain='www', domain='example', tld='co.uk', port='', path='/abc', query='', fragment='')
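The same distinction can be reproduced with the standard library alone, which may help when deciding which function to call. This is a sketch under the assumption that "no scheme" is the only trigger; `split_strict` and `split_greedy` are hypothetical names and do not mirror urltools internals.

```python
from urllib.parse import urlsplit


def split_strict(url):
    """Treat input without a scheme as a relative URL."""
    return urlsplit(url)


def split_greedy(url):
    """Assume input without a scheme starts with a host name."""
    if "://" not in url and not url.startswith("//"):
        url = "//" + url  # "//host/path" is a scheme-relative reference
    return urlsplit(url)


strict = split_strict("www.example.co.uk/abc")
greedy = split_greedy("www.example.co.uk/abc")
print(repr(strict.netloc), repr(strict.path))
print(repr(greedy.netloc), repr(greedy.path))
```

The strict variant leaves the host empty and keeps everything in the path, while the greedy variant recovers the host and a path of /abc.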

Installation

You can install urltools from the Python Package Index (PyPI):

pip install urltools

... or get the newest version directly from GitHub:

pip install -e git+https://github.com/rbaier/urltools.git#egg=urltools

Public Suffix List

urltools uses the Public Suffix List to split domain names correctly. For example, the TLD of example.co.uk is co.uk, not uk.
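To make the idea concrete, here is a small sketch of suffix-based splitting that picks the longest matching suffix. The suffix set is a tiny hand-picked sample, `split_host` is a hypothetical helper, and the wildcard and exception rules of the real list are ignored, so this is not how urltools itself is implemented.

```python
# Tiny hand-picked sample; the real list has thousands of entries.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk"}


def split_host(host, suffixes=SAMPLE_SUFFIXES):
    """Return (subdomain, domain, tld) using the longest matching suffix."""
    labels = host.lower().rstrip(".").split(".")
    # Candidates are tried from longest to shortest. Starting at index 1
    # always leaves at least one label for the domain.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            domain = labels[i - 1]
            subdomain = ".".join(labels[:i - 1])
            return subdomain, domain, candidate
    return "", host, ""  # no known suffix found


print(split_host("www.example.co.uk"))
print(split_host("example.com"))
```

Because co.uk is tested before uk, the domain comes out as example rather than co.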

I recommend using a local copy of this list; otherwise it will be downloaded on each import of urltools. Point the PUBLIC_SUFFIX_LIST environment variable at the local file:

export PUBLIC_SUFFIX_LIST="/path/to/effective_tld_names.dat"
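When the shell environment is not under your control, the variable can also be set from Python. This sketch assumes that urltools reads PUBLIC_SUFFIX_LIST when it is imported, so the helper would have to run before `import urltools`; `use_local_suffix_list` is a hypothetical name and not part of the library.

```python
import os


def use_local_suffix_list(path):
    """Set PUBLIC_SUFFIX_LIST to path if the file exists.

    Returns True when the variable was set, False otherwise, so the
    caller can decide whether to fall back to the download.
    """
    if os.path.isfile(path):
        os.environ["PUBLIC_SUFFIX_LIST"] = path
        return True
    return False


# The path is the placeholder from the export line above.
found = use_local_suffix_list("/path/to/effective_tld_names.dat")
print("local copy configured:", found)
```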

For more information see http://publicsuffix.org/

Tests

To run the tests I use pytest:

py.test -vrxs
