JeffersonK/us-address-parser

usaddress

usaddress is a Python library that parses unstructured address strings into labeled address components using a probabilistic NLP model.

To install:

> pip install usaddress

To build and test the development code:

> pip install -r requirements.txt
> python setup.py develop
> python training/training.py
> nosetests .

Here's how you use it:

>>> import usaddress
>>> usaddress.parse('123 Main St. Suite 100 Chicago, IL')
[('123', 'AddressNumber'), 
 ('Main', 'StreetName'), 
 ('St.', 'StreetNamePostType'), 
 ('Suite', 'OccupancyType'), 
 ('100', 'OccupancyIdentifier'), 
 ('Chicago,', 'PlaceName'), 
 ('IL', 'StateName')]
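
The result is a list of (token, label) pairs, so callers often want it grouped by label. The following is a minimal sketch of one way to do that, not part of usaddress itself: `group_by_label` is a hypothetical helper, and the input is copied from the output above so the snippet runs without the library installed.

```python
def group_by_label(tagged):
    """Merge (token, label) pairs into a label -> text mapping.

    Hypothetical helper for illustration; not provided by usaddress.
    """
    grouped = {}
    for token, label in tagged:
        # Tokens keep trailing punctuation (e.g. 'Chicago,'), so drop the comma.
        token = token.rstrip(',')
        if label in grouped:
            # Join multi-token components such as two-word place names.
            grouped[label] += ' ' + token
        else:
            grouped[label] = token
    return grouped


# Pairs copied from the usage example above.
tagged = [
    ('123', 'AddressNumber'),
    ('Main', 'StreetName'),
    ('St.', 'StreetNamePostType'),
    ('Suite', 'OccupancyType'),
    ('100', 'OccupancyIdentifier'),
    ('Chicago,', 'PlaceName'),
    ('IL', 'StateName'),
]

components = group_by_label(tagged)
print(components['PlaceName'])  # Chicago
```

Note that this grouping joins every token sharing a label, even when the tokens are not adjacent, which may hide a mislabeled token.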

Notes

What this can do: Using a probabilistic model, it makes (very educated) guesses in identifying address components, even in tricky cases where rule-based parsers typically break down.

What this cannot do: It cannot identify address components with perfect accuracy, nor can it verify that a given address is correct/valid.
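
Because the labels are guesses, it can help to flag output that looks suspicious before relying on it. The sketch below shows one possible heuristic, which is our assumption and not something usaddress provides: a label that reappears after a different label has intervened (for example, two separate `AddressNumber` runs) may indicate a mislabeled token or two addresses in one string. The name `non_contiguous_labels` and both sample inputs are hypothetical.

```python
def non_contiguous_labels(tagged):
    """Return labels that reappear after a different label intervened.

    Heuristic for illustration only; not part of usaddress.
    """
    seen = set()
    flagged = []
    previous = None
    for _token, label in tagged:
        # A label starting a new run, although it was already seen earlier.
        if label != previous and label in seen and label not in flagged:
            flagged.append(label)
        seen.add(label)
        previous = label
    return flagged


# Hand-written (token, label) pairs in the format shown in the usage example.
clean = [('123', 'AddressNumber'), ('Main', 'StreetName'), ('St.', 'StreetNamePostType')]
suspect = [('123', 'AddressNumber'), ('Main', 'StreetName'), ('456', 'AddressNumber')]

print(non_contiguous_labels(clean))    # []
print(non_contiguous_labels(suspect))  # ['AddressNumber']
```

An empty result does not mean the parse is correct; it only rules out this one pattern.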
