See blank-pupa to install dependencies and get started.
To run a scraper, for example the one for Edmonton:
python -m pupa.cli update --nonstrict ca_ab_edmonton
To run only the scraping step and skip the import into MongoDB, add the --scrape switch:
python -m pupa.cli update --nonstrict --scrape ca_ab_edmonton
For documentation on the pupa.cli command:
python -m pupa.cli -h
For documentation on the update subcommand:
python -m pupa.cli update -h
Find division identifiers using the Open Civic Data Division Identifier (OCD-ID) Viewer or by browsing the list of identifiers. In most cases, a municipality will have a division identifier with a type ID of csd. Then, create a scraper with:
invoke new --division-id ocd-division/country:ca/csd:5915022
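An OCD division identifier is a slash-separated path of type:id pairs, so the type ID of the most specific pair can be checked before creating a scraper. The following is a minimal sketch of that check; the helper name division_type is hypothetical and is not part of pupa or of this repository.

```python
def division_type(division_id):
    """Return the type ID of the last (most specific) part of an OCD-ID.

    Hypothetical helper for illustration only.
    """
    prefix, _, path = division_id.partition('/')
    if prefix != 'ocd-division' or not path:
        raise ValueError('not an OCD division identifier: %r' % division_id)
    # Each path segment is a "type:id" pair; the last one is the most specific.
    last_segment = path.split('/')[-1]
    type_id, separator, identifier = last_segment.partition(':')
    if not separator or not type_id or not identifier:
        raise ValueError('malformed segment: %r' % last_segment)
    return type_id


print(division_type('ocd-division/country:ca/csd:5915022'))  # prints "csd"
```

If the printed type ID is not csd, double-check the identifier in the viewer before running invoke new.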
This command creates an __init__.py file and a stub people.py file within a new directory for the scraper. The __init__.py file, which describes the jurisdiction, should not require any editing.
Most jurisdictions have a geographic_code that corresponds to a Standard Geographical Classification (SGC) 2011 code. Other jurisdictions have a division_id that corresponds to an OCD-ID.
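To illustrate how the two attributes relate, here is a rough sketch that derives a division identifier from an SGC code. It assumes that a seven-digit code identifies a census subdivision (type ID csd, as in the example identifier above) and that a four-digit code identifies a census division (type ID cd, an assumption not stated in this document). The function name is hypothetical, and the repository's own lookup may work differently.

```python
def division_id_from_geographic_code(geographic_code):
    """Build an OCD-ID from an SGC code (hypothetical helper, illustration only)."""
    code = str(geographic_code)
    if not code.isdigit():
        raise ValueError('SGC codes are numeric: %r' % geographic_code)
    # Assumption: the length of the code determines the division type.
    type_ids = {4: 'cd', 7: 'csd'}
    if len(code) not in type_ids:
        raise ValueError('unsupported SGC code length: %r' % geographic_code)
    return 'ocd-division/country:ca/%s:%s' % (type_ids[len(code)], code)


print(division_id_from_geographic_code(5915022))
# prints "ocd-division/country:ca/csd:5915022"
```

Two-digit province and territory codes are deliberately left out, since this sketch makes no assumption about how those divisions are identified.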
To learn how to write a scraper, read the Pupa documentation or an existing scraper's code.
If the pupa.cli command raises the error below, ensure that MongoDB is running.
TypeError: 'ErrorProxy' object is not subscriptable
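One quick way to check whether MongoDB is running is to try opening a TCP connection to it. The sketch below assumes MongoDB listens on localhost at its default port, 27017; your configuration may differ. The function name is hypothetical, and a successful connection only shows that something is listening on that port, not that it is MongoDB.

```python
import socket


def mongodb_is_reachable(host='localhost', port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper; host and port are assumed defaults.
    """
    try:
        connection = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or host not found.
        return False
    connection.close()
    return True


if __name__ == '__main__':
    if mongodb_is_reachable():
        print('Something is listening on localhost:27017.')
    else:
        print('Nothing is listening on localhost:27017; start MongoDB first.')
```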
The tidy.py script will correct module names, class names, and the jurisdiction_id, division_name, name and url in __init__.py files. It will report any module without an OCD division or with a name or url that requires manual verification. To run it:
invoke tidy
To check that all sources are credited, run:
invoke sources
To test PEP 8 conformance, run:
pep8 .
To tidy all whitespace, run:
autopep8 -i -a -r --ignore=E111,E121,E501,W6 .
To print all jurisdiction URLs:
invoke urls
Periodically, update the metadata about OCD-IDs:
ruby constants.rb
Scraper code rarely undergoes code review. The focus is on the quality of the data.
This repository is on GitHub: http://github.com/opencivicdata/scrapers-ca, where your contributions, forks, bug reports, feature requests, and feedback are greatly welcomed.
Copyright (c) 2013 Open North Inc., released under the MIT license