fchagnon/scrapers-ca

 
 


Canadian Legislative Scrapers

See blank-pupa to install dependencies and get started.

Run a scraper

python -m pupa.cli update --nonstrict ca_ab_edmonton

To run only the scraping step and skip the import into MongoDB, add the --scrape switch:

python -m pupa.cli update --nonstrict --scrape ca_ab_edmonton
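If you want to launch several scrapers from a script, the two commands above can be assembled programmatically. The following is a minimal sketch, not part of this repository: the helper names build_update_command and run_update are hypothetical, and it assumes pupa is installed in the same Python environment as the calling script.

```python
import subprocess
import sys


def build_update_command(module, scrape_only=False):
    """Return the argument list for `python -m pupa.cli update`.

    `module` is a scraper module name such as "ca_ab_edmonton".
    When `scrape_only` is true, the --scrape switch is added so that
    the import step is skipped.
    """
    command = [sys.executable, "-m", "pupa.cli", "update", "--nonstrict"]
    if scrape_only:
        command.append("--scrape")
    command.append(module)
    return command


def run_update(module, scrape_only=False):
    """Run the update command and return its exit status.

    Hypothetical convenience wrapper; it only works if pupa is installed.
    """
    return subprocess.call(build_update_command(module, scrape_only))
```

For example, run_update("ca_ab_edmonton", scrape_only=True) would run the same command as the --scrape example above.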

For documentation on the pupa.cli command:

python -m pupa.cli -h

For documentation on the update subcommand:

python -m pupa.cli update -h

Create a scraper

Find division identifiers using the Open Civic Data Division Identifier (OCD-ID) Viewer or by browsing the list of identifiers. In most cases, a municipality will have a division identifier with a type ID of csd. Then, create a scraper with:

invoke new --division-id ocd-division/country:ca/csd:5915022

This command creates an __init__.py file and a stub people.py file within a new directory for the scraper. The __init__.py file, which describes the jurisdiction, should not require any editing.

Most jurisdictions have a geographic_code that corresponds to a Standard Geographical Classification (SGC) 2011 code. Other jurisdictions have a division_id that corresponds to an OCD-ID.
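To make the shape of a division identifier concrete, here is a small sketch that splits an OCD-ID into its type and ID pairs. The function names parse_division_id and division_type are hypothetical and do not exist in this repository or in Pupa. The sketch checks only the general "ocd-division/type:id/type:id" shape, not the full OCD-ID rules.

```python
def parse_division_id(division_id):
    """Split an OCD division identifier into a list of (type, id) pairs."""
    prefix = "ocd-division/"
    if not division_id.startswith(prefix):
        raise ValueError("not an OCD division identifier: %r" % division_id)
    pairs = []
    for segment in division_id[len(prefix):].split("/"):
        # Each segment looks like "csd:5915022": a type ID, a colon, an ID.
        type_id, separator, value = segment.partition(":")
        if not separator or not type_id or not value:
            raise ValueError("malformed segment: %r" % segment)
        pairs.append((type_id, value))
    return pairs


def division_type(division_id):
    """Return the type ID of the last (most specific) segment."""
    return parse_division_id(division_id)[-1][0]
```

With the identifier from the invoke new example, parse_division_id returns the pairs ("country", "ca") and ("csd", "5915022"), and division_type returns "csd".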

Develop a scraper

Read the Pupa documentation or an existing scraper's code.

Troubleshooting

If the pupa.cli command raises the error below, ensure that MongoDB is running.

TypeError: 'ErrorProxy' object is not subscriptable
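One quick way to check whether MongoDB is accepting connections is to try opening a TCP connection to it. This is a minimal sketch using only the standard library. It assumes MongoDB listens on localhost at its default port, 27017, so adjust both if your setup differs. The function name mongodb_is_reachable is hypothetical. A successful connection shows only that something is listening on that port, not that it is a healthy MongoDB server.

```python
import socket


def mongodb_is_reachable(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    The defaults assume a local MongoDB on its default port (27017).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

If mongodb_is_reachable() returns False, start MongoDB before running the pupa.cli command again.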

Maintenance

The tidy.py script will correct module names, class names, and the jurisdiction_id, division_name, name and url in __init__.py files. It will report any module without an OCD division or with a name or url that requires manual verification.

invoke tidy

To check that all sources are credited, run:

invoke sources

To test PEP 8 conformance, run:

pep8 .

To tidy all whitespace, run:

autopep8 -i -a -r --ignore=E111,E121,E501,W6 .

To print all jurisdiction URLs:

invoke urls

Periodically, update the metadata about OCD-IDs:

ruby constants.rb

Scraper code rarely undergoes code review. The focus is on the quality of the data.

Bugs? Questions?

This repository is on GitHub: http://github.com/opencivicdata/scrapers-ca, where your contributions, forks, bug reports, feature requests, and feedback are greatly welcomed.

Copyright (c) 2013 Open North Inc., released under the MIT license
