Bornholm/localgouv_scraper

Localgouv

This project aims to scrape financial data for cities ("communes"), EPCIs (groups of cities, see Wikipedia), departments and regions from the website http://www.collectivites-locales.gouv.fr/.

We use the Scrapy library to crawl the pages and XPath expressions to extract the data.
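
The XPath extraction pattern can be sketched with the standard library. This is not the project's spider: Scrapy selectors (e.g. `response.xpath("//table[@id='accounts']//tr")`) support full XPath 1.0, whereas `xml.etree.ElementTree` only handles a subset and requires well-formed markup, which real HTML often is not. The table id, labels and values below are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment: the README does not describe the
# real pages' layout, so the table id, labels and values are placeholders.
SAMPLE_PAGE = """
<html><body>
  <table id="accounts">
    <tr><td>Produits de fonctionnement</td><td>1 234</td></tr>
    <tr><td>Charges de fonctionnement</td><td>1 100</td></tr>
  </table>
</body></html>
"""


def extract_table(page, table_id):
    """Return {label: raw value text} for each two-cell row of the given table."""
    root = ET.fromstring(page.strip())
    # ElementTree understands a small XPath subset, including [@attr='value'].
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return {}
    rows = {}
    for row in table.findall("tr"):
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if len(cells) == 2:
            rows[cells[0]] = cells[1]
    return rows
```

Values are kept as raw strings here; converting French-formatted amounts such as "1 234" to numbers is a separate step.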

To check the quality of the crawl and to analyze the data, we use IPython notebooks:

All the data scraped for the regions is committed as an example here:

Usage

To scrape data for a given zone type (city, epci, department or region) and fiscal year YYYY, run the following in the root directory:

scrapy crawl localgouv -o scraped_data_dir/zonetype_YYYY.json -t csv -a year=YYYY
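
Scrapy passes each `-a key=value` option to the spider's constructor as a keyword argument, always as a string. The sketch below shows how the `year` argument might be validated; it is written as a plain class so it runs without Scrapy, and the class name and validation rule are assumptions rather than the project's code.

```python
class YearArgumentSketch:
    """Hypothetical stand-in for the spider's argument handling.

    The real spider would subclass scrapy.Spider; this plain class only shows
    how the string passed with `-a year=YYYY` can be checked and converted.
    """

    name = "localgouv"

    def __init__(self, year=None, **kwargs):
        if year is None:
            raise ValueError("pass the fiscal year with -a year=YYYY")
        # -a values arrive as strings, so check the format before converting.
        if not (len(year) == 4 and year.isdigit()):
            raise ValueError(f"expected a four-digit year, got {year!r}")
        self.year = int(year)
        # Scrapy's base Spider stores any remaining -a options as attributes.
        self.__dict__.update(kwargs)
```

Failing early on a malformed year avoids launching a crawl that would silently return empty pages.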

To scrape data for all available fiscal years for a given zone type:

. bin/crawl_all_years.sh zonetype
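
The contents of `bin/crawl_all_years.sh` are not shown here. Assuming it simply reruns the single-year command once per year, a Python equivalent could look like this; the list of years is a hypothetical input, since the README does not say which fiscal years are available.

```python
import subprocess


def crawl_commands(zonetype, years):
    """Build one `scrapy crawl` command per fiscal year, mirroring the single-year usage."""
    return [
        [
            "scrapy", "crawl", "localgouv",
            "-o", f"scraped_data_dir/{zonetype}_{year}.json",
            "-t", "csv",
            "-a", f"year={year}",
        ]
        for year in years
    ]


def crawl_all_years(zonetype, years, dry_run=True):
    """Run (or, by default, only print) the crawl for every year in `years`."""
    for cmd in crawl_commands(zonetype, years):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing crawl.
            subprocess.run(cmd, check=True)
```

Calling `crawl_all_years("region", [2011, 2012])` prints the two commands; pass `dry_run=False` to actually run them from the root directory.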

To generate a CSV file containing all data for a given zone type, with French column headers, run:

. bin/bundle.sh zonetype

This command generates the file nosdonnees/zonetype_all.csv, which you can upload to the nosdonnees.fr website.
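
The bundling step can be sketched with the standard `csv` module: concatenate the per-year CSV files and rename the columns to French headers. The field names and the French mapping below are hypothetical; the project's actual mapping is not documented in this README.

```python
import csv
import io

# Hypothetical mapping from scraped field names to French column headers.
FRENCH_HEADERS = {
    "name": "nom",
    "year": "exercice",
    "population": "population",
}


def bundle(csv_texts, headers=FRENCH_HEADERS):
    """Merge several CSV documents (one per fiscal year) into one CSV string with French headers."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(headers.values()), lineterminator="\n")
    writer.writeheader()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # Fields absent from a given year's file are written as empty cells.
            writer.writerow({french: row.get(field, "") for field, french in headers.items()})
    return out.getvalue()
```

In practice the inputs would be read from the per-year files in `scraped_data_dir/` and the result written to `nosdonnees/zonetype_all.csv`.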

Requirements

See requirements.txt file.

Tests

unit2 discover
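
`unit2` is the discovery runner from the unittest2 backport; on Python 3 the standard-library equivalent is `python -m unittest discover`, which by default collects modules matching `test*.py`. A minimal test module might look like the following; `parse_amount` is a hypothetical helper for French-formatted amounts, not a function from this project.

```python
import unittest


def parse_amount(text):
    """Hypothetical helper: convert a French-formatted amount such as '1 234' to an int."""
    # French formatting separates thousands with a space or a non-breaking space.
    return int(text.replace("\u00a0", "").replace(" ", ""))


class ParseAmountTest(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_amount("42"), 42)

    def test_space_thousands_separator(self):
        self.assertEqual(parse_amount("1 234"), 1234)

    def test_non_breaking_space(self):
        self.assertEqual(parse_amount("1\u00a0234"), 1234)

    def test_negative_amount(self):
        self.assertEqual(parse_amount("-5 000"), -5000)
```

Saved as e.g. `test_parsing.py`, it is picked up by `unit2 discover` without any `unittest.main()` call.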

TODO

  • Add documentation, especially the mapping between variable names and fields in the HTML pages.
  • Compute simple statistics on the scraped data to check its quality (partly done for cities and EPCIs).
  • Add tests covering different fiscal years.
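
As one example of a simple quality statistic, the sketch below counts missing or empty values per field across scraped records. It assumes the records have already been loaded as a list of dicts (for instance from Scrapy's JSON output); the field names used in the usage note are hypothetical.

```python
from collections import Counter


def missing_value_report(records):
    """Count, for each field, how many records lack it or hold an empty value."""
    fields = set()
    for record in records:
        fields.update(record)
    missing = Counter()
    for record in records:
        for field in fields:
            # A field counts as missing if it is absent, None or an empty string.
            if record.get(field) in (None, ""):
                missing[field] += 1
    return {field: missing[field] for field in sorted(fields)}
```

For records such as `[{"name": "A", "debt": "1"}, {"name": "B", "debt": ""}, {"name": "C"}]`, the report flags `debt` as missing twice and `name` never.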

About

Scrape financial data of cities, EPCI (group of cities), departments and regions

Languages

  • Python 98.5%
  • Shell 1.5%