Scrape financial data of cities, EPCI (group of cities), departments and regions

ffepora/nosfinanceslocales_scraper

NosFinancesLocales scraper

This project scrapes financial data for cities ("communes"), EPCI (groups of cities, cf. Wikipedia), departments and regions from the website http://www.collectivites-locales.gouv.fr/.

We use the Scrapy library to crawl the pages and XPath expressions to extract the data.
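As a rough illustration of the XPath step, here is a minimal sketch that uses only the Python standard library rather than Scrapy selectors. The HTML fragment, the class names `libelle` and `montant`, the row labels and the function `extract_rows` are all invented for this example; they do not reflect the actual markup of the scraped pages or the project's own spider code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for an account table.
SAMPLE = """<table>
  <tr><td class="libelle">Example revenue line</td><td class="montant">1 234</td></tr>
  <tr><td class="libelle">Example expense line</td><td class="montant">987</td></tr>
  <tr><td class="note">row without the expected cells</td></tr>
</table>"""


def extract_rows(markup):
    """Map each row label to its integer amount using simple XPath queries."""
    root = ET.fromstring(markup)
    rows = {}
    for tr in root.findall(".//tr"):
        label = tr.find("./td[@class='libelle']")
        amount = tr.find("./td[@class='montant']")
        if label is None or amount is None:
            continue  # skip rows that do not match the assumed layout
        # Amounts are assumed to use a space as thousands separator.
        rows[label.text.strip()] = int(amount.text.replace(" ", ""))
    return rows


rows = extract_rows(SAMPLE)
```

With Scrapy, the same idea would typically be expressed with `response.xpath(...)` inside a spider callback; `xml.etree.ElementTree` only supports a subset of XPath and requires well-formed markup, so it is used here purely to keep the sketch dependency-free.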

To check the quality of the crawl and to analyze the data, we use IPython notebooks:

All the data scraped for the regions is committed as an example here:

Usage

To scrape data for a given zone type (city, epci, department or region) and a given fiscal year YYYY, run in the root directory:

scrapy crawl localfinance -o scraped_data_dir/zonetype_YYYY.json -t csv -a year=YYYY -a zone_type=zonetype
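If you drive the crawl from Python, the command above can be assembled programmatically. The sketch below only builds the argument list; the helper `build_crawl_command` is hypothetical, and the accepted zone type values are assumed to be exactly the four names listed above.

```python
# Assumed set of valid zone types, taken from the usage text above.
ZONE_TYPES = ("city", "epci", "department", "region")


def build_crawl_command(zone_type, year, out_dir="scraped_data_dir"):
    """Return the scrapy command line as a list of arguments."""
    if zone_type not in ZONE_TYPES:
        raise ValueError("unknown zone type: %r" % (zone_type,))
    output = "%s/%s_%d.json" % (out_dir, zone_type, int(year))
    return [
        "scrapy", "crawl", "localfinance",
        "-o", output,
        "-t", "csv",
        "-a", "year=%d" % int(year),
        "-a", "zone_type=%s" % zone_type,
    ]


command = build_crawl_command("region", 2013)
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the root directory of the repository.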

To scrape data for all available fiscal years for a given zone type:

source bin/crawl_all_years.sh zonetype

To generate a CSV file containing all data for a given zone type, with French headers, run:

source bin/bundle_data.sh zonetype

This command generates the file nosdonnees/zonetype_all.csv, which you can upload to the nosdonnees.fr website.
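The bundling step can be pictured as concatenating the per-year CSV files and renaming the columns. The sketch below is not the content of `bin/bundle_data.sh`; the function `bundle`, the column names and the English-to-French mapping are assumptions made up for illustration.

```python
import csv
import io

# Hypothetical mapping from scraped variable names to French column titles.
HEADER_FR = {"name": "nom", "year": "annee"}


def bundle(csv_texts, header_map):
    """Merge several CSV documents into one, using translated headers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(header_map.values()))
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # Columns missing from a given year are left empty.
            writer.writerow([row.get(key, "") for key in header_map])
    return out.getvalue()


bundled = bundle(["name,year\nA,2012\n", "name,year\nA,2013\n"], HEADER_FR)
```

Working on in-memory strings keeps the example easy to test; a real script would read the files from the scraped data directory and write the result to disk.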

Requirements

See requirements.txt file.

Tests

Run all

unit2 discover

Run one test

python test/test_commune_parsing.py Commune2009ParsingTestCase

Download an HTML file to add a new test

Here is an example that downloads the HTML page for a city for fiscal year 2013:

curl -X POST -d "ICOM=234&DEP=045&TYPE=BPS&PARAM=0&EXERCICE=2013" http://alize2.finances.gouv.fr/communes/eneuro/detail.php > test/data/commune_2013_account.html
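The same POST request can be prepared from Python with the standard library. This is a minimal sketch: the helper names `build_request` and `download` are hypothetical, the form fields are copied from the curl example above, and nothing is sent until `download` is called.

```python
import urllib.parse
import urllib.request

URL = "http://alize2.finances.gouv.fr/communes/eneuro/detail.php"


def build_request(icom, dep, year, url=URL):
    """Prepare the form-encoded POST request for one city and fiscal year."""
    form = {
        "ICOM": icom,
        "DEP": dep,
        "TYPE": "BPS",
        "PARAM": "0",
        "EXERCICE": str(year),
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    # Supplying a body makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=data)


def download(request, path):
    """Send the request and save the raw response body to a file."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    with open(path, "wb") as handle:
        handle.write(body)


request = build_request("234", "045", 2013)
```

Calling `download(request, "test/data/commune_2013_account.html")` would then store the page next to the other test fixtures, assuming the server still answers at that address.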

TODO

  • Add documentation, in particular the mapping between variable names and fields in the HTML pages.
