Simplistic spatial/administrative referential.
This project is a set of tools to produce a shared spatial/administrative referential based on open datasets.
It is meant to be embedded in applications for autocompletion. It aims at neither universality (country levels are not comparable) nor precision (most source datasets have a 100 m precision).
These tools work on and export WGS84 spatial data.
This project uses MongoDB 2.6+ and GDAL as its main tooling. Build tools are written in Python 3 and make use of:
- click
- PyMongo
- Fiona
- Shapely
The web interface requires Flask.
Translations require Babel and the Transifex client.
There are many ways to get a development environment started.
Assuming you have Virtualenv and MongoDB installed and configured on your computer:
$ git clone https://github.com/etalab/geozones.git
$ cd geozones
$ virtualenv -p /bin/python3 .
$ source bin/activate
$ pip install -r requirements.pip
$ ./geozones.py
There are two main models:
- level hierarchies
- zone/territories
GeoZones uses MongoDB as working storage.
Level hierarchies define the relationships between levels and their names. They are not stored in the database.
A zone is a spatial polygon for a given level. It has at least one unique code (unique on its level) and a name. It can have many known keys, which are not necessarily unique (e.g. a postal code can be shared by many towns).
Labels are optionally translatable.
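As an illustration of the difference between codes and keys, here is a minimal sketch in plain Python. The zone dictionaries, the `town` level, the codes and the `postal` key name are all hypothetical and only mirror the description above; this is not the project's actual storage code.

```python
from collections import defaultdict

# Hypothetical zones: level name, codes and the "postal" key are
# illustrative placeholders, not values from the real dataset.
zones = [
    {"level": "town", "code": "00001", "name": "Town A", "keys": {"postal": "99999"}},
    {"level": "town", "code": "00002", "name": "Town B", "keys": {"postal": "99999"}},
    {"level": "town", "code": "00003", "name": "Town C", "keys": {"postal": "88888"}},
]


def index_by_code(zones):
    """Map (level, code) to one zone: a code is unique within its level."""
    index = {}
    for zone in zones:
        ident = (zone["level"], zone["code"])
        if ident in index:
            raise ValueError("duplicate code %r on level %r" % (zone["code"], zone["level"]))
        index[ident] = zone
    return index


def index_by_key(zones, key):
    """Map a key value to the names of all zones sharing it (keys may repeat)."""
    index = defaultdict(list)
    for zone in zones:
        value = zone["keys"].get(key)
        if value is not None:
            index[value].append(zone["name"])
    return dict(index)


by_code = index_by_code(zones)
by_postal = index_by_key(zones, "postal")
print(by_postal["99999"])  # the two towns sharing this postal code
```

A lookup by code always yields a single zone, while a lookup by key yields a list, which is the shape an autocompletion widget would need to handle.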
Some zones are defined as an aggregation of other zones. They are called aggregations in GeoZones and are built after all data are loaded.
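The reason aggregations are built last is that every member zone must already exist. The sketch below shows that ordering constraint in plain Python. It is an assumption-laden illustration: `build_aggregate`, the `geom` field and the zone identifiers are hypothetical, and the member polygons are simply concatenated into a MultiPolygon instead of being dissolved into a true geometric union (which a geometry library such as Shapely would do).

```python
def build_aggregate(level, code, name, member_ids, zones_by_id):
    """Build an aggregated zone from already loaded member zones.

    Hypothetical helper: it collects member polygons into one MultiPolygon
    without merging shared boundaries.
    """
    missing = [m for m in member_ids if m not in zones_by_id]
    if missing:
        # Members must be loaded before the aggregate can be built.
        raise LookupError("unknown member zones: %s" % ", ".join(missing))
    polygons = []
    for member_id in member_ids:
        geom = zones_by_id[member_id]["geom"]
        if geom["type"] == "Polygon":
            polygons.append(geom["coordinates"])
        else:  # assumed to be a MultiPolygon
            polygons.extend(geom["coordinates"])
    return {
        "level": level,
        "code": code,
        "name": name,
        "geom": {"type": "MultiPolygon", "coordinates": polygons},
    }


def square(x, y):
    """Return a 1x1 degree placeholder polygon (lon/lat, closed ring)."""
    ring = [[x, y], [x + 1, y], [x + 1, y + 1], [x, y + 1], [x, y]]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical loaded zones, keyed by "<level>/<code>".
loaded = {
    "town/00001": {"name": "Town A", "geom": square(0, 0)},
    "town/00002": {"name": "Town B", "geom": square(1, 0)},
}
aggregate = build_aggregate("group", "G1", "Group 1", ["town/00001", "town/00002"], loaded)
print(len(aggregate["geom"]["coordinates"]))  # number of member polygons
```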
The following properties are exported in the GeoJSON output:
- id
A unique identifier defined by
<level>/<code>
- code
The zone unique identifier in this level
- level
The level identifier
- name
The zone display name (can be translatable)
- population
Estimated/approximate population (optional)
- area
Estimated/approximate area in km² (optional)
- wikipedia
A Wikipedia reference (optional)
- dbpedia
A DBPedia reference (optional)
- flag
A DBPedia reference to a flag (optional)
- blazon
A DBPedia reference to a blazon (optional)
- keys
A dictionary of known keys/codes for this zone
- parents
A list of every known parent zone identifier
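To make the property list concrete, here is a sketch of what one exported feature could look like when built in Python. The helper `zone_to_feature`, all values and the placeholder geometry are hypothetical. Leaving out unknown optional properties (rather than exporting them as `null`) is an assumption of this sketch, not a statement about the actual exporter.

```python
import json

OPTIONAL = ("population", "area", "wikipedia", "dbpedia", "flag", "blazon")


def zone_to_feature(zone, geometry):
    """Build a GeoJSON Feature carrying the properties listed above."""
    properties = {
        "id": "{0}/{1}".format(zone["level"], zone["code"]),  # <level>/<code>
        "code": zone["code"],
        "level": zone["level"],
        "name": zone["name"],
        "keys": zone.get("keys", {}),
        "parents": zone.get("parents", []),
    }
    # Assumption: optional properties are only written when a value is known.
    for name in OPTIONAL:
        if zone.get(name) is not None:
            properties[name] = zone[name]
    return {"type": "Feature", "geometry": geometry, "properties": properties}


# Hypothetical zone and a placeholder WGS84 square (lon/lat, closed ring).
zone = {
    "level": "town",
    "code": "00001",
    "name": "Town A",
    "population": 1200,
    "keys": {"postal": "99999"},
    "parents": ["country/xx"],
}
geometry = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
}
feature = zone_to_feature(zone, geometry)
print(json.dumps(feature, indent=2, sort_keys=True))
```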
Level names and some territories are translatable. They are provided as gettext files. Translations are handled on Transifex.
Here's the workflow:
# Extract translatable labels
$ pybabel extract -F babel.cfg -o translations/geozones.pot .
# Push updated translations template to Transifex
$ tx push -s
# Fetch the latest translations from Transifex
$ tx pull
# Compile translations for packaging/distribution
$ pybabel compile -D geozones -d translations
To add an extra language:
$ pybabel init -D geozones -i translations/geozones.pot -d translations -l <language code>
$ tx push -t -l <language code>
A set of commands is provided for the build process. You can list all of them with:
$ ./geozones.py --help
Download the required datasets. Datasets will be stored in a downloads
subdirectory.
Load and process datasets into the database.
Perform zone aggregations for zones defined as aggregations of others.
Perform some non-geospatial processing (e.g. set the postal codes, attach the parents...).
Dump the produced dataset as GeoJSON files for distribution. Files are dumped in a build subdirectory.
All-in-one task, equivalent to:
# Perform all tasks from download to distribution
$ ./geozones.py download load aggregate postprocess dist
Serve a web interface to explore the generated data.
Display some useful information and statistics.
Commands are chainable, so you can write:
# Perform all tasks from download to distribution
$ ./geozones.py download load -d aggregate postprocess dist dist -s status
You can export data in (Geo)JSON or msgpack formats.
The msgpack format consumes more CPU on deserialization but avoids using many gigabytes of RAM, because the data can be iterated over without loading the whole file.
- NaturalEarth administrative boundaries
- Thematic Mapping country boundaries
- OpenStreetMap French regions boundaries
- OpenStreetMap French counties boundaries
- OpenStreetMap French EPCIs boundaries
- OpenStreetMap French districts boundaries
- OpenStreetMap French towns boundaries
- OpenStreetMap French cantons boundaries
- IGN/INSEE IRIS aggregated version
- French postal codes database
- DGCL EPCIs list
- INSEE COG
- Incremental downloads, maybe with checksum check
- Global postprocessor
- Postprocessor dependencies
- Audit trail
- Distribute GeoZones as a standalone Python executable
- Some quality check tools
- Global weight = f(population, area, level)
- Different precision output
- Localized JSON outputs (outputs are English-only right now)
- Translations as distributable JSON (as an alternative to the current PO/MO format)
- Translations as Python package
- Model versioning
- Statistics/coverages in levels
- Querying
- Only fetch zones for viewport (less intensive for lower layers)
- A full web-service as a separate project