GeoZones

Simplistic spatial/administrative referential.

This project is a set of tools to produce a shared spatial/administrative referential based on open datasets.

The purpose is to be embeddable in applications for autocompletion. It aims at neither universality (country levels are not comparable) nor precision (most source datasets have a 100 m precision).

These tools work on and export WGS84 spatial data.

Requirements

This project uses MongoDB 2.6+ and GDAL as its main tooling. Build tools are written in Python 3 and make use of:

click
PyMongo
Fiona
Shapely

The web interface requires Flask.

Translations require Babel and the Transifex client.

Getting started

There are many ways to get a development environment started.

Assuming you have Virtualenv and MongoDB installed and configured on your computer:

$ git clone https://github.com/etalab/geozones.git
$ cd geozones
$ virtualenv -p /bin/python3 .
$ source bin/activate
$ pip install -r requirements.pip
$ ./geozones.py

Model

There are two main models:

level hierarchies
zone/territories

GeoZones uses MongoDB as working storage.

Levels

Levels define the relationships between levels and their names. They are not stored in the database.
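
As an illustration, a level hierarchy that lives in code rather than in the database could look like the minimal sketch below. The `Level` class, its fields, the `ancestors` helper and the sample level identifiers are all assumptions made for this example, not the project's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """A hypothetical level: an identifier, a display name and its parent levels."""
    id: str
    name: str
    parents: tuple = ()


# Sample hierarchy (illustrative identifiers only, not the project's levels)
country = Level('country', 'Country')
region = Level('region', 'Region', parents=(country,))
town = Level('town', 'Town', parents=(region,))


def ancestors(level):
    """Return the identifiers of every level above the given one, nearest first."""
    found = []
    for parent in level.parents:
        for level_id in [parent.id] + ancestors(parent):
            if level_id not in found:
                found.append(level_id)
    return found


print(ancestors(town))
```

Keeping such definitions in code means the hierarchy is available without any database lookup.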

Zones

A zone is a spatial polygon for a given level. It has at least one unique code (unique within its level) and a name. It can have many known keys, which are not necessarily unique (e.g. a postal code can be shared by many towns).
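
The difference between a unique code and a shared key can be sketched as follows. The zone dictionaries, the `postal` key name, the list-valued keys and the `index_by_key` helper are hypothetical and only serve to show that one key value may resolve to several zones.

```python
from collections import defaultdict

# Hypothetical zones: the code is unique within the level, the keys are not
zones = [
    {'level': 'town', 'code': 'A', 'name': 'Town A',
     'keys': {'postal': ['00001']}},
    {'level': 'town', 'code': 'B', 'name': 'Town B',
     'keys': {'postal': ['00001', '00002']}},
]


def index_by_key(zones, key):
    """Map each value of a given key to the identifiers of the zones sharing it."""
    index = defaultdict(list)
    for zone in zones:
        for value in zone['keys'].get(key, []):
            index[value].append('{0}/{1}'.format(zone['level'], zone['code']))
    return dict(index)


postal_index = index_by_key(zones, 'postal')
print(postal_index)
```

A lookup by code yields at most one zone per level, whereas a lookup by key yields a list.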

Labels are optionally translatable.

Some zones are defined as an aggregation of other zones. They are called aggregations in GeoZones and are built after all data is loaded.

The following properties are exported in the GeoJSON output:

id: A unique identifier defined by <level>/<code>
code: The zone unique identifier in this level
level: The level identifier
name: The zone display name (can be translatable)
population: Estimated/approximate population (optional)
area: Estimated/approximate area in km² (optional)
wikipedia: A Wikipedia reference (optional)
dbpedia: A DBPedia reference (optional)
flag: A DBPedia reference to a flag (optional)
blazon: A DBPedia reference to a blazon (optional)
keys: A dictionary of known keys/codes for this zone
parents: A list of every known parent zone identifier

Translations

Level names and some territories are translatable. They are provided as gettext files. Translations are handled on Transifex.

Here's the workflow:

# Extract translatable labels
$ pybabel extract -F babel.cfg -o translations/geozones.pot .
# Push updated translations template to Transifex
$ tx push -s
# Fetch latest translations from Transifex
$ tx pull
# Compile translations for packaging/distribution
$ pybabel compile -D geozones -d translations

To add an extra language:

$ pybabel init -D geozones -i translations/geozones.pot -d translations -l <language code>
$ tx push -t -l <language code>

Commands

A set of commands is provided for the build process. You can list all of them with:

$ ./geozones.py --help

`download`

Download the required datasets. Datasets are stored in a downloads subdirectory.

`load`

Load and process datasets into the database.

`aggregate`

Build the zones that are defined as aggregations of other zones.

`postprocess`

Perform some non-geospatial processing (e.g. set the postal codes, attach the parents...).

`dist`

Dump the produced dataset as GeoJSON files for distribution. Files are dumped in a build subdirectory.

`full`

All in one task equivalent to:

# Perform all tasks from download to distribution
$ ./geozones.py download load aggregate postprocess dist

`explore`

Serve a web interface to explore the generated data.

`status`

Display some useful information and statistics.

Commands are chainable, so you can write:

# Perform all tasks from download to distribution
$ ./geozones.py download load -d aggregate postprocess dist dist -s status
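
To make the reading order of a chained invocation explicit, the sketch below splits an argument list into (command, options) steps, attaching each option to the command that precedes it. It is a plain-Python illustration written for this document, not the project's actual parser, and it attaches no meaning to the `-d` and `-s` flags. The `split_chain` helper is hypothetical; the command names are those listed in this section.

```python
COMMANDS = {
    'download', 'load', 'aggregate', 'postprocess',
    'dist', 'full', 'explore', 'status',
}


def split_chain(argv):
    """Group tokens into (command, options) steps, in the order they were given."""
    steps = []
    for token in argv:
        if token in COMMANDS:
            # A known command name starts a new step
            steps.append((token, []))
        elif steps:
            # Anything else is attached to the most recent command
            steps[-1][1].append(token)
        else:
            raise ValueError('expected a command before {0!r}'.format(token))
    return steps


argv = 'download load -d aggregate postprocess dist dist -s status'.split()
steps = split_chain(argv)
for command, options in steps:
    print(command, options)
```

Read this way, each command runs in the order given and only sees the options written directly after it.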

Options

`serialization`

You can export data in (Geo)JSON or msgpack formats.

The msgpack format uses more CPU on deserialization but avoids consuming gigabytes of RAM, because the data can be iterated over without loading the whole file.

Reused datasets

Possible improvements

Build

Incremental downloads, maybe with checksum check
Global postprocessor
Postprocessor dependencies
Audit trail
Distribute GeoZones as a standalone Python executable
Some quality check tools

Fields

Global weight = f(population, area, level)

Output

Different precision output
Localized JSON outputs (outputs are English-only right now)
Translations as distributable JSON (as an alternative to the current PO/MO format)
Translations as Python package
Model versioning
Statistics/coverages in levels

Web interface

Querying
Only fetch zones for viewport (less intensive for lower layers)
A full web-service as a separate project
