teleboas/geozones

 
 

GeoZones

Simplistic spatial/administrative referential.

This project is a set of tools to produce a shared spatial/administrative referential based on open datasets.

The purpose is to be embeddable in applications for autocompletion. It aims at neither universality (country levels are not comparable) nor precision (most source datasets have a 100 m precision).

These tools work on and export WGS84 spatial data.

Requirements

This project uses MongoDB 2.6+ and GDAL as its main tooling. The build tools are written in Python 3 and make use of:

  • click
  • PyMongo
  • Fiona
  • Shapely

The web interface requires Flask.

Translations require Babel and the Transifex client.

Getting started

There are many ways to get a development environment started.

Assuming you have Virtualenv and MongoDB installed and configured on your computer:

$ git clone https://github.com/etalab/geozones.git
$ cd geozones
$ virtualenv -p /bin/python3 .
$ source bin/activate
$ pip install -r requirements.pip
$ ./geozones.py

Model

There are two main models:

  • level hierarchies
  • zone/territories

GeoZones uses MongoDB as working storage.

Levels

Levels define the relationships between levels and their names. They are not stored in the database.
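As a rough illustration of what such a hierarchy could look like in memory, here is a minimal Python sketch. The `Level` class, its fields and the example identifiers (`country`, `fr/region`, `fr/town`) are hypothetical and are not taken from the GeoZones code base.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """Hypothetical level: an identifier, a display name and its parent levels."""
    id: str
    name: str
    parents: list = field(default_factory=list)

    def ancestors(self):
        """Return every level above this one, nearest first, without duplicates."""
        found = []
        seen_ids = set()
        queue = list(self.parents)
        while queue:
            level = queue.pop(0)
            if level.id not in seen_ids:
                seen_ids.add(level.id)
                found.append(level)
                queue.extend(level.parents)
        return found


# Made-up hierarchy, for illustration only.
country = Level("country", "Country")
region = Level("fr/region", "French region", [country])
town = Level("fr/town", "French town", [region])

print([level.id for level in town.ancestors()])
```

Keeping the hierarchy in code rather than in MongoDB matches the statement above that levels are not stored in the database.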

Zones

A zone is a spatial polygon for a given level. It has at least one unique code (unique within its level) and a name. It can have many known keys, which are not necessarily unique (e.g. a postal code can be shared by many towns).

Labels are optionally translatable.

Some zones are defined as an aggregation of other zones. They are called aggregations in GeoZones and are built after all data are loaded.

The following properties are exported in the GeoJSON output:

id

A unique identifier defined by <level>/<code>

code

The zone unique identifier in this level

level

The level identifier

name

The zone display name (can be translatable)

population

Estimated/approximate population (optional)

area

Estimated/approximate area in km² (optional)

wikipedia

A Wikipedia reference (optional)

dbpedia

A DBPedia reference (optional)

flag

A DBPedia reference to a flag (optional)

blazon

A DBPedia reference to a blazon (optional)

keys

A dictionary of known keys/codes for this zone

parents

A list of every known parent zone identifier
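To make the property list above more concrete, the following sketch builds the exported properties of a zone and indexes zones by a shared key. The helper names (`zone_properties`, `index_by_key`), the `postal` key name and all the values are made up for illustration; only the property names and the `<level>/<code>` identifier rule come from the description above.

```python
from collections import defaultdict


def zone_properties(level, code, name, keys=None, parents=None, **optional):
    """Build the exported properties of a zone; the id is '<level>/<code>'."""
    props = {
        "id": f"{level}/{code}",
        "code": code,
        "level": level,
        "name": name,
        "keys": keys or {},
        "parents": parents or [],
    }
    # Optional properties (population, area, wikipedia, ...) are kept only when set.
    props.update({k: v for k, v in optional.items() if v is not None})
    return props


def index_by_key(zones, key):
    """Map a key value to every zone id sharing it, since keys need not be unique."""
    index = defaultdict(list)
    for zone in zones:
        value = zone["keys"].get(key)
        if value is not None:
            index[value].append(zone["id"])
    return dict(index)


# Two made-up towns sharing the same (made-up) postal code.
zones = [
    zone_properties("fr/town", "A", "Town A", keys={"postal": "00000"},
                    parents=["country/fr"], population=1200),
    zone_properties("fr/town", "B", "Town B", keys={"postal": "00000"},
                    parents=["country/fr"]),
]
postal_index = index_by_key(zones, "postal")
```

Because a key can point to several zones, the index maps each key value to a list of identifiers rather than to a single zone.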

Translations

Level names and some territory names are translatable. Translations are provided as gettext files and are managed on Transifex.

Here's the workflow:

# Extract translatable labels
$ pybabel extract -F babel.cfg -o translations/geozones.pot .
# Push updated translations template to Transifex
$ tx push -s
# Fetch the latest translations from Transifex
$ tx pull
# Compile translations for packaging/distribution
$ pybabel compile -D geozones -d translations

To add an extra language:

$ pybabel init -D geozones -i translations/geozones.pot -d translations -l <language code>
$ tx push -t -l <language code>
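Once compiled, the catalogs can be read with Python's standard `gettext` module. The sketch below assumes the usual Babel layout (`translations/<language code>/LC_MESSAGES/geozones.mo`) produced by the `pybabel compile` command above; the `load_translator` helper and the `"Country"` label are hypothetical.

```python
import gettext


def load_translator(language, localedir="translations", domain="geozones"):
    """Return a function translating labels into `language`.

    With fallback=True, the untranslated label is returned when no
    compiled catalog is found for the requested language.
    """
    catalog = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return catalog.gettext


translate = load_translator("fr")
print(translate("Country"))
```

Using `fallback=True` keeps the application usable for languages that have not been translated yet.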

Commands

A set of commands is provided for the build process. You can list all of them with:

$ ./geozones.py --help

download

Download the required datasets. Datasets will be stored into a downloads subdirectory.

load

Load and process datasets into the database.

aggregate

Build the zones defined as aggregations of other zones.

postprocess

Perform some non-geospatial processing (e.g. set the postal codes, attach the parents).

dist

Dump the produced dataset as GeoJSON files for distribution. Files are dumped in a build subdirectory.

full

All-in-one task equivalent to:

# Perform all tasks from download to distribution
$ ./geozones.py download load aggregate postprocess dist

explore

Serve a web interface to explore the generated data.

status

Display some useful information and statistics.

Commands are chainable, so you can write:

# Perform all tasks from download to distribution
$ ./geozones.py download load -d aggregate postprocess dist dist -s status
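Since click is listed in the requirements, this kind of chaining can be obtained with a click group declared with `chain=True`. The sketch below is a self-contained illustration, not the actual `geozones.py`: the command bodies and the meaning of the `-d` flag are hypothetical.

```python
import click
from click.testing import CliRunner


@click.group(chain=True)
def cli():
    """Hypothetical command group; chain=True lets several subcommands run in one call."""


@cli.command()
def download():
    """Stand-in for a download step."""
    click.echo("download")


@cli.command()
@click.option("-d", "--drop", is_flag=True, help="Hypothetical flag, for illustration only.")
def load(drop):
    """Stand-in for a load step with a per-command option."""
    click.echo(f"load (drop={drop})")


@cli.command()
def dist():
    """Stand-in for a distribution step."""
    click.echo("dist")


# Invoke the chain in-process, as `download load -d dist` would on the command line.
result = CliRunner().invoke(cli, ["download", "load", "-d", "dist"])
print(result.output)
```

With chaining, each option applies to the subcommand that precedes it, and the subcommands run in the order given.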

Options

serialization

You can export data in (Geo)JSON or msgpack formats.

Deserializing the msgpack format consumes more CPU, but it avoids holding gigabytes of data in RAM because the data can be iterated over without loading the whole file.
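The streaming behaviour described above can be sketched with the `msgpack` Python package and its `Unpacker` class. This example assumes the export is a sequence of concatenated msgpack objects, one per zone, which may differ from the layout GeoZones actually writes; the `iter_zones` helper and the sample zones are hypothetical.

```python
import io

import msgpack  # third-party package: pip install msgpack


def iter_zones(stream):
    """Yield zones one at a time from a stream of concatenated msgpack objects."""
    # Unpacker reads from the file-like object in chunks, so only the
    # current object (plus a read buffer) is held in memory.
    for zone in msgpack.Unpacker(stream, raw=False):
        yield zone


# Build a small in-memory stream standing in for an exported file.
buffer = io.BytesIO()
for zone in ({"id": "fr/town/A", "name": "Town A"}, {"id": "fr/town/B", "name": "Town B"}):
    buffer.write(msgpack.packb(zone, use_bin_type=True))
buffer.seek(0)

names = [zone["name"] for zone in iter_zones(buffer)]
```

With a real export, `stream` would be a file opened in binary mode instead of the in-memory buffer.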

Reused datasets

Possible improvements

Build

  • Incremental downloads, maybe with checksum check
  • Global postprocessor
  • Postprocessor dependencies
  • Audit trail
  • Distribute GeoZones as a standalone Python executable
  • Some quality check tools

Fields

  • Global weight = f(population, area, level)

Output

  • Different precision output
  • Localized JSON outputs (outputs are English-only right now)
  • Translations as distributable JSON (as an alternative to the current PO/MO format)
  • Translations as Python package
  • Model versioning
  • Statistics/coverages in levels

Web interface

  • Querying
  • Only fetch zones for viewport (less intensive for lower layers)
  • A full web-service as a separate project
