ODP CKAN - EU Open Data Portal CKAN client

  • reads messages from the RabbitMQ service
  • queries SDS and retrieves full data about the specified datasets in JSON format
  • updates the EU Open Data Portal (ODP) using the CKAN API

Start the odpckan client with the following command:

$ sudo docker run -d \
                  -e RABBITMQ_HOST=http://rabbitmq.apps.eea.europa.eu \
                  -e RABBITMQ_PORT=5672 \
                  -e RABBITMQ_USERNAME=client \
                  -e RABBITMQ_PASSWORD=secret \
                  -e CKAN_ADDRESS=https://open-data.europa.eu/en/data \
                  -e CKAN_APIKEY=secret-api-key \
                  -e SERVICES_SDS=http://semantic.eea.europa.eu/sparql \
                  -e SDS_TIMEOUT=60 \
                  -e CKANCLIENT_INTERVAL="0 */3 * * *" \
                  -e CKANCLIENT_INTERVAL_BULK="0 0 * * 0" \
                  eeacms/odpckan
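
As an illustration of how a Python process might pick up the settings passed with `-e`, here is a minimal sketch. The variable names are the ones used in the command above; the function name `load_config`, the fallback defaults, and the integer conversions are assumptions made for this example and are not taken from the client's source.

```python
import os


def load_config(env=None):
    """Collect connection settings from environment variables.

    Hypothetical helper: the defaults below are assumptions chosen for
    the sketch, not values documented by the project.
    """
    env = os.environ if env is None else env
    return {
        "rabbitmq_host": env.get("RABBITMQ_HOST", "localhost"),
        # Ports and timeouts arrive as strings, so convert them to int.
        "rabbitmq_port": int(env.get("RABBITMQ_PORT", "5672")),
        "rabbitmq_username": env.get("RABBITMQ_USERNAME", ""),
        "rabbitmq_password": env.get("RABBITMQ_PASSWORD", ""),
        "ckan_address": env.get("CKAN_ADDRESS", ""),
        "ckan_apikey": env.get("CKAN_APIKEY", ""),
        "services_sds": env.get("SERVICES_SDS", ""),
        "sds_timeout": int(env.get("SDS_TIMEOUT", "60")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.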

For development, a docker-compose.yml file is provided. To set extra environment variables, copy docker-compose.override-example.yml to docker-compose.override.yml and customize it.

Dependencies

  • Pika: a Python client for RabbitMQ
  • ckanapi: a Python client for the CKAN API, used to work with ODP
  • rdflib: a Python library for working with RDF
  • rdflib-jsonld: JSON-LD parser and serializer plugins for RDFLib

Clone the repository:

$ git clone https://github.com/eea/eea.odpckan.git
$ cd eea.odpckan

Install all dependencies with pip:

$ pip install -r requirements.txt

The ODP CKAN entry point consumes all the messages from the queue and then stops. This command can be set up as a cron job:

$ python app/ckanclient.py -d
$ #debug mode: creates debug files for dataset data from SDS and ODP, before and after the update

$ python app/ckanclient.py
$ #default/working mode: reads and processes all messages from the specified queue

Inject test messages (default howmany = 1):

$ python app/proxy.py howmany

Query SDS (default url = https://www.eea.europa.eu/data-and-maps/data/eea-coastline-for-analysis-1) and print result:

$ python app/sdsclient.py -d
$ #debug mode: queries SDS and dumps a dataset and all datasets

$ python app/sdsclient.py
$ #default/working mode: initiates the bulk update

Information published on the EEA main portal is submitted to the EU Open Data Portal.

Swimlane workflow diagram: https://raw.githubusercontent.com/eea/eea.odpckan/master/docs/EEA%20ODP%20CKAN%20-%20swimlane%20workflow%20diagram.png

The workflow is described below:

  • EEA CMS (Plone)
    • content is published
    • CMS content rules are triggered and the following operations are performed:
  • EEA ODP CKAN client
    • CKAN client is triggered periodically via a cron job
    • CKAN client connects to the RabbitMQ message broker and consumes all the messages from the “odp_queue” queue, performing the following operations for each message:
      • dataset is identified
      • dataset’s metadata is extracted from SDS
      • using the CKAN API, ODP is updated
      • if issues occur during message processing, the message is requeued
  • EEA ODP CKAN client (bulk update operation)
    • is triggered periodically via a cron job
    • it reads all the datasets from the SDS
    • generates update messages in the RabbitMQ message broker, one message per dataset found
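
The consume-and-requeue behaviour described in the workflow can be sketched with an in-memory queue. This is only an illustration of the pattern, not the client's code: a `collections.deque` stands in for the RabbitMQ queue, and `process_queue` and `handle` are hypothetical names. `handle` represents the "look up in SDS, then update ODP" step and is assumed to raise an exception on failure.

```python
from collections import deque


def process_queue(queue, handle):
    """Process every message that is in `queue` when the run starts.

    Messages whose handling fails are appended back to the queue so a
    later run can retry them. Returns (processed, requeued) lists.
    """
    processed, requeued = [], []
    # Iterate over the initial length only, so messages requeued during
    # this run are not retried until the next run.
    for _ in range(len(queue)):
        message = queue.popleft()
        try:
            handle(message)
            processed.append(message)
        except Exception:
            queue.append(message)  # requeue for a later run
            requeued.append(message)
    return processed, requeued
```

Bounding the loop by the initial queue length mirrors a client that drains the queue once per cron invocation and then stops, instead of retrying a failing message in a tight loop.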

Message:

$ update|https://www.eea.europa.eu/data-and-maps/data/eea-coastline-for-analysis-1|eea-coastline-for-analysis-1

Message structure:

$ action|url|identifier

The "identifier" value is ignored; only the URL is used to look up the dataset in SDS.

Action(s):

$ create/update/delete
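
A message in the `action|url|identifier` format can be split into its three fields as follows. This is a minimal sketch: the names `Message`, `parse_message` and `VALID_ACTIONS` are hypothetical, and trimming whitespace around each field is an assumption of this example, not documented behaviour of the client.

```python
from typing import NamedTuple

# The three actions listed above.
VALID_ACTIONS = {"create", "update", "delete"}


class Message(NamedTuple):
    action: str
    url: str
    identifier: str


def parse_message(body: str) -> Message:
    """Split an 'action|url|identifier' string into its fields."""
    # Assumption: whitespace around a field is not significant.
    parts = [part.strip() for part in body.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected action|url|identifier, got {body!r}")
    action, url, identifier = parts
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    return Message(action, url, identifier)
```

Since only the URL is used for the SDS lookup, a caller would typically read `parse_message(body).url` and ignore the `identifier` field.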

From the "app" directory, install development requirements, and run pytest:

$ pip install -r requirements-dev.txt
$ pytest

The tests use pre-recorded responses for SDS queries. To update the responses, run the tests in "spy" mode:

$ SDS_MOCK_SPY=true pytest
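
The general record-and-replay idea behind pre-recorded responses can be sketched as below. This is not the repository's test helper: the function `cached_query`, the JSON file layout, and the exact way `SDS_MOCK_SPY` is interpreted are all assumptions made for illustration. `run_query` is a hypothetical callable that performs the real SDS query.

```python
import json
import os


def cached_query(query, run_query, store_path, spy=None):
    """Replay a recorded response, or record a fresh one in spy mode."""
    if spy is None:
        # Assumption: the string "true" switches spy mode on.
        spy = os.environ.get("SDS_MOCK_SPY", "").lower() == "true"
    store = {}
    if os.path.exists(store_path):
        with open(store_path, encoding="utf-8") as f:
            store = json.load(f)
    if spy:
        # Spy mode: call the real service and save what it returned.
        store[query] = run_query(query)
        with open(store_path, "w", encoding="utf-8") as f:
            json.dump(store, f, indent=2, sort_keys=True)
    # Replay mode raises KeyError if the query was never recorded.
    return store[query]
```

In replay mode the real service is never contacted, which keeps test runs fast and independent of the network.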

The Initial Owner of the Original Code is European Environment Agency (EEA). All Rights Reserved.

The Original Code is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

European Environment Agency (EU)