zacharlie/qgis_dataset_qa_workbench

 
 

QGIS Dataset QA Workbench

A QGIS3 plugin for assisting in dataset quality assurance workflows.

This plugin allows loading checklists with steps that should be verified when doing dataset quality assurance (QA). Checklist items can be automated by using QGIS Processing algorithms.

Table of contents

  1. Installation

  2. Quickstart

    1. Add sample checklist repository
    2. Choose checklist
    3. Perform validation
    4. Generate report
  3. Creating new checklists

  4. Sharing checklists and Processing algorithms with other users

  5. Development

  6. Attribution

Installation

This plugin is published in the official QGIS plugins repository and can be installed from within QGIS. Open the QGIS plugin manager (navigate to Plugins -> Manage and Install Plugins...), search for Dataset QA Workbench and install the plugin.

NOTE: Be sure to have the Show also experimental plugins checkbox checked, in the plugin manager Settings section.

We recommend also installing the QGIS Resource Sharing plugin, since it is able to share the checklists that are required by our plugin.

Quickstart

Add sample checklist repository

After having installed both this plugin and the QGIS Resource Sharing plugin, add a new repo to QGIS Resource Sharing:

  1. Navigate to Plugins -> Resource Sharing -> Resource Sharing

  2. On the QGIS Resource Sharing dialog, go to Settings -> Add repository...

  3. Add the following repository:

  4. The new QGIS Dataset QA Workbench demo repository is now displayed by the Resource Sharing plugin

  5. Navigate to the All collections section and look for an entry named QGIS Dataset QA Workbench demo

  6. Press the Install button. The Resource Sharing plugin proceeds to download and install some sample checklists to the {qgis-user-profile-dir}/checklists directory.

Choose checklist to perform validation with

  1. Open the QGIS Dataset QA Workbench dock (navigate to Plugins -> Dataset QA Workbench -> Dataset QA Workbench or click the plugin's icon) and navigate to the Choose Checklist tab

  2. Inside the plugin dock, navigate to Choose Checklist -> Choose....

    In the dialog that opens, select one of the existing checklists. Take into account the dataset type that it is applicable to (document, vector or raster) and the artifact that it applies to (dataset, metadata or style). Click the OK button to close this dialog. The checklist is loaded and is ready to use.

  3. Depending on the loaded checklist and its dataset and artifact types, select if you want to:

    • a) Validate one of the currently loaded layers. If so, be sure to select it on the list of layers shown in the plugin dock

    • b) Validate an external file, by indicating its path on the local filesystem

    Upon selecting one of these options, both the Perform Validation and Generate Report tabs become selectable

Perform validation of a resource

Move over to the Perform Validation tab where you are presented with a list of checklist steps to be validated. For each of these checks:

  1. Read the description in order to understand what the current check is about

  2. A checklist check can be validated in one of two ways:

    • Manually - Follow the instructions provided by the guide section. These should be detailed and practical enough in order to allow you to properly validate the current checklist check.

    • Automatically - If applicable, the validation may be performed by pressing one of the two buttons present on the automation section.

      • Run - Perform validation by using whatever predefined parameters have been used by the checklist's designer

      • Configure and run... - Configure the check's validation parameters and then run the validation procedure

  3. After performing validation, you may optionally click the Validation notes section and type any relevant notes about the process.

Generate validation report

After having validated all of the checklist's checks, move over to the Generate Report tab. This tab displays a summary of the validation process, with information related to:

  • the dataset being validated
  • the overall validation result
  • the result of each check

  1. Customize the report's Validated by field

    By default, the report uses whatever value is automatically generated by QGIS in its global user_full_name variable. If you wish to provide a different name:

    1. Navigate to Settings -> Options... -> Variables

    2. Define a new variable named dataset_qa_workbench

    3. Set an appropriate value. The plugin will use it as the author of validation reports

  2. If applicable, the checklist may specify a post-validation action. In this case, the Run post validation and Configure and run post validation... buttons will be enabled.

    Post validation actions may be used for providing confirmation of the validation procedure to some third-party. Some examples include emailing the validation report to a list of recipients or POSTing the report to some centralized host by using a suitable REST API

  3. When checklists are designed to automatically share output reports, additional variables must be configured within QGIS in order for the reports to be shared effectively, outlined as follows:

    • Report poster: Sends the report to a remote host using an HTTP POST

      • dataset_qa_workbench_auth_config_id (optional): the QGIS AuthID, as configured with the QGIS authentication manager and linked to the current user profile, represented as a string value, e.g. 'qauth01', and used to authenticate with the remote host (where required).
      • dataset_qa_workbench_endpoint: the REST endpoint URL, represented as a string value, e.g. 'https://service.example.com/REST'.
    • Report mailer: Sends the report to recipients via email

      • dataset_qa_workbench_sender_address: email address of the sender, used to authenticate with the mail server and given as a string value, e.g. 'noreply@example.com'
      • dataset_qa_workbench_sender_password: sender address password for mailserver, used to authenticate with the mail server and given as a string value, e.g. 'S3cret'
      • dataset_qa_workbench_recipients: list of intended recipients, given as a single comma separated string, e.g. 'user01@example.com,user02@example.com,user03@example.com'
      • dataset_qa_workbench_smtp_host: SMTP mailserver host address, default 'smtp.gmail.com'
      • dataset_qa_workbench_smtp_port: SMTP port number as an integer, default 587
      • dataset_qa_workbench_smtp_secure_connection: a string value that describes the mail server connection security type. Valid values are 'starttls' (default) and 'ssl'. Use a blank string, '', to disable connection security (i.e. connect over plain, unencrypted SMTP)

    Note that all of these elements may be configured globally for the current QGIS user profile under the menu item for Settings -> Options... -> Variables.

  4. If you are validating a loaded layer, the Add validation report to layer metadata button will be enabled. In this case you have the option to include the validation report in the layer's metadata. This modifies existing layer metadata in two ways:

    • The full validation report is appended to the end of the metadata's Abstract field. Note that additional presses of the Add validation report to layer metadata button cause the new report to be appended to whatever was already written on the Abstract field (including any previous validation reports that might be there)

    • A new line is also appended to the metadata's History section. This includes the validation report's generation timestamp and the overall validation result

  5. Finally, the report may also be saved as a PDF file by selecting an appropriate destination path in the Save validation report to text box and then pressing the Save button
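
The report mailer variables listed in step 3 can be gathered into one settings object before any mail is sent. The following is only a minimal sketch of that idea, not the plugin's actual implementation: the `MailerSettings` class and the `read_mailer_settings` function are hypothetical names, and the input is assumed to be a plain mapping from variable names to values. The defaults mirror the ones documented above.

```python
from dataclasses import dataclass
from typing import Mapping

# Common prefix of the variables documented above.
PREFIX = "dataset_qa_workbench_"


@dataclass
class MailerSettings:  # hypothetical container, not part of the plugin
    sender_address: str
    sender_password: str
    recipients: list
    smtp_host: str = "smtp.gmail.com"
    smtp_port: int = 587
    secure_connection: str = "starttls"


def read_mailer_settings(variables: Mapping[str, object]) -> MailerSettings:
    """Build mailer settings from a mapping of variable names to values."""

    def get(name, default=None):
        value = variables.get(PREFIX + name, default)
        if value is None:
            raise KeyError(f"missing variable: {PREFIX}{name}")
        return value

    security = str(get("smtp_secure_connection", "starttls")).lower()
    if security not in ("starttls", "ssl", ""):
        raise ValueError(f"unsupported connection security: {security!r}")

    # Recipients arrive as one comma separated string; drop blanks and padding.
    recipients = [r.strip() for r in str(get("recipients")).split(",") if r.strip()]

    return MailerSettings(
        sender_address=str(get("sender_address")),
        sender_password=str(get("sender_password")),
        recipients=recipients,
        smtp_host=str(get("smtp_host", "smtp.gmail.com")),
        smtp_port=int(get("smtp_port", 587)),
        secure_connection=security,
    )
```

Inside QGIS, such a mapping could presumably be filled from the global variables, for instance with `QgsExpressionContextUtils.globalScope().variable(name)`. How the plugin itself reads these values is not shown here.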

Creating new checklists

Checklists are stored locally on the QGIS user profile directory (accessible from QGIS by navigating to Settings -> User Profiles -> Open Active Profile Folder...) under the checklists directory.

Installing a new checklist is simply a matter of placing a suitable file in this directory.

Checklists are stored in JSON format. Each checklist is defined as a JSON object and must conform to a predefined checklist schema. Example checklist definition:

{
  "name": "Sample checklist with action for emailing validation report",
  "description": "This is just a sample checklist - be sure to delete it\n\nThis also demonstrates sending emails with the report of validation",
  "dataset_type": "vector",
  "validation_artifact_type": "dataset",
  "checks": [
    {
      "name": "geometry is valid",
      "description": "Layer's geometry does not have invalid geometries.",
      "guide": "Navigate to Vector -> Geometry tools -> Check Validity... and run the validity analysis tool. Afterwards check that there are no features on the `invalid output` layer",
      "automation": {
        "algorithm_id": "qgis:checkvalidity",
        "artifact_parameter_name": "INPUT_LAYER",
        "output_name": "INVALID_COUNT",
        "negate_output": true
      }
    },
    {
      "name": "CRS is EPSG:4326",
      "description": "Layer's Coordinate Reference System is lat-lon on WGS84 datum (i.e. EPSG code 4326)",
      "guide": "Open the layer properties dialog, then navigate to the 'information' tab (should be the first one) and in the section called 'Information from provider' check if the 'CRS' field has a value of 'EPSG:4326 - WGS84 - Geographic'",
      "automation": {
        "algorithm_id": "dataset_qa_workbench:crschecker",
        "artifact_parameter_name": "INPUT_LAYER",
        "output_name": "OUTPUT",
        "negate_output": false,
        "extra_parameters": {
          "INPUT_CRS": "EPSG:4326"
        }
      }
    }
  ],
  "report": {
    "algorithm_id": "dataset_qa_workbench:reportmailer"
  }
}

This plugin's code repository also features a collection of sample checklists that may be studied in order to get a better grasp on how to define new checklists.
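
Since checklists are plain JSON files inside a directory, they are easy to inspect with a few lines of Python. The sketch below is an illustration only: `list_checklists` is a hypothetical helper, not part of the plugin. It assumes each `*.json` file in the given directory holds one checklist object like the example above, and it silently skips files that cannot be parsed.

```python
import json
from pathlib import Path


def list_checklists(checklists_dir):
    """Return a summary dict for every readable *.json file in checklists_dir."""
    summaries = []
    for path in sorted(Path(checklists_dir).glob("*.json")):
        try:
            checklist = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed files
        if not isinstance(checklist, dict):
            continue  # a checklist must be a JSON object
        summaries.append(
            {
                "file": path.name,
                "name": checklist.get("name"),
                "dataset_type": checklist.get("dataset_type"),
                "validation_artifact_type": checklist.get("validation_artifact_type"),
                "num_checks": len(checklist.get("checks", [])),
            }
        )
    return summaries
```

Calling `list_checklists` with the path of the checklists directory in the active QGIS user profile would give a quick overview of what is installed.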

Each checklist has the following mandatory properties:

  • name - Name of the checklist. This is used as the checklist identifier in the QGIS UI, therefore a checklist's name must be unique;

  • description - A short text explaining what the checklist is about;

  • dataset_type - The type of dataset that this checklist operates on. It must be one of:

    • document;
    • raster;
    • vector.
  • validation_artifact_type - The type of artifact that this checklist operates on. It must be one of:

    • dataset;
    • metadata;
    • style.

A checklist may also have the following optional properties:

  • checks - A checklist may have a list of checks, that describe each individual validation step.

    • Each checklist check is defined as a JSON object. It has the following mandatory properties:

      • name - Name of the checklist step;
      • description - A short description explaining what the check is about;
      • guide - Small text specifying how a human operator might go ahead and validate this check.
    • A checklist check may also have the following optional properties:

      • automation - A JSON object that contains the configuration for the automated execution of this validation check.

        The automation object has the following mandatory properties:

        • algorithm_id - Identifier of the QGIS Processing algorithm used to perform automation. It takes the form provider:algorithm (e.g. qgis:checkvalidity). It can be retrieved from the QGIS Processing toolbox by resting the mouse pointer on top of the desired algorithm

        The automation object may have the following optional properties:

        • artifact_parameter_name - Name of the algorithm parameter that represents the artifact currently being validated (e.g. INPUT, INPUT_LAYER). If not specified, it defaults to INPUT_LAYER. This value may be retrieved from the Processing algorithm by opening the algorithm’s dialog and resting the mouse pointer on the relevant input;

        • output_name - Name of the algorithm output that holds the result of the validation. If not specified, it defaults to OUTPUT. This value may also be retrieved from the algorithm's dialog, in the same way as the artifact_parameter_name property;

        • negate_output - Whether to interpret a falsy result coming from the processing algorithm as a sign of success in the validation. This is sometimes desirable. Example: the qgis:checkvalidity algorithm will return zero invalid features if a layer does not have any invalid geometries. In this case, the zero must be interpreted as a successful validation;

        • extra_parameters - Any additional parameters necessary for running the processing algorithm. These may be used for configuring other algorithm settings, such as the geometry validity method (in the case of the qgis:checkvalidity algorithm). They are passed straight to Processing. This property must be a JSON object.

  • report - A JSON object with the configuration for a post validation action. This action is implemented by means of an additional Processing algorithm, which is fed the generated validation report as an input, as well as any other parameters that may be needed. A report has the following mandatory properties:

    • algorithm_id - Identifier of the QGIS Processing algorithm (or model) used to perform the post validation. It is specified in a similar way as the check.algorithm_id property, specified above

    A report may also have the following optional properties:

    • extra_parameters - Any additional parameters necessary for running the processing algorithm. These may be used for configuring other algorithm settings and follow the same rules as the check.extra_parameters property described above
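
The mandatory properties listed above lend themselves to a quick sanity check before a checklist is installed. The following sketch is an assumption-laden illustration, not a replacement for validating against the plugin's JSON Schema: `find_checklist_problems` is a hypothetical helper that only looks at the mandatory keys and allowed values named in this section.

```python
# Allowed values as listed in this section.
DATASET_TYPES = ("document", "raster", "vector")
ARTIFACT_TYPES = ("dataset", "metadata", "style")


def find_checklist_problems(checklist):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Mandatory top-level properties.
    for key in ("name", "description", "dataset_type", "validation_artifact_type"):
        if key not in checklist:
            problems.append(f"checklist: missing '{key}'")

    dataset_type = checklist.get("dataset_type")
    if dataset_type is not None and dataset_type not in DATASET_TYPES:
        problems.append(f"checklist: invalid dataset_type '{dataset_type}'")

    artifact_type = checklist.get("validation_artifact_type")
    if artifact_type is not None and artifact_type not in ARTIFACT_TYPES:
        problems.append(f"checklist: invalid validation_artifact_type '{artifact_type}'")

    # Optional checks, each with its own mandatory properties.
    for index, check in enumerate(checklist.get("checks", [])):
        for key in ("name", "description", "guide"):
            if key not in check:
                problems.append(f"checks[{index}]: missing '{key}'")
        automation = check.get("automation")
        if automation is not None and "algorithm_id" not in automation:
            problems.append(f"checks[{index}].automation: missing 'algorithm_id'")

    # Optional post validation action.
    report = checklist.get("report")
    if report is not None and "algorithm_id" not in report:
        problems.append("report: missing 'algorithm_id'")

    return problems
```

An empty result only means that none of these specific problems were found; the schema may impose further constraints.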

Processing algorithms suitable for use in checklist steps

In order to be suitable for use as an automated validation step, a Processing algorithm must define some output that can be used for attesting whether the step succeeds or not. This means that it must be possible to convert the output to a True/False value.

Most default QGIS Processing algorithms simply output a map layer with their results. These are not suitable for use in automated checklist validation steps as there is no clear way to determine the success condition for the validation. Other algorithms, like the qgis:checkvalidity algorithm, output both map layers and suitable numeric output results. These are suitable for using for automated validation.
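
To make the True/False conversion concrete, the sketch below shows one way a check's automation object might be turned into an algorithm call and a pass or fail result. It is not the plugin's code. `run_automated_check` is a hypothetical function, and `runner` stands in for whatever executes the algorithm (inside QGIS that would typically be `processing.run`, which takes an algorithm id and a parameter dict and returns a dict of outputs). Treating a missing negate_output as false is an assumption, and a real algorithm may need further parameters supplied through extra_parameters.

```python
def run_automated_check(automation, artifact, runner):
    """Run one automated check and return True when the validation passes.

    automation: the check's "automation" object, as a dict.
    artifact:   the layer or file being validated.
    runner:     callable taking (algorithm_id, parameters) and returning a
                dict of outputs; a stand-in for processing.run.
    """
    # Extra parameters are passed straight through to the algorithm.
    parameters = dict(automation.get("extra_parameters", {}))

    # The artifact goes into the documented parameter (default INPUT_LAYER).
    parameters[automation.get("artifact_parameter_name", "INPUT_LAYER")] = artifact

    outputs = runner(automation["algorithm_id"], parameters)

    # The documented output (default OUTPUT) is reduced to a boolean.
    result = bool(outputs[automation.get("output_name", "OUTPUT")])

    # With negate_output, a falsy output (e.g. zero invalid features) is a pass.
    # Defaulting negate_output to False is an assumption of this sketch.
    return (not result) if automation.get("negate_output", False) else result
```

Passing the runner in as an argument keeps the function testable outside QGIS, since a simple fake callable can replace the real Processing framework.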

The QGIS Dataset QA Workbench plugin provides some custom Processing algorithms that are specifically tailored for automated validation. Their number is likely to increase as new versions of the plugin are released. The current list of algorithms is:

  • dataset_qa_workbench:crschecker - Allows checking if a layer's CRS matches an expected value

  • dataset_qa_workbench:xmlchecker - Allows checking if an XML file has the specified elements/attributes/values

You may also design your own custom Processing algorithms and then distribute them via the QGIS Resource Sharing plugin so that users can use them together with checklists from the QGIS Dataset QA Workbench plugin.

Processing algorithms suitable for use in post validation actions

In a similar fashion to the algorithms used for automating validation, the Processing algorithms that can be used to execute post validation actions also have specific requirements. It is not possible to reuse the standard QGIS Processing algorithms for this purpose. This plugin also ships with some suitable post validation algorithms, and it is also likely that future releases will expand on the list. Current algorithms suitable for post validation actions:

  • dataset_qa_workbench:reportmailer - Allows sending a copy of the validation report via email

  • dataset_qa_workbench:reportposter - Allows posting the validation report to an online server that implements a suitable REST API.

Validating custom checklists

The structure of checklists is formally defined in a JSON Schema file that is part of the plugin’s source code. This file can be inspected at:

https://raw.githubusercontent.com/kartoza/qgis_dataset_qa_workbench/master/schemas/checklist-check.json

JSON Schema files can be used to verify that a specific JSON object conforms to the schema. As such, it is desirable that every custom checklist be validated against the schema in order to ensure it works with the QGIS Dataset QA Workbench plugin.

This validation may be done by either:

  • using the jsonschema python package. Example:

    # validating checklist with local jsonschema package
    pipenv run jsonschema -i checklist-file.json schemas/checklist-check.json
    
  • using any online json schema validator, such as the one available at:

    https://www.jsonschemavalidator.net/

  • using your IDE, if it supports validating JSON files against a JSON schema.

Sharing checklists and Processing algorithms with other users

This plugin leans on the capabilities provided by the QGIS Resource Sharing plugin and thus users are able to leverage that to share:

  • Checklists
  • Processing algorithms
  • Processing models

We recommend setting up a git repository with the following structure:

my-shareable-qgis-resources/
  metadata.ini
  collections/
    my-shareable-collection/
      checklists/
        checklist1.json
        checklist2.json
      processing/
        algorithm1.py
        algorithm2.py
      models/
        model1.model3
        model2.model3

Then put all checklists, algorithms, etc. that are to be shared in there and provide the git repo's URL to your users. Consult the documentation of the QGIS Resource Sharing plugin for further information on this.
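
The recommended layout can be created by hand, or scaffolded with a short script such as the sketch below. `scaffold_repository` is a hypothetical helper, not something shipped with either plugin. It creates the directories shown above plus an empty metadata.ini, whose contents are defined by the QGIS Resource Sharing plugin and are deliberately left for you to fill in.

```python
from pathlib import Path


def scaffold_repository(root, collection_name):
    """Create the recommended directory layout and return the collection path."""
    root = Path(root)
    collection = root / "collections" / collection_name

    # One directory per kind of shareable resource.
    for subdir in ("checklists", "processing", "models"):
        (collection / subdir).mkdir(parents=True, exist_ok=True)

    # Left empty on purpose: see the Resource Sharing documentation
    # for what this file must contain.
    metadata = root / "metadata.ini"
    if not metadata.exists():
        metadata.touch()

    return collection
```

For example, `scaffold_repository("my-shareable-qgis-resources", "my-shareable-collection")` would reproduce the directory tree shown above, without the individual checklist, algorithm and model files.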

Development

This plugin uses poetry and typer for development.

An easy way to get started is to (fork and) clone this repo, install poetry and then install the project's dependencies:

sudo apt install pyqt5-dev-tools

poetry install

Installing locally

Call the install task:

poetry run python pluginadmin.py install

Running tests

poetry shell
cd ~/.local/share/QGIS/QGIS3/profiles/default/python/plugins
python -m pytest -v -x ~/dev/qgis_dataset_qa_workbench

Attribution

This plugin uses icons from the Font Awesome project. Icons are used as-is, without any modification, in accordance with their license.
