ioda-net/geo-pyprint

The goal of this service is to transform a print request containing the layers in a well-defined format (WMTS, WMS, GeoJSON, …) and additional information of various types (date, QR code, …) into a document in a given output format, most likely PDF. The service must support the web API of MapFish Print v3.

The service must be able to use a template created by the user to generate the final document. This template must be in a form that is easy to edit and fast to process for the server.

For each type of layer we want to print, we need to be able to get an image with transparency. Without transparency, we would only see the image of the last layer. This may still happen if a layer is requested in a format that doesn't support transparency (e.g. JPEG).

We can then recombine the images with:

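from PIL import Image

# Start from a fully transparent canvas; every layer must have the same size.
output = Image.new('RGBA', (1600, 532))
for resp in images:  # e.g. streamed requests responses, bottom layer first
    img = Image.open(resp.raw).convert('RGBA')
    output = Image.alpha_composite(output, img)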

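and save the result to a file or to an in-memory buffer:

from io import BytesIO

output.save('result.png', 'PNG')  # write to disk
my_map = BytesIO()
output.save(my_map, 'PNG')  # or keep the PNG bytes in memory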

Server Configuration

  • WSGI only?
  • Allow FastCGI?
  • Docker images?

Image layers

All requests are constructed with OWSLib or "manually" with the requests library.

WMS

Get the image with img = wms.getmap.

Requests should be built manually. OWSLib requires a valid GetCapabilities document to work. That implies making a request to get it and then parsing it. These operations can be slow and are not necessary since the user provides a list of layers, most likely generated by a JS script. It therefore seems safe to assume that the layers exist, so we don't need the GetCapabilities document.
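A minimal sketch of building such a GetMap URL by hand is shown below. It assumes WMS 1.1.1 (hence the SRS parameter), and the function name, the server URL and the layer names are hypothetical.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, srs, bbox, size,
                     image_format='image/png', transparent=True, styles=''):
    """Build a WMS 1.1.1 GetMap URL without fetching GetCapabilities."""
    params = {
        'SERVICE': 'WMS',
        'VERSION': '1.1.1',
        'REQUEST': 'GetMap',
        'LAYERS': ','.join(layers),
        'STYLES': styles,
        'SRS': srs,
        'BBOX': ','.join(str(value) for value in bbox),
        'WIDTH': size[0],
        'HEIGHT': size[1],
        'FORMAT': image_format,
        'TRANSPARENT': 'TRUE' if transparent else 'FALSE',
    }
    # Some servers already carry a query string (e.g. a map file parameter).
    separator = '&' if '?' in base_url else '?'
    return base_url + separator + urlencode(params)


# Hypothetical server and layers, for illustration only.
url = build_getmap_url(
    'https://wms.example.com/service',
    layers=['roads', 'rivers'],
    srs='EPSG:21781',
    bbox=(420000, 30000, 900000, 350000),
    size=(1600, 532),
)
```

The resulting URL can then be fetched with any HTTP client. Since no capabilities document is consulted, a wrong layer name only shows up as an error response from the server.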

Problems to handle

  • If the requested image is too big for the WMS server (mostly for external WMS).

WMTS

Get the images with tile = wmts.gettile.

Problems to handle

  • How do we determine which tiles to get?
  • How do we recombine the tiles?
  • The image made of the recombined tiles is likely to be bigger than the requested area. How do we crop it correctly?
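One possible answer to the first and last questions is sketched below. It follows the tile arithmetic of the WMTS standard (a standardized pixel size of 0.28 mm) and assumes that the scale denominator, the top left corner and the tile size of the chosen tile matrix are already known. The function name is hypothetical, and the sketch does not clamp the result to the limits of the tile matrix.

```python
import math

# 0.28 mm: the standardized rendering pixel size used by the WMTS standard.
STANDARD_PIXEL_SIZE = 0.00028


def tiles_for_bbox(bbox, top_left, scale_denominator,
                   tile_size=256, meters_per_unit=1.0):
    """Return the tile range covering bbox and the crop box in pixels.

    bbox is (min_x, min_y, max_x, max_y) and top_left is the (x, y) of the
    tile matrix origin, both in the units of the tile matrix set CRS.
    """
    min_x, min_y, max_x, max_y = bbox
    origin_x, origin_y = top_left
    pixel_span = scale_denominator * STANDARD_PIXEL_SIZE / meters_per_unit
    tile_span = tile_size * pixel_span

    # Columns grow eastwards, rows grow southwards from the top left corner.
    first_col = math.floor((min_x - origin_x) / tile_span)
    last_col = math.floor((max_x - origin_x) / tile_span)
    first_row = math.floor((origin_y - max_y) / tile_span)
    last_row = math.floor((origin_y - min_y) / tile_span)

    # Position of the bbox inside the mosaic of recombined tiles, in pixels.
    left = round((min_x - (origin_x + first_col * tile_span)) / pixel_span)
    upper = round(((origin_y - first_row * tile_span) - max_y) / pixel_span)
    right = left + round((max_x - min_x) / pixel_span)
    lower = upper + round((max_y - min_y) / pixel_span)

    return {
        'cols': range(first_col, last_col + 1),
        'rows': range(first_row, last_row + 1),
        'crop_box': (left, upper, right, lower),
    }


result = tiles_for_bbox(
    bbox=(300, 400, 700, 600),
    top_left=(0, 1000),
    scale_denominator=1 / STANDARD_PIXEL_SIZE,  # one map unit per pixel
)
```

The crop box uses the (left, upper, right, lower) order expected by Pillow's Image.crop, so it can be applied directly to the image obtained by pasting the tiles side by side.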

Vector Layers

WFS

  • Can we get images directly with the protocol?
  • If not, how do we get the vector and then how do we convert the vector to an image?

KML/GeoJSON

OGR formats can be converted to an image with Mapnik. See this question on Stack Overflow for an example.

You can also look at notebooks/mapnik.ipynb for examples of GeoJSON and KML rendering in Python 3.

Protected layers

Referer

Some layers require a specific value in the Referer header to be printed (WMTS layers from Swisstopo for instance). We should be able to forward some or all of the headers of a print request to the WMS/WMTS servers.

  • Do we have the possibility to filter out some headers (except host which must be removed)?

Basic auth

Forwarded as any other header. We must be able to forward this header only to a given list of domains.
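The forwarding rules above could be sketched as follows. The function name, the default exclusion set (beyond host, which the text requires) and the example domains are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# host must be removed; the other two entries are an assumption of this sketch.
NEVER_FORWARDED = frozenset({'host', 'content-length', 'connection'})


def headers_to_forward(request_headers, target_url, auth_domains,
                       excluded=NEVER_FORWARDED):
    """Select the print request headers to send to a WMS/WMTS server."""
    hostname = urlsplit(target_url).hostname or ''
    forwarded = {}
    for name, value in request_headers.items():
        lowered = name.lower()
        if lowered in excluded:
            continue
        # Credentials only go to explicitly trusted domains.
        if lowered == 'authorization' and hostname not in auth_domains:
            continue
        forwarded[name] = value
    return forwarded


print_request_headers = {
    'Host': 'print.example.com',
    'Referer': 'https://map.example.com/',
    'Authorization': 'Basic dXNlcjpwYXNz',
}
trusted = headers_to_forward(
    print_request_headers, 'https://wms.example.com/service',
    auth_domains={'wms.example.com'})
untrusted = headers_to_forward(
    print_request_headers, 'https://tiles.example.org/wmts',
    auth_domains={'wms.example.com'})
```

Here trusted keeps both Referer and Authorization, while untrusted only keeps Referer. Additional headers can be filtered out per portal by passing a different excluded set.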

Protected Print Server

Basic auth

The print server must be on the same domain as the WMS/WMTS server. This way, the browser will automatically forward the Authorization header to the print server. Authorization can then be handled directly by the web server (Apache, nginx, …).

Other auth

  • Are there use cases for this?
  • Implement a user/group mechanism?

Templating

The user must also be able to dynamically add various elements to the template:

  • static image files
  • images from a URL
  • text
  • merge with another PDF (for instance to get the legend of a map, if the legend is pre-generated and in a PDF format).
  • use conditions to print an element or not.

Possible solutions:

  • ODT files with Jinja2 markup, rendered by Secretary. The ODT file must then be converted to PDF on the fly on the server by LibreOffice.
    • Advantages
      • Powerful: anything that can be done in LibreOffice (tables, images, styles, …) and Jinja2 (loops, conditions, formatting, …)
      • Extensible: we can add our own filters/formatters.
      • Easy: the user can edit the template from a good WYSIWYG interface they probably already know.
      • Quite fast
    • Disadvantages
      • Requires LibreOffice on the server. To convert a document (once all running instances of LibreOffice are closed), use: libreoffice --convert-to pdf --outdir $(pwd) rendered_document.odt (takes ~0.6s). A faster solution (~0.3s) is to start LibreOffice as a daemon with libreoffice --accept='socket,host=127.0.0.1,port=2002;urp;StarOffice.NamingService' --headless and to convert with unoconv --connection 'socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext' -f pdf rendered_document.odt (the port given to unoconv must be the one the daemon listens on).
  • PDF templates with the relevant values replaced on the fly on the server. We can use PDFJinja: we rely on a PDF form containing Jinja2 markup, render the form in FDF format and create the PDF with PDFTK.
  • Write the document in a light markup language (rst, markdown, …), use Jinja2 or Mako to generate the full document, then convert it to PDF (rst2pdf).
    • Advantages
      • Can be easier to edit than HTML
      • Should be quite fast
      • With the right markup language, as powerful as HTML
      • RST can produce really good PDFs with LaTeX, but it requires LaTeX on the server.
    • Disadvantages
      • Not as easy as WYSIWYG for some people.
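For the ODT option above, the one-shot LibreOffice conversion could be wrapped in Python roughly as follows. This is only a sketch: the function names and the timeout are assumptions, and the command is the one quoted in the list, run through subprocess.

```python
import subprocess
from pathlib import Path


def build_convert_command(odt_path, outdir):
    """Return the LibreOffice command converting odt_path to PDF in outdir."""
    return [
        'libreoffice',
        '--convert-to', 'pdf',
        '--outdir', str(outdir),
        str(odt_path),
    ]


def convert_to_pdf(odt_path, outdir, timeout=30):
    """Run the conversion and return the path of the generated PDF."""
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(odt_path, outdir),
                   check=True, timeout=timeout)
    return Path(outdir) / (Path(odt_path).stem + '.pdf')


command = build_convert_command('rendered_document.odt', '/tmp/prints')
```

Passing the arguments as a list avoids shell quoting problems with template names that contain spaces, such as a4 portrait.odt.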

How could this work?

  1. The user makes a POST request at /print/<portalname>. The request must be in the JSON format and must respect the structure of MapFish Print requests.
  2. GPP (geo-pyprint) loads the configuration for this portal.
  3. Process the layers:
    1. For each layer, create the request URL and make a GET request to this address. The parameters are obtained as follows:

      • WMS
        • layers: given in the request
        • style: not given, optional
        • srs: given in the request, under attributes.map
        • bbox: to be calculated from the paint area (in the configuration), the scale (in the request) and the map center (in the request)
        • size: to be calculated from the BBOX (calculated), the scale (in the request) and the DPI (in the request)
        • format: given in the request
        • transparent: should be given with the request under the layer attribute customParams. If not present, consider it is set to true to allow layers below this one to be visible.
      • WMTS
        • requestEncoding: must be known before the request to the server, given in the request payload under the layer attribute requestEncoding.
        • matrixset: given in the request
        • tilematrix: How to determine that?
        • row: How to determine that?
        • column: How to determine that?
        • format: given in the request

      All requests are done in Python's event loop. The event loop (from the asyncio module) is an easy way to write single-threaded concurrent code. This allows us to have all the requests in flight at the same time, so we don't have to wait for each one to complete before starting the next one.

      We should also use this moment to fetch distant images used in the template. To correctly identify these, they must be listed in the configuration.

    2. Apply rotation.
    3. Crop the images if necessary and merge them.
    4. Append the NorthArrow if requested.
  4. Load the proper template for portal and layout.
  5. Render the template.
  6. Return the rendered PDF.
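The bbox and size computation of step 3 can be sketched as below. It assumes that the paint area is expressed in points (1/72 inch) and that the scale is given as a denominator; neither is confirmed by the text above, and the function name is hypothetical.

```python
POINTS_PER_INCH = 72
METERS_PER_INCH = 0.0254


def compute_bbox_and_size(center, scale, dpi, paint_area, meters_per_unit=1.0):
    """Return ((min_x, min_y, max_x, max_y), (width_px, height_px)).

    center is (x, y) in map units, scale is the scale denominator and
    paint_area is (width, height) in points (an assumption of this sketch).
    """
    width_in = paint_area[0] / POINTS_PER_INCH
    height_in = paint_area[1] / POINTS_PER_INCH

    # Ground distance covered by the paint area at the requested scale.
    ground_width = width_in * METERS_PER_INCH * scale / meters_per_unit
    ground_height = height_in * METERS_PER_INCH * scale / meters_per_unit

    center_x, center_y = center
    bbox = (
        center_x - ground_width / 2,
        center_y - ground_height / 2,
        center_x + ground_width / 2,
        center_y + ground_height / 2,
    )
    # Pixel size of the image that fills the paint area at the requested DPI.
    size = (round(width_in * dpi), round(height_in * dpi))
    return bbox, size


bbox, size = compute_bbox_and_size(
    center=(600000, 200000), scale=10000, dpi=96, paint_area=(720, 360))
```

With a 10 by 5 inch paint area at 1:10000, the bbox spans 2540 by 1270 meters around the center. For projections in degrees, meters_per_unit has to be set accordingly.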

This can be summed up by this schema:

[Figure: schema of the print process described in the steps above]
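The concurrent fetching of step 3 could look like the following sketch. It uses asyncio.run, which needs a more recent Python than the minimum listed in this document, and a fake fetch coroutine instead of real HTTP calls. The names fetch_all, fake_fetch and max_concurrency are hypothetical. Note that the requests library is blocking, so a real implementation would need an asynchronous HTTP client or an executor.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=8):
    """Fetch every URL concurrently and return the results in request order.

    fetch is a coroutine function taking a URL. Exceptions are returned in
    place of the result, so one failing layer doesn't cancel the others.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url):
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(fetch_one(url) for url in urls),
                                return_exceptions=True)


async def fake_fetch(url):
    """Stand-in for a real HTTP call (hypothetical, no network access)."""
    await asyncio.sleep(0.01)
    if 'broken' in url:
        raise RuntimeError('404 not found')
    return 'image for ' + url


results = asyncio.run(fetch_all(
    ['https://wms.example.com/a',
     'https://wms.example.com/broken',
     'https://wmts.example.com/b'],
    fake_fetch,
))
```

Because gather keeps the order of its arguments, the index of each result matches the index of the layer in the print request, which makes it straightforward to report failures per layer.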

Technology used

  • Python3 (>= 3.3)
  • Pyramid (web framework)
  • Mapnik 3: GeoJSON rendering
  • Requests: awesome Python library to make HTTP requests
  • Templating:
    • PDFTK with pdfjinja: use Jinja2 to render PDF form.
    • LibreOffice with Secretary: use Jinja2 to render an ODT document.

Configuration

The configuration should be written in the TOML format to be both expressive and simple to write and read.

The configuration files must be in a folder named after the portal alongside the templates:

  • <portal>
    • configuration.toml
    • a4 portrait.pdf
    • a5 portrait.odt

Here is an example of what a configuration file may look like:

Questions

  • What format should we use? Is TOML the best choice? JSON is not expressive enough (no comments allowed), YAML is often too complex and INI is limited.
  • Should we have the ability to define some metadata values for the PDF (title, author, subject, keywords, …) or should it be left to the template?
  • Should we allow values to be overridden for each layout?
  • Should we require a complete list of attributes and reject the request if some are missing or unknown?
  • Should we support configuration files from MapFish Print (MFP)?
  • Should we provide a conversion script to switch a configuration from MFP to GPP?

Try it

This section will be updated once this project is released on Pypi or the Docker hub.

  1. Install the unoconv binary from your distribution repository:
    • Fedora/RHEL: yum install unoconv
    • openSUSE/SLES: zypper install unoconv
    • Debian/Ubuntu: apt-get install unoconv
  2. Clone this project: git clone https://github.com/ioda-net/geo-pyprint.git
  3. Move to the clone: cd geo-pyprint
  4. Create a new venv: virtualenv venv -p /usr/bin/python3 and activate it: source venv/bin/activate
  5. Install the dependencies in the virtualenv: ./setup.py develop.
  6. Install the development version of pdfjinja: pip install git+https://github.com/rammie/pdfjinja.git
  7. Install PDFTK
  8. Launch the app: pserve development.ini
  9. Test the app: curl -X POST -d @test-payload.json http://0.0.0.0:8383/print > output.pdf (this should create a PDF file from the request).

Questions

  • Limit the maximum number of requests: globally, per domain, not at all?
  • Which output formats besides PDF should be supported?
  • Should the configuration be reloaded on each request or by a GET at /reload (is this worth the additional complexity)?

General Problems

  • Response with a non-200 status code:
    • Print failure?
    • Use a transparent layer or ignore the layer?

What should be tested?

  • Multiple EPSG with different units: meters, inches, degrees (lat/lon)
  • Requests with legends
  • Layers with errors:
    • Not found
    • No content
    • Non-200 status code
    • Timeouts: when should a request time out?
  • Multiple EPSG on the same map. According to the request, a map has only one projection for all its layers.

Long term plans

  • Allow an async mode for the print requests: instead of waiting for the print request to complete before getting a response, we post the request and return with a print id. To get the status/progress of the print, the user makes a GET request to /print/portal/<id>. The response should be like:

    {
        "done": false,
        "id": "print-id",
        "progress": "X%"
    }

    When the request is done, the response should be like this (the failures object associates the index of each failed layer with its error; layers listed in failures are not rendered):

    {
        "done": true,
        "failures": {
            "2": "404 not found",
            "5": "Timeout"
        }
    }

    It should be very close to what MapFish Print uses for status reports.
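A minimal sketch of how such status bodies could be built is given below. The function name and the way progress is computed (finished layers over total layers) are assumptions.

```python
import json


def build_status(print_id, total_layers, finished_layers, failures):
    """Build the body returned by GET /print/portal/<id>."""
    if finished_layers < total_layers:
        progress = 100 * finished_layers // total_layers
        return {
            'done': False,
            'id': print_id,
            'progress': '{}%'.format(progress),
        }
    # JSON object keys must be strings, so layer indexes are converted.
    return {
        'done': True,
        'failures': {str(index): error for index, error in failures.items()},
    }


pending = build_status('print-id', total_layers=4, finished_layers=1,
                       failures={})
finished = build_status('print-id', total_layers=6, finished_layers=6,
                        failures={2: '404 not found', 5: 'Timeout'})
body = json.dumps(finished)
```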

Appendix

PDF libraries for Python

  • ReportLab: PDF library for generating reports. More or less the Python equivalent of JasperReports. Recommended and used by many other projects as a PDF engine. Also comes in a proprietary version which is claimed to be faster and to support XML-based templates.
  • PDFJinja: uses a PDF template and Jinja2 to render a final document. Relies on pdftk to do the PDF rendering.
  • weasyprint: convert HTML/CSS to PDF with cairo.
  • pdfdocument: wrapper around ReportLab to make it easier to use.
  • xhtml2pdf: convert HTML pages to PDF with ReportLab. Support of Python3 is experimental.
  • wkhtmltopdf: relies on QtWebKit to render the page.

About

Python3 backend for printing templated cartographic maps
