Server Configuration

The goal of this service is to transform a print request containing the layers in a well defined format (WMTS, WMS, GeoJSON, …) and additional information of various type (date, QR code, …) to a document in a given output format, most likely PDF. The service must support the web api of MapFish Print v3.

The service must be able to use a template created by the user to generate the final document. This template must be in a form that is easy to edit and fast to process for the server.

For each type of layer we want to print, we need to be able to get an image with transparency. Without transparency, we would only see the image of the last layer. This may still happen if a layer is requested in a format that doesn't support transparency (e.g. JPEG).

We can then recombine the images with:

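from PIL import Image

# Start from a fully transparent canvas; every layer image must have this size.
output = Image.new('RGBA', (1600, 532))
for resp in images:
    # resp is assumed to be a requests response fetched with stream=True
    img = Image.open(resp.raw).convert('RGBA')
    output = Image.alpha_composite(output, img)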

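and save the result, either to a file or to an in-memory buffer, with:

from io import BytesIO

output.save('result.png', 'PNG')  # write to disk
my_map = BytesIO()
output.save(my_map, 'PNG')  # or keep the PNG in memory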

Server Configuration

wsgi only?
Allow fast cgi?
Docker images?

Image layers

All requests are constructed with OWSLib or "manually" with the requests library.

WMS

~~Get the image with img = wms.getmap.~~

Requests should be built manually. OWSLib requires a valid GetCapabilities document to work. That implies making a request to get it and then parsing it. These operations can be slow and are not necessary since the user provides a list of layers, most likely generated by a JS script. Thus it seems safe to assume that the layers exist, hence we don't need the GetCapabilities document.
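As a minimal sketch of what building the request manually could look like, the function below assembles a GetMap URL from values already present in the print request. The function name, its parameters and the choice of WMS version 1.1.1 are assumptions made for illustration; they are not part of this project.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, size, srs,
                     image_format='image/png', transparent=True, styles=''):
    """Build a WMS 1.1.1 GetMap URL without fetching GetCapabilities first.

    Hypothetical helper: bbox is (min_x, min_y, max_x, max_y) and size is
    (width, height) in pixels.
    """
    params = {
        'SERVICE': 'WMS',
        'VERSION': '1.1.1',  # assumption: 1.3.0 would use CRS instead of SRS
        'REQUEST': 'GetMap',
        'LAYERS': ','.join(layers),
        'STYLES': styles,
        'SRS': srs,
        'BBOX': ','.join(str(value) for value in bbox),
        'WIDTH': size[0],
        'HEIGHT': size[1],
        'FORMAT': image_format,
        'TRANSPARENT': 'TRUE' if transparent else 'FALSE',
    }
    # Keep any query string already present in the base URL.
    separator = '&' if '?' in base_url else '?'
    return base_url + separator + urlencode(params)
```

The resulting URL can be passed directly to the requests library. Nothing here checks that the layers exist, which matches the assumption stated above.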

Problems to handle

The requested image may be too big for the WMS server (mostly for external WMS).

WMTS

Get the images with tile = wmts.gettile.

Problems to handle

How do we determine which tiles to get?
How do we recombine the tiles?
The image built from the recombined tiles is likely to be too big. How do we crop it correctly?
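One possible answer to the first and last questions, sketched under assumptions: if the top-left origin of the tile matrix, its resolution (map units per pixel) and the tile size are known, the tile indices covering a bounding box and the crop box inside the recombined mosaic follow from simple arithmetic. Both function names and the way these values are obtained are hypothetical; choosing the tile matrix itself is not covered here.

```python
import math


def tile_range(bbox, origin, resolution, tile_size=256):
    """Return (col_min, row_min, col_max, row_max) of the tiles covering bbox.

    bbox is (min_x, min_y, max_x, max_y) in map units, origin is the top-left
    corner of the tile matrix and resolution is in map units per pixel.
    """
    min_x, min_y, max_x, max_y = bbox
    origin_x, origin_y = origin
    tile_span = tile_size * resolution  # ground size of one tile
    col_min = math.floor((min_x - origin_x) / tile_span)
    col_max = math.ceil((max_x - origin_x) / tile_span) - 1
    # Rows grow downwards from the top-left origin.
    row_min = math.floor((origin_y - max_y) / tile_span)
    row_max = math.ceil((origin_y - min_y) / tile_span) - 1
    return col_min, row_min, col_max, row_max


def crop_box(bbox, origin, resolution, tiles, tile_size=256):
    """Return the pixel box (left, upper, right, lower) of bbox in the mosaic."""
    min_x, min_y, max_x, max_y = bbox
    origin_x, origin_y = origin
    col_min, row_min = tiles[0], tiles[1]
    tile_span = tile_size * resolution
    # Map coordinates of the top-left corner of the mosaic.
    mosaic_x = origin_x + col_min * tile_span
    mosaic_y = origin_y - row_min * tile_span
    left = round((min_x - mosaic_x) / resolution)
    upper = round((mosaic_y - max_y) / resolution)
    right = round((max_x - mosaic_x) / resolution)
    lower = round((mosaic_y - min_y) / resolution)
    return left, upper, right, lower
```

In this sketch, the tile at (col, row) would be pasted into the mosaic at pixel ((col - col_min) * tile_size, (row - row_min) * tile_size), and the tuple returned by crop_box has the (left, upper, right, lower) order expected by Pillow's Image.crop.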

Vector Layers

WFS

Can we get images directly with the protocol?
If not, how do we get the vector and then how do we convert the vector to an image?

KML/Geojson

OGR formats can be converted to an image with Mapnik. See this question on Stack Overflow for an example.

You can also look at notebooks/mapnik.ipynb for examples of GeoJSON and KML rendering in python3.

Protected layers

Referer

Some layers require a specific value in the referer to be printed (WMTS layers for Swisstopo for instance). We should be able to forward some/all headers of a print request to the WMS/WMTS servers.

Do we have the possibility to filter out some headers (except host which must be removed)?

Basic auth

Forwarded as any other header. We must be able to forward this header only to a given list of domains.
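A minimal sketch of such a forwarding policy, assuming the list of trusted domains comes from the portal configuration. The names NEVER_FORWARDED, AUTH_DOMAINS and headers_for are hypothetical, and dropping Content-Length and Content-Type in addition to Host is an assumption (they describe the POST body of the print request, not the GET sent to the map server).

```python
from urllib.parse import urlsplit

# Hypothetical policy values; in practice they would come from the configuration.
NEVER_FORWARDED = {'host', 'content-length', 'content-type'}
AUTH_DOMAINS = {'wms.example.com'}


def headers_for(target_url, request_headers, auth_domains=AUTH_DOMAINS):
    """Select the headers of the print request to forward to target_url."""
    hostname = urlsplit(target_url).hostname or ''
    forwarded = {}
    for name, value in request_headers.items():
        lowered = name.lower()
        if lowered in NEVER_FORWARDED:
            continue
        # Credentials only go to the domains explicitly listed.
        if lowered == 'authorization' and hostname not in auth_domains:
            continue
        forwarded[name] = value
    return forwarded
```

With this policy, a Referer header reaches every map server while the Authorization header is only sent to the listed domains.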

Protected Print Server

Basic auth

The print server must be on the same domain as the WMS/WMTS server. This way, the browser will automatically forward the Authorization header to the print server. Authorization can then be handled directly by the web server (Apache, nginx, …).

Other auth

Are there use cases for this?
Implement a user/group mechanism?

Templating

The user must also be able to dynamically add various elements to the template:

static image files
images from a URL
text
merge with another PDF (for instance to get the legend of a map, if the legend is pre-generated and in a PDF format).
use conditions to print an element or not.

Possible solutions:

ODT files with Jinja2 markup rendered by Secretary. The ODT file must then be converted to PDF on the fly on the server by LibreOffice.
- Advantages
  - Powerful: anything that can be done in LibreOffice (tables, images, styles, …) and Jinja2 (loops, conditions, formatting, …)
  - Extensible: we can add our own filters/formatters.
  - Easy: the user can edit the template from a good WYSIWYG interface they probably already know.
  - Quite fast
- Disadvantages
  - Requires LibreOffice on the server. To convert a document once all running instances of LibreOffice are closed, use libreoffice --convert-to pdf --outdir $(pwd) rendered_document.odt (takes ~0.6s). A faster solution (~0.3s) is to start LibreOffice as a daemon with libreoffice --accept='socket,host=127.0.0.1,port=2002;urp;StarOffice.NamingService' --headless and then convert with unoconv --connection 'socket,host=127.0.0.1,port=2002,tcpNoDelay=1;urp;StarOffice.ComponentContext' -f pdf rendered_document.odt. The port must be the same in both commands.
PDF templates with the relevant values replaced on the fly on the server. We can use PDFJinja: we rely on a PDF form containing Jinja2 markup, render the form in FDF format and create the PDF with PDFTK.
Write the document in a light markup language (rst, markdown, …), use Jinja2 or Mako to generate the full document, then convert it to PDF (rst2pdf).
- Advantages
  - Can be easier to edit than HTML
  - Should be quite fast
  - With the right markup language, as powerful as HTML
  - RST can produce really good PDFs with LaTeX, but it requires LaTeX on the server.
- Disadvantages
  - Not as easy as WYSIWYG for some people.
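For the ODT option, the one-shot LibreOffice command quoted above could be wrapped in Python roughly as follows. This is a sketch, not the project's implementation: the function names and the timeout value are hypothetical, and --headless is added on the assumption that the server has no display.

```python
import subprocess


def libreoffice_command(odt_path, out_dir):
    """Return the argv list for a one-shot ODT to PDF conversion."""
    return ['libreoffice', '--headless', '--convert-to', 'pdf',
            '--outdir', out_dir, odt_path]


def convert_to_pdf(odt_path, out_dir, timeout=30):
    """Run the conversion; raise CalledProcessError if LibreOffice fails.

    The timeout (in seconds) is an arbitrary illustrative value.
    """
    subprocess.run(libreoffice_command(odt_path, out_dir),
                   check=True, timeout=timeout)
```

Passing the arguments as a list avoids shell quoting issues with template names that contain spaces, such as "a5 portrait.odt".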

How could this work?

The user makes a POST request at /print/<portalname>. The request must be in the JSON format and must respect the structure of MapFish Print requests.
GPP loads the configuration for this portal.
Process the layers:
1. For each layer, create the request URL and make a GET at this address. This implies for:
  - WMS
    - layers: given in the request
    - style: not given, optional
    - srs: given in the request, under attributes.map
    - bbox: to calculate with paint area (in the configuration), scale (in the request) and map center (in the request)
    - size: to calculate with the BBOX (calculated), scale (in the request) and DPI (in the request)
    - format: given in the request
    - transparent: should be given with the request under the layer attribute customParams. If not present, consider it is set to true to allow layers below this one to be visible.
  - WMTS
    - requestEncoding: must be known before the request to the server, given in the request payload under the layer attribute requestEncoding.
    - matrixset: given in the request
    - tilematrix: How to determine that?
    - row: How to determine that?
    - column: How to determine that?
    - format: given in the request
  All requests are done in Python's event loop. The event loop (from the asyncio module) is an easy way to write single-threaded concurrent code. This allows us to run all the requests concurrently so we don't have to wait for each one to complete before starting the next one.
  
  We should also use this moment to fetch distant images used in the template. To correctly identify these, they must be listed in the configuration.
2. Apply rotation.
3. Crop the images if necessary and merge them.
4. Append the NorthArrow if requested.
Load the proper template for portal and layout.
Render the template.
Return the rendered PDF.
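The BBOX and size calculations listed for WMS in step 1 could be sketched as follows. This assumes that the paint area is expressed in points (1/72 inch) and that the projection uses meters; neither is stated above, and the function names are hypothetical.

```python
METERS_PER_INCH = 0.0254
POINTS_PER_INCH = 72


def compute_bbox(center, scale, paint_area):
    """Return (min_x, min_y, max_x, max_y) in meters.

    center is (x, y) in map units, scale is the scale denominator
    (25000 for 1:25000) and paint_area is (width, height) in points.
    """
    width_m = paint_area[0] / POINTS_PER_INCH * METERS_PER_INCH * scale
    height_m = paint_area[1] / POINTS_PER_INCH * METERS_PER_INCH * scale
    center_x, center_y = center
    return (center_x - width_m / 2, center_y - height_m / 2,
            center_x + width_m / 2, center_y + height_m / 2)


def compute_size(bbox, scale, dpi):
    """Return the (width, height) in pixels of the image to request."""
    min_x, min_y, max_x, max_y = bbox
    # Ground distance -> paper distance in inches -> pixels at the given DPI.
    width_px = (max_x - min_x) / scale / METERS_PER_INCH * dpi
    height_px = (max_y - min_y) / scale / METERS_PER_INCH * dpi
    return round(width_px), round(height_px)
```

For a projection in degrees or in another unit, the conversion factor would differ, which is one reason the test list below mentions multiple EPSG codes with different units.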

This can be summed up by this schema:

Technology used

Python3 (>= 3.3)
Pyramid (web framework)
Mapnik 3: GeoJSON rendering
Requests: awesome Python library to make HTTP requests
Templating:
- PDFTK with pdfjinja: use Jinja2 to render PDF form.
- LibreOffice with Secretary: use Jinja2 to render an ODT document.

Configuration

The configuration should be written in the TOML format to be both expressive and simple to write and read.

The configuration files must be in a folder named after the portal alongside the templates:

<portal>
- configuration.toml
- a4 portrait.pdf
- a5 portrait.odt

Here is an example of what a configuration file may look like:

Questions

What format should we use? Is TOML the best choice? JSON is not expressive enough (no comments allowed), YAML is often too complex and INI is limited.
Should we have the ability to define some value for the PDF (title, author, subject, keywords, …) or should it be left to the template?
Should we allow values to be overridden for each layout?
Should we impose to write a complete list of attributes and reject the request if some are missing or unknown?
Should we support configuration from MFP?
Should we provide a conversion script to switch configuration from MFP to GPP?

Try it

This section will be updated once this project is released on PyPI or Docker Hub.

Install the unoconv binary from your distribution repository:
- For Fedora/RHEL: yum install unoconv
- For openSUSE/SLES: zypper install unoconv
- For Debian/Ubuntu: apt-get install unoconv
Clone this project: git clone https://github.com/ioda-net/geo-pyprint.git
Move to the clone: cd geo-pyprint
Create a new venv: virtualenv venv -p /usr/bin/python3 and activate it: source venv/bin/activate
Install the dependencies in the virtualenv: ./setup.py develop.
Install the development version of pdfjinja: pip install git+https://github.com/rammie/pdfjinja.git
Install PDFTK
Launch the app: pserve development.ini
Test the app with curl -X POST -d @test-payload.json http://0.0.0.0:8383/print > output.pdf (this should create a PDF file from the request).

Questions

Limit the maximum number of requests: globally, per domain, not at all?
Which output formats besides PDF should be supported?
Should the configuration be reloaded on each request or by a GET at /reload (is this worth the additional complexity)?

General Problems

Response with a non 200 status code:
- Print failure?
- Use a transparent layer or ignore the layer?

What should be tested?

Multiple EPSG with different units: meters, inches, degrees (lat/lon)
Requests with legends
Layers with errors:
- Not found
- No content
- Non 200 status code
- When to do a timeout?

~~Multiple EPSG on the same map.~~ According to the request, a map has only one projection for all its layers.

Long term plans

Allow an async mode for the print requests: instead of waiting for the print request to complete before getting a response, we post the request and return with a print id. To get the status/progress of the print, the user makes a GET request to /print/portal/<id>. The response should be like:
```
{
    "done": false,
    "id": "print-id",
    "progress": "X%"
}
```
When the request is done, the response should be like (failures associate the layer index with its error, layers with index in the failures object are not rendered):
```
{
    "done": true,
    "failures": {
        "2": "404 not found",
        "5": "Timeout"
    }
}
```
It should be very close to what MapFish Print uses for status reports.
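A sketch of how the failures object could be produced, assuming each layer is fetched by its own coroutine. The names fetch_all and status_report are hypothetical, and the async/await syntax used here requires a more recent Python than the 3.3 minimum listed above.

```python
import asyncio


async def fetch_all(fetchers):
    """Run one coroutine function per layer concurrently.

    Returns (results, failures): results maps a layer index to its payload,
    failures maps a layer index (as a string, as in the JSON above) to the
    error message.
    """
    outcomes = await asyncio.gather(*(fetch() for fetch in fetchers),
                                    return_exceptions=True)
    results, failures = {}, {}
    for index, outcome in enumerate(outcomes):
        if isinstance(outcome, Exception):
            failures[str(index)] = str(outcome)
        else:
            results[index] = outcome
    return results, failures


def status_report(print_id, completed, total, failures):
    """Build the status payload for a print job."""
    if completed < total:
        progress = '{}%'.format(100 * completed // total)
        return {'done': False, 'id': print_id, 'progress': progress}
    return {'done': True, 'failures': failures}
```

Using return_exceptions=True keeps one failing layer from cancelling the others, so the layers that did load can still be rendered.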

Appendix

PDF libraries for Python

ReportLab: PDF library for generating reports. More or less the Python equivalent of JasperReports. Recommended and used by many other projects as a PDF engine. Also comes in a proprietary version which is claimed to be faster and to support XML-based templates.
PDFJinja: uses a PDF template and Jinja2 to render the final document. Relies on pdftk to do the PDF rendering.
weasyprint: convert HTML/CSS to PDF with cairo.
pdfdocument: wrapper around ReportLab to make it easier to use.
xhtml2pdf: convert HTML pages to PDF with ReportLab. Support of Python3 is experimental.
wkhtmltopdf: relies on QtWebKit to render the page.
