klpn/mortchartgen

# mortchartgen

mortchartgen is a tool for creating charts of mortality trends for different countries, age groups and causes of death, based on data from the WHO Mortality Database. The tool uses Pandas and matplotlib to generate the charts and stores the data in a MySQL database. A YAML configuration file specifies which charts are generated.

I use the tool to generate charts for the website Mortalitetsdiagram ("mortality charts"). The files for this site (excluding the SVG charts themselves) are included in the subdirectory mortchart-site. Currently, the generated charts are in Swedish. I am not affiliated with WHO, and WHO is not responsible for any interpretations of mortality trends based on charts generated by the tool. The tool is licensed under an ISC license.

## Setup

It is assumed that you have a working Python setup, as well as access to a MySQL/MariaDB server with a user privileged to create databases. The unzipped data files require about 500 MB of disk space. The scripts import the following modules:

  - download.py: requests, shutil, os and zipfile
  - tableimp.py: os and subprocess
  - chartgen.py (the main script): sqlalchemy, numexpr, pandas, matplotlib, yaml, os, random, statsmodels and time
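Before running the setup steps, it can help to confirm that the third-party modules listed above are importable. The following is a minimal sketch using only the standard library; missing_modules and REQUIRED are hypothetical names, not part of mortchartgen, and the standard-library modules are left out of the check.

```python
import importlib.util

# Third-party modules named in this README for each script
# (standard-library modules such as os, shutil, zipfile, subprocess,
# random and time are omitted because they are always available).
REQUIRED = {
    "download.py": ["requests"],
    "chartgen.py": ["sqlalchemy", "numexpr", "pandas", "matplotlib",
                    "yaml", "statsmodels"],
}


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for script, modules in REQUIRED.items():
        missing = missing_modules(modules)
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(f"{script}: {status}")
```

Note that the PyYAML package is imported under the name yaml, which is why that is the name checked.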

  1. Run download.py [directory] to download the data files and documentation from the WHO website and unzip them into the given directory.
  2. Read the SQL file setupdb.sql into the MySQL client, e.g. mysql --defaults-extra-file=tableimp.cnf < setupdb.sql. This creates a database Morticd with two tables, Pop and Deaths, as well as a user whomuser with select rights granted on these tables, which is used for the SQL queries in the chart generator. You can use the provided file tableimp.cnf in this step and the next, as shown in the example, but you then have to adjust the relevant settings in the file (e.g. user, password, host and socket) for the database connection to work. For more information about the fields in the tables, consult the WHO documentation.
  3. Load the unzipped data files into the newly created tables. The file pop should be loaded into the table Pop, and the files with names starting with Mort should all be loaded into the table Deaths. The script tableimp.py loops through the data files and reads them into the tables using mysqlimport. You can call the script with tableimp.py [directory], where directory is the download directory specified in step 1. The default configuration is to read the files locally from the client, and this has to be supported by the MySQL server. Otherwise, move the files into a location where the server can read them directly and remove the local option in tableimp.cnf.
  4. Run tablemod.py. This stores tables of population and number of deaths (for the populations and cause-of-death groups specified in chartgen.yaml) in an SQLite database, chartgen.db. This speeds up the chart generation (see below) by avoiding repeated regular-expression queries against the MySQL database. Some values in the dictionary conn_config (read from the settings in chartgen.yaml) may also have to be changed for the database connection to work; in particular, you should change host and unix_socket to suit your MySQL server.
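Step 3 comes down to a simple naming rule: the file pop goes into Pop, and every file whose name starts with Mort goes into Deaths. The sketch below expresses only that rule, assuming the unzipped files keep their WHO names; target_table and plan_import are hypothetical helpers, not the code in tableimp.py, which does the actual loading with mysqlimport.

```python
import os


def target_table(filename):
    """Return the table a WHO data file belongs in, or None to skip it.

    Rule from step 3: the file 'pop' goes into Pop, and every file whose
    name starts with 'Mort' goes into Deaths. Other files (e.g. the
    documentation) are skipped.
    """
    base = os.path.basename(filename)
    if base == "pop":
        return "Pop"
    if base.startswith("Mort"):
        return "Deaths"
    return None


def plan_import(directory):
    """Map each importable file in `directory` to its target table."""
    plan = {}
    for name in sorted(os.listdir(directory)):
        table = target_table(name)
        if table is not None:
            plan[os.path.join(directory, name)] = table
    return plan
```

Printing plan_import(directory) for the download directory from step 1 is a quick way to check that every data file will be loaded into the table you expect before starting a long import.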

## Generate the charts

Call the function batchplot in chartgen.py to generate the charts. This function is called automatically when chartgen.py is invoked from the system shell. The charts are saved as SVG files in the subdirectory mortchart-site/charts. To skip certain countries, age groups or causes of death, comment out the relevant lines in chartgen.yaml. The cause all cannot be excluded, however, because it is used to compute the percentage of total deaths for the other causes.
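The reason all cannot be dropped is that it is the denominator for every percentage. A minimal sketch of that arithmetic, assuming death counts for one country, sex, age group and year are held in a plain dictionary keyed by cause name; percent_of_all and the cause names ihd and stroke are hypothetical, and chartgen.py itself works on Pandas dataframes.

```python
def percent_of_all(deaths_by_cause, all_cause="all"):
    """Express each cause's death count as a percentage of the 'all' cause.

    deaths_by_cause maps cause names to death counts for a single
    country, sex, age group and year. The 'all' entry is the denominator,
    so it must be present.
    """
    total = deaths_by_cause[all_cause]
    if total == 0:
        raise ValueError("no deaths recorded for the 'all' cause")
    return {cause: 100.0 * n / total
            for cause, n in deaths_by_cause.items() if cause != all_cause}
```

For example, percent_of_all({"all": 200, "ihd": 50, "stroke": 30}) returns {"ihd": 25.0, "stroke": 15.0}.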

## CSV generation

If savecsv under settings in chartgen.yaml is true, chartgen.py saves the dataframes used to generate the charts as CSV files in the subdirectory csv, so that they can be further analysed in other programs.
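The savecsv switch is an ordinary on/off setting. The sketch below illustrates that behaviour with the standard csv module, assuming the settings have already been read from chartgen.yaml into a dictionary; maybe_save_csv and to_csv_text are hypothetical names, and chartgen.py itself writes Pandas dataframes rather than plain rows.

```python
import csv
import io
import os


def to_csv_text(header, rows):
    """Render a header and rows as CSV text with Unix line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def maybe_save_csv(settings, name, header, rows, outdir="csv"):
    """Write rows to <outdir>/<name>.csv if settings['savecsv'] is true.

    Returns the path written, or None when CSV output is disabled.
    """
    if not settings.get("savecsv", False):
        return None
    os.makedirs(outdir, exist_ok=True)
    path = os.path.join(outdir, name + ".csv")
    with open(path, "w", newline="") as f:
        f.write(to_csv_text(header, rows))
    return path
```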

## Special charts with R

The R script specchartgen.r demonstrates how the generated CSV files can be used. It contains the following functions: agetrends.plot, which generates charts showing secular trends for a given combination of sex, cause and an interval of 5-year age groups; sexratio.trends.plot, which generates charts showing secular trends in sex ratios of mortality rates or percentages; and ctrisyear.plot, which generates charts comparing mortality between countries for a given cause and year. The function ctrisyear.plot can generate either scatterplots of female vs. male mortality or bar charts for a single sex. The function ctriesyr.batchplot uses ctrisyear.plot to generate charts for all causes and age groups in chartgen.yaml and for all years in a given sequence, and exports them as SVG files in the subdirectory mortchart-site/charts/ctriesyr. The function causedist.plot generates charts of the age-specific distribution of causes of death for a given country, sex and year; by default, the list of causes is read from causedist in chartgen.yaml.
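The sex ratios charted by sexratio.trends.plot are ratios of male to female mortality. The sketch below shows that arithmetic in Python rather than R, assuming rates are expressed per 100,000 population (the unit is an assumption here) and that male mortality is the numerator; rate_per_100k and sex_ratio are hypothetical names.

```python
def rate_per_100k(deaths, population):
    """Mortality rate per 100,000 population."""
    return 1e5 * deaths / population


def sex_ratio(male_deaths, male_pop, female_deaths, female_pop):
    """Male-to-female ratio of mortality rates for one cause, age group and year."""
    return (rate_per_100k(male_deaths, male_pop)
            / rate_per_100k(female_deaths, female_pop))
```

Because the per-100,000 scaling cancels in the ratio, the result does not depend on the unit chosen; a ratio of 2.0 means men die of the cause at twice the female rate.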

All charts are generated using ggplot2, and the script also uses the packages tidyr, yaml, XML, gridSVG, plyr and rjson.

The function lmortfunc.test in specchartgen.r can be used to perform so-called Gompertzian analysis of mortality trends. By calling the function paramsplot in mortparams.py, the fitted parameters can be plotted using the TeX facilities in matplotlib.
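In Gompertzian analysis, the mortality rate is modelled as rising exponentially with age, mu(x) = a * exp(b * x). lmortfunc.test fits this model by nonlinear least squares; the sketch below uses a different, simpler technique, ordinary least squares on log(mu), which weights errors differently and is shown only to illustrate what the parameters a and b mean. It is a plain-Python sketch, not the package's method, and fit_gompertz_loglinear is a hypothetical name.

```python
import math


def fit_gompertz_loglinear(ages, rates):
    """Fit mu(x) = a * exp(b * x) by ordinary least squares on log(mu).

    Since log(mu) = log(a) + b * x is linear in x, a straight-line fit to
    (age, log rate) gives b as the slope and log(a) as the intercept.
    All rates must be positive. Returns (a, b).
    """
    n = len(ages)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two (age, rate) pairs of equal length")
    ys = [math.log(r) for r in rates]
    mean_x = sum(ages) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("ages must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    return math.exp(log_a), b
```

A log-linear fit like this is also a common way to obtain starting values for an iterative nonlinear fit.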

In addition to the packages used by chartgen.py, mortparams.py imports rpy2 for communication with R. The model is fitted by Levenberg-Marquardt nonlinear least squares (using minpack.lm). If lmortfunc.test is called with mortfunc = 'weibull', the mortality data are fitted to the two-parameter Weibull function instead of the Gompertz function (cf. Juckett and Rosenberg (1993)). It is also possible to fit survivorship curves for the subpopulation that dies of a particular cause, instead of mortality curves, by calling lmortfunc.test with type = 'surv'. Fits of the mortality curves corresponding to these survival curves (i.e. normalized to the fraction dying of the given cause) can be obtained by calling the function with type = 'rate' (the default) and normrate = TRUE. For this normalization, life tables are constructed using LifeTables.
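For reference, the two mortality functions named above are commonly written as below, together with the survivorship curves they imply through the cumulative hazard. The exact parameterization used in specchartgen.r is not stated in this README, so a and b are generic parameters here (for the Weibull form, b > -1).

```math
\begin{aligned}
S(x) &= \exp\left(-\int_0^x \mu(t)\,dt\right)\\
\text{Gompertz:}\quad \mu(x) &= a\,e^{bx}, \qquad S(x) = \exp\left(-\frac{a}{b}\left(e^{bx}-1\right)\right)\\
\text{Weibull:}\quad \mu(x) &= a\,x^{b}, \qquad S(x) = \exp\left(-\frac{a}{b+1}\,x^{b+1}\right)
\end{aligned}
```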

By calling lmortfunc.test with pc = 'p' or pc = 'c', the analysis can be performed by period or by birth cohort, respectively; the cohort analysis is, however, only implemented for unnormalized mortality curves.
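The difference between the two views is which observations are grouped together: a period analysis uses all age groups in one calendar year, while a cohort analysis follows people born around the same time as they age. The sketch below regroups period data by approximate birth year, assuming 5-year age groups labelled by their lower bound and using the age-group midpoint, so cohorts are only located to within a few years; birth_cohort and by_cohort are hypothetical names, not functions in specchartgen.r.

```python
from collections import defaultdict


def birth_cohort(period_year, age_lower, age_width=5):
    """Approximate birth year for an age group observed in a given year.

    Uses the midpoint of the age group, e.g. ages 60-64 observed in 2000
    correspond to people born around 2000 - 62.5 = 1937.5.
    """
    return period_year - (age_lower + age_width / 2)


def by_cohort(period_rates, age_width=5):
    """Regroup {(year, age_lower): rate} into {cohort: [(age_lower, rate), ...]}."""
    cohorts = defaultdict(list)
    for (year, age_lower), rate in sorted(period_rates.items()):
        cohorts[birth_cohort(year, age_lower, age_width)].append((age_lower, rate))
    return dict(cohorts)
```

With 5-year age groups, a cohort is traced by stepping 5 years forward in calendar time and one age group up, so the data need observations at 5-year intervals for the cohort series to line up.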

By calling obspred_plot in mortparams.py on an object returned by paramsplot, you can plot the observed data for a list of years against the predictions from the nonlinear regressions.

## Generate the index page and documentation source

By running mortchart-site/indexgen.py you can generate index.html and mortchartdoc_norefhead.md in mortchart-site, based on the settings in chartgen.yaml and the templates index.jinja and mortchartdoc.jinja in mortchart-site/jinjatempl (which use Jinja2). The first file contains a bare form, which you can use to search among the charts in a web browser; the second file can be used to generate the site documentation in PDF or HTML format.

## Generate docs

Run make pdfbib in mortchart-site to generate PDF documentation from the Markdown source. This requires a LaTeX distribution as well as Pandoc (to convert the Markdown). The HTML documentation is generated automatically when the site is built (see below).

## Generate the Mortchart site

The full site is now generated using Hakyll, a static site generator that is tightly integrated with Pandoc and uses the Haskell compiler GHC. To generate the site for the first time, run make buildinit in the directory mortchart-site (the site is generated in mortchart-site/_site). This assumes that the charts (both those made by chartgen.py and those made by ctriesyr.batchplot in specchartgen.r) have already been generated. To update the site, run make build; if you modify site.hs, update with make rebuild.

About

Generates charts with cause-specific mortality trends.
