Medical Machine Learning Platform

General Information

This repository contains the prototype implementation of my master's thesis, written in 2019 at Heidelberg University in the medical context of the German Cancer Research Center (DKFZ):

Platform to Assist Medical Experts in Training, Application, and Control of Machine Learning Models Using Patient Data from a Clinical Information System

The full thesis and further information are published in the Heidelberg Document Repository: HeiDOKs.

Changes and developments after July 2019 were made independently of Heidelberg University and the DKFZ.

Platform Data Storage Structure

All data objects are stored using UUIDs to avoid conflicts with similar objects.

  • /data is created and mounted by default to store the platform's data.
  • /data/MMLP/models contains all data related to models, including training snapshots.
  • /data/MMLP/datasets contains all data sets uploaded by the user.
  • /data/MMLP/results contains the predictions produced when a user uploads data and applies a model.
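
As a minimal sketch of the layout above, the following Python snippet creates a UUID-named directory for a new object under one of the three category folders. The helper name `new_object_dir` and the choice of one directory per object are assumptions made for illustration; the platform's actual storage code lives in the backend and may differ.

```python
import uuid
from pathlib import Path

# Category folders named in the list above, relative to <data root>/MMLP.
CATEGORIES = ("models", "datasets", "results")


def new_object_dir(data_root: Path, category: str) -> Path:
    """Create and return a fresh UUID-named directory for one object.

    Hypothetical helper: the real backend may organize objects differently.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    # uuid4() is random, so two objects with the same display name
    # still end up in different directories.
    object_dir = data_root / "MMLP" / category / str(uuid.uuid4())
    object_dir.mkdir(parents=True, exist_ok=False)
    return object_dir


if __name__ == "__main__":
    import tempfile

    # Use a temporary directory here instead of the real /data mount.
    with tempfile.TemporaryDirectory() as tmp:
        print(new_object_dir(Path(tmp), "datasets"))
```

Passing the data root as a parameter keeps the sketch independent of the default /data location, which can be changed in the backend configuration.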

The configuration is part of the backend; see backend/README and backend/mmlp/config.py.

System Requirements

This prototypical platform implementation supports on-premises, hybrid, and public cloud deployments. It has been tested on Amazon Web Services, Microsoft Azure, and Google Cloud. If you need assistance, please contact me.

Before you attempt to deploy the platform, please ensure your system meets the following requirements:

  1. Docker is installed
  2. GPU support is available within Docker (only needed if you run machine learning on a GPU):
    • For Nvidia GPUs: https://github.com/NVIDIA/nvidia-docker
    • For AMD GPUs: https://rocm.github.io/
  3. If you do not change the default configuration, the global folder /data is used to store all platform data. It can consume a lot of disk space, depending on your models and data sets. If you use a distributed computing environment, ensure that this folder is shared appropriately between the compute nodes. Note: distribution and scaling are planned but not yet implemented. Please contact me for further information.
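
The requirements above can be checked before deployment with a small preflight script. The following is a sketch under stated assumptions: the function name `check_requirements` and the 10 GiB free-space threshold are illustrative and not part of the platform, and GPU support inside Docker (requirement 2) is not checked here.

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional


def check_requirements(
    data_dir: Path = Path("/data"),
    min_free_gib: float = 10.0,  # assumed threshold, adjust to your models and data sets
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    # Requirement 1: the docker executable must be on PATH.
    if which("docker") is None:
        problems.append("docker executable not found on PATH")

    # Requirement 3: the data folder needs disk space. The folder may not
    # exist before the first deployment, so measure the nearest existing
    # ancestor directory instead.
    probe = data_dir
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    free_gib = shutil.disk_usage(probe).free / 1024**3
    if free_gib < min_free_gib:
        problems.append(f"only {free_gib:.1f} GiB free for {data_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.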

Usage

  1. Clone the repository:
git clone https://github.com/magreiner/MMLP
cd MMLP
  2. Adjust the configuration:
vi backend/mmlp/Config.py
  3. Deploy the platform. The platform can be deployed using docker-compose:
# build the containers (repeat this step every time you change the code or the configuration)
docker-compose build --parallel

# foreground deployment (useful for development, as the logs are shown directly):
docker-compose up

# background deployment as a service (access the logs via docker-compose logs -f)
docker-compose up -d
  4. Enjoy. If deployed locally, you can access the platform on port 80 at http://localhost
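
After a background deployment, the containers may need a moment before the web interface responds. The following sketch polls the local URL from the last step until any HTTP response arrives. The function name `wait_for_platform` and the timeout values are assumptions for illustration; the platform itself does not ship this helper.

```python
import time
import urllib.error
import urllib.request


def wait_for_platform(url: str = "http://localhost",
                      timeout_s: float = 60.0,
                      interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers with any HTTP response or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status, so it is reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: containers may still be starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    print("platform reachable:", wait_for_platform())
```

The check deliberately uses plain http://, since HTTPS is not enabled in the default configuration.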

Note:

  • HTTPS is not enabled by default because of the added complexity of managing certificates. Let's Encrypt is recommended for creating certificates.
  • Sometimes the browser tries to switch to HTTPS automatically and fails. If the platform does not load as expected, check that your browser is using http:// rather than https://.

Screenshots

Clinical Data Scientist (Developer) View

  1. Welcome Page

  2. Option to Switch Between Clinical Data Scientist (Developer) View and Medical Expert (User) View

  3. Data Set Overview

  4. Data Set Version Overview

  5. Model Overview

  6. Model Version Overview (Commits, ...)

  7. Snapshot Overview for a Particular Model Version

  8. Training Pipeline. Please be aware that these pages have dynamic content based on the selected model, so this view can vary greatly depending on the model's functionality. Due to copyright, no model is currently included in this prototype.

    1. Select the data set for training (screenshot: Data Set Selection)

    2. Select a version of the data set for training (screenshot: Data Set Version Selection)

    3. Verify the selection (screenshot: Data Set Summary)

    4. Select the model for training (screenshot: Model Selection)

    5. Select the version (commit-id) for training (screenshot: Model Version Selection)

    6. Verify the selection (screenshot: Model Summary)

    7. Select a training snapshot for fine-tuning, or create a new training snapshot (screenshot: Snapshot Selection)

    8. Verify the selection (screenshot: Snapshot Summary)

    9. Customize the training settings, such as pre-processing and hyper-parameters (screenshot: Training Customization)

    10. Verify all settings before deployment (screenshot: Configuration Summary)

  9. Method Overview (a method is a model snapshot that has been exported and made available to medical experts; it can be used without further configuration)

  10. Result View: an overview of the results produced when a user applies a method. It is intended to allow further debugging by the clinical data scientist.
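
The choices collected in the training pipeline walkthrough above can be pictured as a single record that is confirmed before deployment. The following is a purely illustrative sketch: the class `TrainingRequest`, its field names, and the use of UUIDs for every identifier are assumptions, not the platform's actual data model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional
from uuid import UUID


@dataclass(frozen=True)
class TrainingRequest:
    """Hypothetical record of the selections made in the training pipeline."""

    dataset_id: UUID                    # data set selection
    dataset_version_id: UUID            # data set version selection
    model_id: UUID                      # model selection
    model_commit: str                   # model version (commit-id)
    snapshot_id: Optional[UUID] = None  # None means "create a new snapshot"
    settings: Dict[str, Any] = field(default_factory=dict)  # pre-processing, hyper-parameters

    def summary(self) -> str:
        """Render a one-line summary for the final verification step."""
        if self.snapshot_id is not None:
            mode = f"fine-tune {self.snapshot_id}"
        else:
            mode = "new"
        return (
            f"dataset={self.dataset_id}@{self.dataset_version_id} "
            f"model={self.model_id}@{self.model_commit} "
            f"snapshot={mode} settings={sorted(self.settings)}"
        )
```

Making the record immutable (`frozen=True`) mirrors the verify-then-deploy flow: once the summary has been confirmed, the selection should not change silently.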

Medical Expert (User without machine learning experience) View

  1. Welcome Page

  2. Method Overview

  3. Analyzing New Data (use the pretrained model to predict on new data; screenshots: Guided Pipeline, Upload Patient Cohort, Method Selection, Summary)

  4. Result View

Containers

Various containers were helpful during development. Maybe they can be useful for you, too:

  • PACS Container Stack (based on https://www.dcm4che.org)
  • Dataset Generators
  • Port Redirect
  • Postprocessing
  • Preprocessing
  • Visdom-Docker

Evaluation of the Prototypical Platform

The platform (as of July 2019) was evaluated by clinical data scientists and medical experts. For details, consult the thesis published here: http://www.ub.uni-heidelberg.de/archiv/27446
