Skip to content

tutubalinaev/relation-extraction-ehr

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Relation Extraction from Electronic Health Records

This repository contains the source code for the paper Alimova I., Tutubalina E. Multiple features for clinical relation extraction: A machine learning approach //Journal of Biomedical Informatics. – 2020. – Т. 103. – С. 103382.

@article{alimova2020multiple,
  title={Multiple features for clinical relation extraction: A machine learning approach},
  author={Alimova, Ilseyar and Tutubalina, Elena},
  journal={Journal of Biomedical Informatics},
  volume={103},
  pages={103382},
  year={2020},
  publisher={Elsevier}
}

Corpora

MADE corpus is taken from Jagannatha A. et al. Overview of the first natural language processing challenge for extracting medication, indication, and adverse drug events from electronic health record notes (MADE 1.0) //Drug safety. – 2019. – Т. 42. – №. 1. – С. 99-111.

n2c2 corpus is taken from Henry S. et al. 2018 n2c2 shared task on adverse drug events and medication extraction in electronic health records //Journal of the American Medical Informatics Association. – 2019.

Resources

Word2Vec models

PubMed+PMC+Wikipedia - Moen S., Ananiadou T. S. S. Distributional semantics resources for biomedical text processing //Proceedings of LBM. – 2013. – С. 39-44.

BioWordVec - Zhang Y. et al. BioWordVec, improving biomedical word embeddings with subword information and MeSH //Scientific data. – 2019. – Т. 6. – №. 1. – С. 52.

Concept embeddings - Beam A. L. et al. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data //arXiv preprint arXiv:1804.01486. – 2018.

Sent2VecModel

BioSentVec - Chen Q., Peng Y., Lu Z. BioSentVec: creating sentence embeddings for biomedical texts //2019 IEEE International Conference on Healthcare Informatics (ICHI). – IEEE, 2019. – С. 1-5.

UMLS semantic types

Unified Medical Language System - https://www.nlm.nih.gov/research/umls/

MeSH concept types

Medical Subject Headings - https://www.nlm.nih.gov/mesh/meshhome.html

Project Structure

features - directory with features implementation

models - directory with accessory models implementation

relation_classification.py - classifier implementation

utils.py - additional functions for resource loading

Classifier Usage

  1. install requirements
  2. download neccesary additional resources listed in Resources section
  3. download dataset
  4. add paths to relation_classification.py file
  5. run relation_classification.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%