SREDH-Consortium/OpenDeID-Corpus

About OpenDeID Corpus

The OpenDeID corpus is the first Australian-based gold-standard corpus for patient de-identification. It can be used to develop automated patient de-identification systems using rule-based or machine learning approaches. The corpus comprises 2,100 pathology reports (approximately 717 tokens per report) from 1,833 cancer patients, with 38,414 annotated protected health information (PHI) entities. The overall inter-annotator agreement and deviation scores across all three settings were 0.9464 and 0.9503, respectively. The corpus has been manually annotated with surrogate information, and measures have been taken to ensure that it contains no identifiable information. For more information, please refer to https://www.sredhconsortium.org/sredh-datasets/opendeid-corpus-dataset
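To give a rough sense of scale, the reported figures can be combined into a few derived statistics. The sketch below is a back-of-envelope calculation only: it does not read the corpus, and it assumes the 717-token figure is a per-report average. The function name `corpus_summary` is hypothetical.

```python
# Back-of-envelope statistics derived from the figures reported above.
# Assumption: 717 tokens is an approximate per-report average, so totals are estimates.
REPORTS = 2_100
PATIENTS = 1_833
TOKENS_PER_REPORT = 717  # approximate average, as reported
PHI_ENTITIES = 38_414


def corpus_summary() -> dict:
    """Return derived (approximate) corpus statistics. Hypothetical helper."""
    total_tokens = REPORTS * TOKENS_PER_REPORT
    return {
        "approx_total_tokens": total_tokens,
        "phi_per_report": PHI_ENTITIES / REPORTS,
        "reports_per_patient": REPORTS / PATIENTS,
        "phi_per_1k_tokens": 1000 * PHI_ENTITIES / total_tokens,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value:,.2f}")
```

On these numbers the corpus holds roughly 1.5 million tokens, with about 18 PHI entities per report, or roughly one PHI entity every 40 tokens.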

About OpenDeID pipeline

The OpenDeID corpus was used to design and develop the OpenDeID pipeline: https://github.com/TCRNBioinformatics/OpenDeID-Pipeline

OpenDeID corpus access instructions and criteria

Please refer to https://www.sredhconsortium.org/sredh-datasets/opendeid-corpus-dataset

Contact

Email: z3339253 (at) unsw (dot) edu (dot) au

Ethics approval

Please refer to https://www.sredhconsortium.org/sredh-datasets/opendeid-corpus-dataset

FAQs

https://www.sredhconsortium.org/sredh-datasets/opendeid-corpus-dataset/faqs

Related publications

https://www.sredhconsortium.org/sredh-datasets/opendeid-corpus-dataset/faqs

Related links

https://github.com/SREDH-Consortium/OpenDeID-Corpus

https://github.com/SREDH-Consortium/OpenDeID-Pipeline

https://github.com/TCRNBioinformatics/OpenDeID-Corpus

https://github.com/TCRNBioinformatics/OpenDeID-Pipeline

https://www.sredhconsortium.org/sredh-datasets/opendeid-corpus-dataset