Codebase and report produced for the biomedical literature text mining coursework option of the Biomedical Information Processing module (R214). Part of a collection of my taught component work towards the MPhil degree at the Computer Laboratory of the University of Cambridge.

chongyangshi/R214


Warning: this repository was previously prepared for an academic assessment!

This repository was made public after the end of my academic assessments to provide a personal showcase of my past work, with the understanding that the topic and nature of the assessment would change every year. In the unlikely event that its contents become relevant to any current academic assessment, they should not be used under any circumstances for such an academic purpose.

This warning was added after a recent event at the time of writing, despite the content of this repository not being involved in any way.


In this coursework project, data taken from the BioCreative V Chemical-Disease Relation dataset were used to train a Conditional Random Field (CRF) entity recognition model, which was then paired with an approximate string matching-based grounding system to extract relations between mentions of chemicals and diseases in biomedical literature.
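As a rough illustration of what approximate string matching-based grounding can look like, the sketch below maps a recognised mention to an identifier in a small lexicon using the standard library's `difflib`. It is not taken from this repository: the lexicon, the identifiers, the function name `ground_mention` and the `cutoff` value are all hypothetical, and the actual grounding system may use a different similarity measure.

```python
import difflib

# Hypothetical toy lexicon: lowercased concept name -> made-up identifier.
LEXICON = {
    "aspirin": "CHEM:0001",
    "acetaminophen": "CHEM:0002",
    "hypertension": "DIS:0001",
    "nephrotoxicity": "DIS:0002",
}


def ground_mention(mention, lexicon=LEXICON, cutoff=0.8):
    """Return the identifier of the lexicon entry closest to `mention`.

    An exact (case-insensitive) match is tried first; otherwise the most
    similar lexicon name is accepted if its difflib similarity ratio is
    at least `cutoff`. Returns None when nothing is close enough.
    """
    key = mention.strip().lower()
    if key in lexicon:
        return lexicon[key]
    matches = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    return lexicon[matches[0]] if matches else None


if __name__ == "__main__":
    for text in ["Aspirin", "hypertention", "unrelated term"]:
        print(text, "->", ground_mention(text))
```

Raising `cutoff` trades recall for precision: fewer misspelt or variant mentions are grounded, but fewer unrelated mentions are mapped to the wrong concept.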

The codebase runs on Python 2.7 and is based on the framework supplied by the assessment setters. The hard-forked repository can be found here.

In the unlikely case that you wish to make use of this repository, with the obvious warning of academic prohibitions against using the codebase for assessments, I disclaim copyright to my portion of the codebase. The original conll2crfsuite and crfutils tools belong to their original authors, who may have stricter constraints on how derivations of their work can be used.
