Warning: this repository was previously prepared for an academic assessment!
This repository was made public after the end of my academic assessments to provide a personal showcase of my past work, with the understanding that the topic and nature of the assessment would change every year. In the unlikely event that its contents become relevant to any current academic assessment, they should not be used for that purpose under any circumstances.
This warning was added in response to an event that was recent at the time of writing. The contents of this repository were not involved in that event in any way.
In this coursework project, data taken from the BioCreative V Chemical-Disease Relation dataset were used to train a Conditional Random Field (CRF) entity recognition model, which was then paired with a grounding system based on approximate string matching to extract relations between mentions of chemicals and diseases in biomedical literature.
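As a rough illustration of the grounding step described above, the sketch below maps a recognised mention to the closest entry in a small lexicon using approximate string matching. It is not taken from this codebase: the use of the standard library's `difflib`, the `ground_mention` helper, the `LEXICON` entries, their placeholder identifiers, and the `cutoff` value are all assumptions made for the example.

```python
import difflib

# Hypothetical lexicon mapping lowercase names to placeholder identifiers.
# These are NOT real MeSH identifiers and are not taken from this repository.
LEXICON = {
    "aspirin": "CHEM:0001",
    "acetaminophen": "CHEM:0002",
    "hypertension": "DIS:0001",
    "hepatotoxicity": "DIS:0002",
}


def ground_mention(mention, lexicon=LEXICON, cutoff=0.8):
    """Return (identifier, matched_name) for the closest lexicon name, or None.

    Similarity is difflib's SequenceMatcher ratio; candidates scoring below
    `cutoff` are rejected, so unknown mentions stay ungrounded.
    """
    key = mention.strip().lower()
    matches = difflib.get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    if not matches:
        return None
    name = matches[0]
    return lexicon[name], name


# Example usage: a misspelt mention and a mention with no close match.
misspelt = ground_mention("Asprin")
unknown = ground_mention("xyz")
```

A higher `cutoff` trades recall for precision: fewer misspellings are grounded, but fewer unrelated names are matched by accident.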
The codebase runs on Python 2.7 and is based on the framework supplied by the assessment setters. The hard-forked repository can be found here.
In the unlikely case that you wish to make use of this repository, and bearing in mind the academic prohibitions against using the codebase for assessments, I disclaim copyright to my portion of the codebase. The original conll2crfsuite and crfutils tools belong to their original authors, who may place stricter constraints on how derivatives of their work can be used.