
NLM .nxml to text format conversion


GullyBurns/nxml2txt

 
 


nxml2txt


Usage:

./nxml2txt NXMLFILE [TEXTFILE] [SOFILE]

For example, using the test document included in the repository:

./nxml2txt test/PMC3357053.nxml test/PMC3357053.txt test/PMC3357053.so
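To convert a whole directory of .nxml files, the command can be wrapped in a loop. The Python code below is a minimal sketch, not part of nxml2txt. It assumes the script is run as ./nxml2txt from the current directory, names each output after its input file as in the example, and treats a non-zero exit status as a failure (an assumption, since exit codes are not documented here). The script name batch_nxml2txt.py and the functions build_command and convert_all are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: nxml2txt is invoked from the current directory, as in the usage line.
NXML2TXT = "./nxml2txt"


def build_command(nxml_path: Path, out_dir: Path) -> list[str]:
    """Return the nxml2txt argument list for one input file.

    The .txt and .so outputs share the input's stem, mirroring the example:
    PMC3357053.nxml -> PMC3357053.txt and PMC3357053.so.
    """
    stem = nxml_path.stem
    return [
        NXML2TXT,
        str(nxml_path),
        str(out_dir / f"{stem}.txt"),
        str(out_dir / f"{stem}.so"),
    ]


def convert_all(in_dir: Path, out_dir: Path) -> list[Path]:
    """Run nxml2txt on every *.nxml file in in_dir and return the inputs that failed.

    Assumption: a non-zero exit status means the conversion failed.
    subprocess.run raises FileNotFoundError if NXML2TXT does not exist.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for nxml_path in sorted(in_dir.glob("*.nxml")):
        result = subprocess.run(build_command(nxml_path, out_dir))
        if result.returncode != 0:
            failed.append(nxml_path)
    return failed


# Hypothetical usage: python batch_nxml2txt.py INDIR OUTDIR
if __name__ == "__main__" and len(sys.argv) == 3:
    failures = convert_all(Path(sys.argv[1]), Path(sys.argv[2]))
    for path in failures:
        print(f"failed: {path}", file=sys.stderr)
    sys.exit(1 if failures else 0)
```

Running the commands sequentially keeps the sketch simple; since each document is converted independently, the loop could be parallelized if throughput matters.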

This creates two files: test/PMC3357053.txt, containing the text content of the input document, and test/PMC3357053.so, containing the annotations (XML elements and their attributes) in a simple standoff format, that is, stored separately from the text and tied to it by character offsets.

nxml2txt assumes a unix-like environment. If the input .nxml file contains embedded TeX-math, nxml2txt requires LaTeX and catdvi.
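Before converting a document, it can help to check whether it needs the TeX tools at all. The sketch below is a hypothetical helper, not how nxml2txt itself decides: it assumes embedded TeX math is marked up with the NLM/JATS `<tex-math>` element and that the required programs are on PATH under the names latex and catdvi. The function names are illustrative.

```python
import shutil
import xml.etree.ElementTree as ET

# Assumption: the required tools are on PATH under these executable names.
TEX_TOOLS = ("latex", "catdvi")


def has_tex_math(nxml_path: str) -> bool:
    """Return True if the document contains a <tex-math> element.

    Assumption: embedded TeX math uses the NLM/JATS <tex-math> element.
    Tags are matched by suffix, so a namespaced tag such as "{uri}tex-math"
    also counts. ElementTree does not load external DTDs, so parsing may
    fail on named entities that are defined only in the DTD.
    """
    root = ET.parse(nxml_path).getroot()
    return any(elem.tag.endswith("tex-math") for elem in root.iter())


def missing_tex_tools(tools: tuple[str, ...] = TEX_TOOLS) -> list[str]:
    """Return the tools from `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_prerequisites(nxml_path: str) -> list[str]:
    """Return the TeX tools this file needs but that are missing (empty if none)."""
    if not has_tex_math(nxml_path):
        return []
    return missing_tex_tools()
```

For example, `check_prerequisites("test/PMC3357053.nxml")` returns an empty list when the file has no `<tex-math>` elements or when both tools are installed.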

This tool was originally introduced as part of the BioNLP Shared Task 2011 supporting resources (https://github.com/ninjin/bionlp_st_2011_supporting).
