Pymm: Python Wrapper for MetaMap

Pymm is a Python wrapper for extracting candidate and mapping concepts with MetaMap. It parses MetaMap's XML output and extracts the following information for each concept:

score
matched word
cui
semtypes
negated
matched word start position
matched word end position
ismapping

The ismapping flag is True for a mapping concept and False for a candidate concept.
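
As a rough illustration of how the ismapping flag can be used, the sketch below separates mapping concepts from candidates. The `Concept` dataclass is a hypothetical stand-in that mirrors a few of the fields listed above; it is not Pymm's own class, and the CUIs, scores, and semantic types are made-up placeholder values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Concept:
    """Hypothetical stand-in for a Pymm concept (not the library's class)."""
    cui: str
    score: float
    matched: str
    semtypes: List[str]
    ismapping: bool


def mapping_concepts(concepts: Iterable[Concept]) -> List[Concept]:
    """Keep only the concepts flagged as mapping concepts."""
    return [c for c in concepts if c.ismapping]


# Placeholder data; real values come from MetaMap.
concepts = [
    Concept("C0000001", 1000.0, "example term", ["dsyn"], True),
    Concept("C0000002", 861.0, "example", ["fndg"], False),
]
print([c.cui for c in mapping_concepts(concepts)])
```

The same filter should work on real concepts as long as they expose an `ismapping` attribute, as the usage examples in this README suggest.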

Installation

git clone https://github.com/smujjiga/pymm.git
cd pymm
python setup.py install

Usage

Create the Python MetaMap wrapper object by pointing it to the location of MetaMap:

from pymm import Metamap
mm = Metamap(METAMAP_PATH)

Check whether MetaMap is running with:

assert mm.is_alive()

Concept extraction is done via the parse method:

mmos = mm.parse(['heart attack', 'myocardial infarction'])

The parse method returns an iterator of MetaMap object iterators, one per input sentence. Each MetaMap object iterator yields the candidate and mapping concepts for its sentence.

for idx, mmo in enumerate(mmos):
    for jdx, concept in enumerate(mmo):
        print(concept.cui, concept.score, concept.matched)
        print(concept.semtypes, concept.ismapping)
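
For downstream processing it can help to flatten the nested iterators into one row per concept. The following is a minimal sketch, assuming each concept exposes the attributes used in the loop above; `flatten_concepts` is a hypothetical helper, and the `SimpleNamespace` objects are made-up stand-ins for the result of `mm.parse()`, so the example runs without MetaMap.

```python
from types import SimpleNamespace


def flatten_concepts(mmos):
    """Turn the nested iterators into a flat list of dicts, one per concept."""
    rows = []
    for sentence_idx, mmo in enumerate(mmos):
        for concept in mmo:
            rows.append({
                "sentence_idx": sentence_idx,
                "cui": concept.cui,
                "score": concept.score,
                "matched": concept.matched,
                "semtypes": concept.semtypes,
                "ismapping": concept.ismapping,
            })
    return rows


def fake_concept(cui, ismapping):
    """Build a placeholder concept; all values are invented for the demo."""
    return SimpleNamespace(cui=cui, score=1000, matched="example",
                           semtypes=["dsyn"], ismapping=ismapping)


# Three "sentences": one concept, no concepts, two concepts.
fake_mmos = [
    [fake_concept("C0000001", True)],
    [],
    [fake_concept("C0000002", True), fake_concept("C0000003", False)],
]
rows = flatten_concepts(fake_mmos)
print(len(rows), [row["sentence_idx"] for row in rows])
```

Keeping the sentence index in each row makes it possible to join the concepts back to the input sentences later.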

The wrapper object also supports a debug parameter, which persists the input and output files and prints the command line used to run MetaMap:

mm = Metamap(METAMAP_PATH, debug=True)

Sample

The snippet below extracts concepts from a large number of sentences:

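import sys
import traceback

from pymm import MetamapStuck  # assumed import path; adjust to your install

def read_lines(file_name, fast_forward_to, batch_size, preprocessing):
    sentences = list()
    with open(file_name, 'r') as fp:
        # Skip the lines that were processed before the last checkpoint
        for i in range(fast_forward_to):
            fp.readline()

        for idx, line in enumerate(fp):
            sentences.append(preprocessing(line))
            if (idx+1) % batch_size == 0:
                yield sentences
                sentences = []  # fresh list so earlier batches are not mutated

        # Yield the final partial batch, which would otherwise be dropped
        if sentences:
            yield sentences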
try:
    for i, sentences in enumerate(read_lines(CLINICAL_TEXT_FILE, last_checkpoint, BATCH_SIZE, clean_text)):
        # First attempt: the timeout scales with the batch size
        timeout = 0.33*BATCH_SIZE
        try_again = False
        try:
            mmos = mm.parse(sentences, timeout=timeout)
        except MetamapStuck:
            # Try with larger timeout
            print("Metamap Stuck !!!; trying with larger timeout")
            try_again = True
        except Exception:
            print("Exception in mm; skipping the batch")
            traceback.print_exc(file=sys.stdout)
            continue

        if try_again:
            timeout = BATCH_SIZE*2
            try:
                mmos = mm.parse(sentences, timeout=timeout)
            except MetamapStuck:
                # Again stuck; Ignore this batch
                print("Metamap Stuck again !!!; ignoring the batch")
                continue
            except Exception:
                print("Exception in mm; skipping the batch")
                traceback.print_exc(file=sys.stdout)
                continue

        # Save every concept together with the sentence it came from
        for idx, mmo in enumerate(mmos):
            for jdx, concept in enumerate(mmo):
                save(sentences[idx], concept)

        # Record how many input lines have been consumed so far
        curr_checkpoint = (i+1)*BATCH_SIZE + last_checkpoint
        record_checkpoint(curr_checkpoint)
finally:
    mm.close()
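
The snippet above relies on `last_checkpoint` and `record_checkpoint`, which are not defined in this README. One possible way to implement them is sketched below, assuming the checkpoint is simply the number of input lines already processed, stored in a small text file. The `load_checkpoint` name, the `CHECKPOINT_FILE` constant, and the optional `path` argument are hypothetical and not part of Pymm.

```python
import os
import tempfile

CHECKPOINT_FILE = "checkpoint.txt"  # hypothetical default location


def load_checkpoint(path=CHECKPOINT_FILE):
    """Return the last recorded line count, or 0 if none is usable."""
    try:
        with open(path, "r") as fp:
            return int(fp.read().strip())
    except (FileNotFoundError, ValueError):
        return 0


def record_checkpoint(line_count, path=CHECKPOINT_FILE):
    """Write the line count to a temp file, then atomically replace the old one."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fp:
        fp.write(str(line_count))
    os.replace(tmp_path, path)


# Demo in a temporary directory so no real checkpoint file is touched.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "checkpoint.txt")
before = load_checkpoint(demo_path)
record_checkpoint(1500, demo_path)
after = load_checkpoint(demo_path)
print(before, after)
```

With helpers like these, `last_checkpoint = load_checkpoint()` before the loop would let an interrupted run resume from the last completed batch. Writing to a temporary file first and then replacing the old one avoids leaving a half-written checkpoint if the process is killed mid-write.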

Acknowledgement

This Python wrapper is motivated by https://github.com/AnthonyMRios/pymetamap. Pymetamap parses the MMI output, whereas Pymm parses the XML output. I wrote Pymm to extract concepts from huge corpora, and have used it to extract candidate and mapping concepts from 10 million sentences.
