forked from chop-dbhi/arrc

arrc

AudgenDB radiology report text classification and REST service

Model Selection

The model_analysis.py file analyzes a number of classification models using the scikit-learn and NLTK APIs. It performs a grid search over several model hyper-parameters, applying k-fold cross validation to select the best models. Performance is subsequently evaluated on a hold-out test set.
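The select-then-evaluate workflow described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the contents of model_analysis.py: the toy corpus, the TF-IDF plus logistic regression pipeline, the parameter grid, and the F1 scoring are all assumptions chosen for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Toy stand-in corpus (hypothetical; real reports come from AudGenDB).
docs = ["normal inner ear structures", "cochlear malformation noted"] * 20
labels = [0, 1] * 20

# Set aside a hold-out test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, stratify=labels, random_state=0
)

# Assumed pipeline: the actual models and features may differ.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed hyper-parameter grid, addressed as <step name>__<parameter>.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# 5-fold cross validation on the training split selects the best setting.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

best_params = search.best_params_
# With scoring="f1", score() reports F1 on the hold-out set.
test_f1 = search.score(X_test, y_test)
print(best_params, test_f1)
```

Keeping the test split out of the grid search means the reported score is not biased by the hyper-parameter selection.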

As written, model_analysis.py requires a label file with the column headers pid, doc_norm, inner, middle, outer, and mastoid, in that order. The pid column is a unique identifier that corresponds to a text report file named pid.txt. The doc_norm column is binary valued and indicates whether pid.txt contains no abnormalities (0) or at least one abnormality (1). The inner, middle, outer, and mastoid columns are also binary valued and indicate whether pid.txt contains no abnormality (0) or at least one abnormality (1) in the inner, middle, or outer ear or the mastoid region, respectively.
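As an illustration of the label file layout, the sketch below parses and validates a small in-memory example. It assumes the file is comma-delimited, which the description above does not state; the read_labels helper and the sample rows are hypothetical.

```python
import csv
import io

# Column order stated in the description above.
EXPECTED_COLUMNS = ["pid", "doc_norm", "inner", "middle", "outer", "mastoid"]


def read_labels(handle):
    """Parse a label file, checking the header order and the binary values."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        labels = {name: int(row[name]) for name in EXPECTED_COLUMNS[1:]}
        if any(value not in (0, 1) for value in labels.values()):
            raise ValueError(f"non-binary label for pid {row['pid']}")
        # Each pid maps to a report file named <pid>.txt.
        rows.append({"pid": row["pid"], "report_file": f"{row['pid']}.txt", **labels})
    return rows


# Hypothetical sample data; a comma delimiter is an assumption.
sample = io.StringIO(
    "pid,doc_norm,inner,middle,outer,mastoid\n"
    "1001,0,0,0,0,0\n"
    "1002,1,1,0,0,1\n"
)
rows = read_labels(sample)
print(rows[1]["report_file"], rows[1]["mastoid"])
```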

Training & Test Data

Training & test data can be obtained from the AudGenDB project via the AudGenDB application.

Model Persistence

To avoid re-training the classification models every time the REST service is started (see below), the selected classification models can be persisted via pickling. The model_persist.py script trains and pickles the models using the hyper-parameters specified in the INSTALL_DIR/resources/config/models.ini file. See this sample file.
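The train-once, load-later idea can be sketched as below. This is not model_persist.py itself: the pipeline, the hard-coded hyper-parameter C=1.0 (which the real script reads from models.ini), the toy data, and the middle.pkl file name are assumptions for illustration.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data for one region classifier (hypothetical).
docs = ["normal middle ear", "middle ear effusion present"] * 5
labels = [0, 1] * 5

# Assumed model; C=1.0 stands in for a value read from a config file.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(C=1.0))
model.fit(docs, labels)

# Persist the fitted model; the file name is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "middle.pkl")
with open(path, "wb") as fh:
    pickle.dump(model, fh)

# Later, for example at service start-up, load it without re-training.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

same = list(restored.predict(docs)) == list(model.predict(docs))
print(same)
```

Pickled scikit-learn models should be loaded with the same scikit-learn version that wrote them, and only from trusted files, since unpickling can execute arbitrary code.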

REST Service

A simple Flask REST service is available as an example document labeling service. The REST service uses four scikit-learn based classifiers to classify radiology text reports by the presence or absence of an abnormality in the inner, middle, outer, or mastoid ear regions.

Configuration

A configuration file is expected at INSTALL_DIR/resources/config/app_config.cfg, where INSTALL_DIR is the directory containing this source code. The configuration file has four entries, which are the paths to the model persistence files. See this sample file.

Usage

Start the REST service:

python rs.py

The service has a single endpoint at HOST/classify, where HOST is the base URL at which the application is hosted. If running locally with Flask, HOST is http://localhost:5000 by default. This endpoint handles only POST requests. The body of the request must be JSON formatted, with entries of the form

{"id1":"text1", "id2":"text2"}

The returned result is also JSON formatted, in the form

{"id1":[v1,v2,v3,v4], "id2":[v1,v2,v3,v4]}

where v1-v4 are binary (0,1) values corresponding to the classifications for the inner, middle, outer, and mastoid regions, respectively.
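A client for this endpoint might look like the following sketch, which uses only the Python standard library. The build_request and label_regions helpers are hypothetical, the Content-Type header is an assumption about what the service expects, and the response values shown are made up to illustrate the format.

```python
import json
import urllib.request

# Order of the four values returned for each document.
REGIONS = ("inner", "middle", "outer", "mastoid")


def build_request(reports, base_url="http://localhost:5000"):
    """Build a POST request whose JSON body maps document ids to report text."""
    body = json.dumps(reports).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/classify",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def label_regions(response_body):
    """Turn {"id": [v1, v2, v3, v4]} into {"id": {"inner": v1, ...}}."""
    result = json.loads(response_body)
    return {doc_id: dict(zip(REGIONS, values)) for doc_id, values in result.items()}


request = build_request({"id1": "text1", "id2": "text2"})

# Sending the request requires a running service:
# with urllib.request.urlopen(request) as resp:
#     labeled = label_regions(resp.read())

# Made-up response body, used here only to show the parsing step.
labeled = label_regions('{"id1": [0, 1, 0, 0], "id2": [1, 0, 0, 1]}')
print(labeled["id2"])
```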
