Temporal bone radiology report classification using open source machine learning and natural language processing libraries.

Masino, Aaron J; Grundmeier, Robert W; Pennington, Jeffrey W; Germiller, John A; Crenshaw, E Bryan


Affiliation

Masino AJ; Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3535 Market Street, Suite 1024, Philadelphia, PA, 19104, USA. masinoa@email.chop.edu.
Grundmeier RW; Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3535 Market Street, Suite 1024, Philadelphia, PA, 19104, USA.
Pennington JW; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, 34th Street & Civic Center Boulevard, Philadelphia, PA, 19104, USA.
Germiller JA; Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, 3535 Market Street, Suite 1024, Philadelphia, PA, 19104, USA.
Crenshaw EB; Center for Childhood Communication, The Children's Hospital of Philadelphia, 34th Street & Civic Center Boulevard, Philadelphia, PA, 19104, USA.

BMC Med Inform Decis Mak; 16: 65, 2016-06-06.

Article in English | MEDLINE | ID: mdl-27267768

ABSTRACT


BACKGROUND:

Radiology reports are a rich resource for biomedical research. Before the reports can be used, trained experts must manually review them to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiology reports. Because the reports are unlabeled, it is difficult to select those with specific abnormalities. We implemented a classification pipeline using a human-in-the-loop machine learning approach and open source libraries to label the reports with one or more of four abnormality region labels (inner, middle, outer, and mastoid), each indicating the presence of an abnormality in the specified ear region.

METHODS:

Trained abstractors labeled radiology reports taken from AudGenDB to form a gold standard. The labeled reports were split into training (80%) and test (20%) sets. We applied open source libraries to normalize each report and convert it to an n-gram feature vector. We trained logistic regression, support vector machine (linear and Gaussian), decision tree, random forest, and naïve Bayes models for each ear region. The models were evaluated on the hold-out test set.
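
The normalization, n-gram conversion, and 80/20 split described above can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' code: the abstract does not name the libraries or the exact normalization rules, so the tokenization rule, the choice of unigrams plus bigrams, and all function names below are assumptions.

```python
import random
import re
from collections import Counter


def normalize(text):
    """Lowercase the report and keep alphabetic tokens only (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams for n = 1..n_max, joined by spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(token_lists, n_max=2):
    """Map each n-gram seen in the given (training) reports to a column index."""
    vocab = sorted({g for toks in token_lists for g in ngrams(toks, n_max)})
    return {gram: idx for idx, gram in enumerate(vocab)}


def vectorize(tokens, vocab, n_max=2):
    """Count-based feature vector; n-grams outside the vocabulary are ignored."""
    counts = Counter(ngrams(tokens, n_max))
    vec = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            vec[vocab[gram]] = count
    return vec


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and hold out a fraction for testing."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

In this sketch the vocabulary would be built from the training reports only, so that the held-out test set contributes no features. One binary classifier per ear region would then be trained on the resulting vectors.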

RESULTS:

Our gold-standard data set contained 726 reports. The best classifiers were linear support vector machine for inner and outer ear, logistic regression for middle ear, and decision tree for mastoid. Classifier test set accuracy was 90%, 90%, 93%, and 82% for the inner, middle, outer, and mastoid regions, respectively. The logistic regression method was consistent across regions, achieving accuracy scores within 2.75% of the best classifier and a receiver operating characteristic area under the curve of 0.92 or greater in every region.
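
For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how accuracy and the area under the receiver operating characteristic curve can be computed for one region's binary classifier. The function names are illustrative and the AUC uses the pairwise-ranking definition; the abstract does not state how the authors computed these values.

```python
def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the gold standard."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive report scores above a random
    negative one, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Accuracy depends on the decision threshold applied to the classifier scores, whereas the AUC summarizes ranking quality over all thresholds, which is why the two numbers are reported separately.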

CONCLUSIONS:

Our results indicate that the applied methods achieve accuracy scores sufficient to support our objective of extracting discrete features from radiology reports to enhance cohort identification in AudGenDB. The models described here are available in several free, open source libraries, which makes them more accessible and simpler to use, as demonstrated in this work. We additionally implemented the models as a web service that accepts radiology report text in an HTTP request and returns the predicted region labels. This service has been used to label the reports in AudGenDB and is freely available.
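
The request and response flow of such a web service might look like the sketch below, written with Python's standard `http.server` module. Everything here is hypothetical: the abstract does not describe the service's URL, request format, or response format, and `predict_regions` is a keyword-matching stand-in so the sketch runs, not the trained classifiers.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

REGIONS = ("inner", "middle", "outer", "mastoid")


def predict_regions(report_text):
    """Hypothetical stand-in for the trained per-region models.

    It only checks whether the region name appears in the text; a real
    service would apply the trained classifiers to the report instead.
    """
    text = report_text.lower()
    return {region: region in text for region in REGIONS}


def build_response(body_bytes):
    """Decode the request body as UTF-8 text and return JSON label bytes."""
    report_text = body_bytes.decode("utf-8")
    return json.dumps(predict_regions(report_text)).encode("utf-8")


class LabelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = build_response(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Create the server without starting it; call serve_forever() to run."""
    return HTTPServer(("127.0.0.1", port), LabelHandler)
```

Under these assumptions, a client would POST the plain report text and receive a JSON object with one boolean per region.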

Subject(s)

Audiology/classification; Machine Learning; Natural Language Processing; Radiology/classification; Temporal Bone/diagnostic imaging; Databases as Topic; Humans

Keywords

Audiology; Human-in-the-loop; Machine learning; Natural language processing; Radiology

Full text: 1 | Databases: MEDLINE | Main subject: Radiology / Temporal Bone / Natural Language Processing / Audiology / Machine Learning | Study type: Prognostic_studies | Limit: Humans | Language: En | Journal: BMC Med Inform Decis Mak | Journal subject: MEDICAL INFORMATICS | Year: 2016 | Document type: Article | Country of affiliation: United States
