Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning.

Trivedi, Hari M; Panahiazar, Maryam; Liang, April; Lituiev, Dmytro; Chang, Peter; Sohn, Jae Ho; Chen, Yunn-Yi; Franc, Benjamin L; Joe, Bonnie; Hadley, Dexter

Affiliation

Trivedi HM; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA. hari.trivedi@gmail.com.
Panahiazar M; Institute for Computational Health Sciences, University of California, San Francisco, CA, USA.
Liang A; University of California School of Medicine, San Francisco, CA, USA.
Lituiev D; Institute for Computational Health Sciences, University of California, San Francisco, CA, USA.
Chang P; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA.
Sohn JH; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA.
Chen YY; Department of Pathology, University of California, San Francisco, CA, USA.
Franc BL; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA.
Joe B; Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA.
Hadley D; Institute for Computational Health Sciences, University of California, San Francisco, CA, USA.

J Digit Imaging; 32(1): 30-37, 2019 Feb.

Article in English | MEDLINE | ID: mdl-30128778

ABSTRACT

Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, large-scale labeled datasets, which are difficult to obtain, are required. We aim to remove many barriers of dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier. Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.
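The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `split_by_length` and `weighted_f_measure` are hypothetical, the treatment of a report of exactly 1024 characters is an assumption (the abstract only says "under" and "over"), and "weighted average" is assumed to mean weighting each class's F-measure by its number of true cases.

```python
from collections import Counter

# The four outcome labels named in the abstract.
LABELS = ("left positive", "right positive", "bilateral positive", "negative")

# Length threshold at which the abstract reports results separately.
CHAR_LIMIT = 1024


def split_by_length(reports, limit=CHAR_LIMIT):
    """Partition report texts into (short, long) lists by character count.

    Assumption: a report of exactly `limit` characters counts as long.
    """
    short = [r for r in reports if len(r) < limit]
    long_ = [r for r in reports if len(r) >= limit]
    return short, long_


def weighted_f_measure(y_true, y_pred, labels=LABELS):
    """Average of per-class F-measures, weighted by true-class frequency."""
    if not y_true:
        return 0.0
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * support[label] / len(y_true)
    return score
```

Under these assumptions, each length group returned by `split_by_length` would be scored separately with `weighted_f_measure`, mirroring the under-1024 and over-1024 comparison reported above.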

Subject(s)

Breast Neoplasms/diagnostic imaging; Deep Learning/statistics & numerical data; Electronic Health Records/statistics & numerical data; Image Interpretation, Computer-Assisted/methods; Mammography/methods; Breast/diagnostic imaging; Databases, Factual; Female; Humans; Middle Aged

Keywords

Artificial intelligence; Deep learning; IBM Watson; Machine learning; Mammography; Natural language processing (NLP); Pathology

Full text: 1; Databases: MEDLINE; Main subject: Breast Neoplasms / Mammography / Image Interpretation, Computer-Assisted / Electronic Health Records / Deep Learning; Limits: Female / Humans / Middle Aged; Language: English; Journal: J Digit Imaging; Journal subject: DIAGNOSTIC IMAGING / MEDICAL INFORMATICS / RADIOLOGY; Year: 2019; Document type: Article; Affiliation country: United States
