Obtaining Knowledge in Pathology Reports Through a Natural Language Processing Approach With Classification, Named-Entity Recognition, and Relation-Extraction Heuristics.
Oliwa, Tomasz; Maron, Steven B; Chase, Leah M; Lomnicki, Samantha; Catenacci, Daniel V T; Furner, Brian; Volchenboum, Samuel L.
Affiliation
  • Oliwa T; The University of Chicago, Chicago, IL.
  • Maron SB; Memorial Sloan Kettering Cancer Center, New York, NY.
  • Chase LM; The University of Chicago Medical Center, Chicago, IL.
  • Lomnicki S; The University of Chicago Medical Center, Chicago, IL.
  • Catenacci DVT; The University of Chicago Medical Center, Chicago, IL.
  • Furner B; The University of Chicago, Chicago, IL.
  • Volchenboum SL; The University of Chicago, Chicago, IL.
JCO Clin Cancer Inform; 3: 1-8, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31365274
PURPOSE: Robust institutional tumor banks depend on continuous sample curation or else subsequent biopsy or resection specimens are overlooked after initial enrollment. Curation automation is hindered by semistructured free-text clinical pathology notes, which complicate data abstraction. Our motivation is to develop a natural language processing method that dynamically identifies existing pathology specimen elements necessary for locating specimens for future use in a manner that can be re-implemented by other institutions.

PATIENTS AND METHODS: Pathology reports from patients with gastroesophageal cancer enrolled in The University of Chicago GI oncology tumor bank were used to train and validate a novel composite natural language processing-based pipeline with a supervised machine learning classification step to separate notes into internal (primary review) and external (consultation) reports; a named-entity recognition step to obtain label (accession number), location, date, and sublabels (block identifiers); and a results proofreading step.

RESULTS: We analyzed 188 pathology reports, including 82 internal reports and 106 external consult reports, and successfully extracted named entities grouped as sample information (label, date, location). Our approach identified up to 24 additional unique samples in external consult notes that could have been overlooked. Our classification model obtained 100% accuracy on the basis of 10-fold cross-validation. Precision, recall, and F1 for class-specific named-entity recognition models show strong performance.

CONCLUSION: Through a combination of natural language processing and machine learning, we devised a re-implementable and automated approach that can accurately extract specimen attributes from semistructured pathology notes to dynamically populate a tumor registry.
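
As an illustration only, the kind of entity extraction described in the methods (label, date, and sublabels) could be approximated with simple pattern matching. The sketch below is not the authors' pipeline, which uses trained named-entity recognition models; the accession-number, date, and block-identifier formats, the function name `extract_specimen`, and the sample sentence are all hypothetical assumptions chosen for the example. Location extraction is omitted because the abstract gives no hint of its format.

```python
import re

# All three patterns are hypothetical formats assumed for this sketch;
# the abstract does not specify how labels, dates, or blocks are written.
ACCESSION = re.compile(r"\b[A-Z]{1,2}\d{2}-\d{3,6}\b")   # e.g. "S18-12345"
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")          # e.g. "03/14/2018"
BLOCKS = re.compile(r"\b[Bb]locks?\s+([A-Z]\d{1,2}(?:\s*,\s*[A-Z]\d{1,2})*)")


def extract_specimen(text):
    """Return the first label, date, and block identifiers found in text."""
    label = ACCESSION.search(text)
    date = DATE.search(text)
    blocks = BLOCKS.search(text)
    return {
        "label": label.group(0) if label else None,
        "date": date.group(0) if date else None,
        # Split the comma-separated block list into individual sublabels.
        "sublabels": [b.strip() for b in blocks.group(1).split(",")] if blocks else [],
    }


# Hypothetical report sentence, invented for the example.
note = "Outside slides S18-12345 received 03/14/2018; blocks A1, B2 reviewed."
print(extract_specimen(note))
```

A rule-based pass like this would likely need a separate pattern set per report class, which is one plausible reason to classify notes as internal or external before extraction.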
Subjects

Full text: 1
Database: MEDLINE
Main subject: Medical Informatics / Natural Language Processing / Software / Electronic Health Records / Pathology, Molecular / Research Report / Heuristics
Language: En
Year of publication: 2019
Document type: Article
