Automated extraction of Biomarker information from pathology reports.
Lee, Jeongeun; Song, Hyun-Je; Yoon, Eunsil; Park, Seong-Bae; Park, Sung-Hye; Seo, Jeong-Wook; Park, Peom; Choi, Jinwook.
Affiliation
  • Lee J; Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, Seoul, Republic of Korea.
  • Song HJ; School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea.
  • Yoon E; PAS1 team, TmaxSoft, Gyeonggi-do, Republic of Korea.
  • Park SB; School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea.
  • Park SH; Department of Pathology, College of Medicine, Seoul National University, Seoul, Republic of Korea.
  • Seo JW; Department of Pathology, College of Medicine, Seoul National University, Seoul, Republic of Korea.
  • Park P; Department of Industrial Engineering, Ajou University, Suwon, Republic of Korea.
  • Choi J; Interdisciplinary Program for Bioengineering, Graduate School, Seoul National University, Seoul, Republic of Korea. jinchoi@snu.ac.kr.
BMC Med Inform Decis Mak ; 18(1): 29, 2018 05 21.
Article in English | MEDLINE | ID: mdl-29783980
ABSTRACT

BACKGROUND:

Pathology reports are written in free-text form, which precludes efficient data gathering. We aimed to overcome this limitation and design an automated system for extracting biomarker profiles from accumulated pathology reports.

METHODS:

We designed a new data model for representing biomarker knowledge. The automated system parses immunohistochemistry reports based on a "slide paragraph" unit defined as a set of immunohistochemistry findings obtained for the same tissue slide. Pathology reports are parsed using context-free grammar for immunohistochemistry, and using a tree-like structure for surgical pathology. The performance of the approach was validated on manually annotated pathology reports of 100 randomly selected patients managed at Seoul National University Hospital.
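The idea of grammar-based parsing of a "slide paragraph" can be illustrated with a minimal sketch. The abstract does not give the actual grammar or report layout, so the line format (`Slide <id>: <name> (<result>), ...`), the toy grammar, and the function names below are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Toy grammar (hypothetical, not taken from the paper):
#   paragraph := "Slide" WORD ":" finding ("," finding)*
#   finding   := WORD "(" WORD ")"
TOKEN = re.compile(r"\s*([A-Za-z0-9+\-]+|[:(),])")
WORD = re.compile(r"[A-Za-z0-9+\-]+")


def tokenize(text):
    """Split a report line into word and punctuation tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_slide_paragraph(text):
    """Recursive-descent parse of one hypothetical slide paragraph."""
    tokens = tokenize(text)
    i = 0

    def expect(literal):
        nonlocal i
        if i >= len(tokens) or tokens[i] != literal:
            raise ValueError(f"expected {literal!r} at token {i}")
        i += 1

    def word():
        nonlocal i
        if i >= len(tokens) or not WORD.fullmatch(tokens[i]):
            raise ValueError(f"expected a word at token {i}")
        i += 1
        return tokens[i - 1]

    expect("Slide")
    slide_id = word()
    expect(":")
    findings = []
    while True:
        name = word()
        expect("(")
        result = word()
        expect(")")
        findings.append({"biomarker": name, "result": result})
        if i < len(tokens) and tokens[i] == ",":
            i += 1  # another finding follows on the same slide
            continue
        break
    if i != len(tokens):
        raise ValueError("trailing tokens after last finding")
    return {"slide": slide_id, "findings": findings}
```

Grouping all findings under one slide identifier mirrors the "slide paragraph" unit described above: every biomarker result parsed from the line stays attached to the tissue slide it was obtained from.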

RESULTS:

High F-scores were obtained for parsing biomarker name and corresponding test results (0.999 and 0.998, respectively) from the immunohistochemistry reports, compared to relatively poor performance for parsing surgical pathology findings. However, applying the proposed approach to our single-center dataset revealed information on 221 unique biomarkers, which represents a richer result than biomarker profiles obtained based on the published literature. Owing to the data representation model, the proposed approach can associate biomarker profiles extracted from an immunohistochemistry report with corresponding pathology findings listed in one or more surgical pathology reports. Term variations are resolved by normalization to corresponding preferred terms determined by expanded dictionary look-up and text similarity-based search.
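The two-stage normalization mentioned above (dictionary look-up first, then a text-similarity search) could look roughly like the following sketch. The synonym table, the `normalize` function, and the 0.8 cutoff are hypothetical, and Python's `difflib` is used here only as a stand-in; the abstract does not state which similarity measure the authors used.

```python
from difflib import get_close_matches

# Hypothetical variant-to-preferred-term dictionary (illustrative entries only).
SYNONYMS = {
    "er": "Estrogen receptor",
    "estrogen receptor": "Estrogen receptor",
    "pr": "Progesterone receptor",
    "progesterone receptor": "Progesterone receptor",
    "ki-67": "Ki-67",
    "ki67": "Ki-67",
    "her2": "HER2",
    "c-erbb2": "HER2",
}


def normalize(term, cutoff=0.8):
    """Map a raw biomarker term to a preferred term, or None if unresolved."""
    key = term.strip().lower()
    # Stage 1: exact look-up in the expanded dictionary.
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Stage 2: fall back to the closest dictionary entry by string similarity.
    matches = get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[matches[0]] if matches else None
```

With this table, `normalize("ER")` resolves by exact look-up, while a misspelling such as `"Estrogen recepter"` is resolved by the similarity fallback. Terms with no sufficiently close entry return `None` so that they can be reviewed manually rather than silently mismapped.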

CONCLUSIONS:

Our proposed approach for biomarker data extraction addresses key limitations regarding data representation and can handle reports prepared in the clinical setting, which often contain incomplete sentences, typographical errors, and inconsistent formatting.

Full text: 1 Databases: MEDLINE Main subject: Natural Language Processing / Immunohistochemistry / Biomarkers / Clinical Decision-Making / Models, Theoretical / Neoplasms Study type: Prognostic_studies Limits: Humans Language: En Journal: BMC Med Inform Decis Mak Journal subject: MEDICAL INFORMATICS Publication year: 2018 Document type: Article
