Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research.
Song, Gyuseon; Chung, Su Jin; Seo, Ji Yeon; Yang, Sun Young; Jin, Eun Hyo; Chung, Goh Eun; Shim, Sung Ryul; Sa, Soonok; Hong, Moongi Simon; Kim, Kang Hyun; Jang, Eunchan; Lee, Chae Won; Bae, Jung Ho; Han, Hyun Wook.
Affiliation
  • Song G; Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
  • Chung SJ; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
  • Seo JY; Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea.
  • Yang SY; Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea.
  • Jin EH; Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea.
  • Chung GE; Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea.
  • Shim SR; Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea.
  • Sa S; Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
  • Hong MS; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
  • Kim KH; Department of Health and Medical Informatics, Kyungnam University College of Health Sciences, Changwon 51767, Korea.
  • Jang E; Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
  • Lee CW; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
  • Bae JH; Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
  • Han HW; Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea.
J Clin Med; 11(11), 2022 May 24.
Article in En | MEDLINE | ID: mdl-35683353
ABSTRACT
Background and Aims:

The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research.

Methods:

An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was then applied to 248,966 EGD reports from 2010 to 2019 to identify patient demographics and clinical information for the 10 gastric diseases.
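
The four validation metrics named above can all be derived from a confusion matrix of extracted versus reference labels. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the `Counts` class, the `validation_metrics` function, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for one extracted item (hypothetical helper)."""
    tp: int  # finding present in the reference and extracted
    fp: int  # finding extracted but absent in the reference
    fn: int  # finding present in the reference but missed
    tn: int  # finding absent in both


def validation_metrics(c: Counts) -> dict:
    """Return sensitivity, PPV, accuracy, and F1 using their standard definitions."""
    total = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    ppv = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = (2 * ppv * sensitivity / (ppv + sensitivity)) if (ppv + sensitivity) else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "accuracy": accuracy, "f1": f1}


# Illustrative counts only; these numbers are invented for the example.
example = validation_metrics(Counts(tp=90, fp=10, fn=10, tn=890))
```

Because true negatives dominate when a finding is rare, accuracy can sit well above sensitivity and PPV, which is one reason to report all four metrics together.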

Results:

For gastritis information extraction, the pipeline achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. For other gastric diseases, such as ulcers and neoplastic diseases, it achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. Applying the pipeline to 10 years of EGD data revealed the demographics of patients with gastric diseases by sex and age. It also identified the extent of gastritis and the locations of other gastric diseases.

Conclusions:

We demonstrated the feasibility of an NLP pipeline for automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases.

Full text: 1 Database: MEDLINE Language: En Publication year: 2022 Document type: Article
