Enhancing thoracic disease detection using chest X-rays from PubMed Central Open Access.

Lin, Mingquan; Hou, Bojian; Mishra, Swati; Yao, Tianyuan; Huo, Yuankai; Yang, Qian; Wang, Fei; Shih, George; Peng, Yifan

Affiliations

Lin M; Department of Population Health Sciences, Weill Cornell Medicine, New York, USA.
Hou B; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA.
Mishra S; Department of Information Science, Cornell University, New York, USA.
Yao T; Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
Huo Y; Department of Computer Science, Vanderbilt University, Nashville, TN, USA.
Yang Q; Department of Information Science, Cornell University, New York, USA.
Wang F; Department of Population Health Sciences, Weill Cornell Medicine, New York, USA.
Shih G; Department of Radiology, Weill Cornell Medicine, New York, USA.
Peng Y; Department of Population Health Sciences, Weill Cornell Medicine, New York, USA. Electronic address: yip4002@med.cornell.edu.

Comput Biol Med; 159: 106962, 2023 Jun.

Article in En | MEDLINE | ID: mdl-37094464

ABSTRACT

Large chest X-ray (CXR) datasets have been collected to train deep learning models to detect thoracic pathology on CXR. However, most CXR datasets come from single-center studies, and the collected pathologies are often imbalanced. The aim of this study was to automatically construct a public, weakly labeled CXR database from articles in PubMed Central Open Access (PMC-OA) and to assess model performance on CXR pathology classification when this database is used as additional training data. Our framework includes text extraction, CXR pathology verification, subfigure separation, and image modality classification. We extensively validated the utility of the automatically generated image database on thoracic disease detection tasks, including Hernia, Lung Lesion, Pneumonia, and Pneumothorax. We chose these diseases because classifiers have historically performed poorly on them in existing datasets: the NIH-CXR dataset (112,120 CXR) and the MIMIC-CXR dataset (243,324 CXR). We find that classifiers fine-tuned with additional PMC-CXR extracted by the proposed framework consistently and significantly outperformed those without it for CXR pathology detection (e.g., Hernia 0.9335 vs. 0.9154; Lung Lesion 0.7394 vs. 0.7207; Pneumonia 0.7074 vs. 0.6709; Pneumothorax 0.8185 vs. 0.7517, all in AUC with p < 0.0001). In contrast to previous approaches that rely on manual submission of medical images to a repository, our framework automatically collects figures and their accompanying figure legends. Compared to previous studies, the proposed framework improves subfigure segmentation and incorporates our self-developed NLP technique for CXR pathology verification. We hope it complements existing resources and improves our ability to make biomedical image data findable, accessible, interoperable, and reusable.
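The results above are reported as AUC (area under the ROC curve) for a baseline classifier versus one fine-tuned with additional data. As a minimal sketch of how such a comparison can be computed, the snippet below implements AUC through the pairwise rank (Mann-Whitney) formulation. The function name `auc` and all labels and scores are illustrative assumptions, not the authors' code or data, and the sketch does not reproduce the significance test used in the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only (not data from the study).
labels = [1, 0, 1, 0, 1, 0]
baseline_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6]   # hypothetical baseline model
augmented_scores = [0.9, 0.5, 0.6, 0.3, 0.7, 0.4]  # hypothetical model with extra data

baseline_auc = auc(labels, baseline_scores)
augmented_auc = auc(labels, augmented_scores)
print(f"baseline AUC = {baseline_auc:.4f}, augmented AUC = {augmented_auc:.4f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; on datasets of the size mentioned above, a sort-based implementation would be the more practical choice.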

Subjects

Pneumonia; Pneumothorax; Thoracic Diseases; Humans; Pneumothorax/diagnostic imaging; Radiography, Thoracic/methods; X-Rays; Access to Information; Pneumonia/diagnostic imaging

Keywords

Artificial intelligence; Chest X-ray; PubMed

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Pneumonia / Pneumothorax / Thoracic Diseases Type of study: Diagnostic_studies / Prognostic_studies Limits: Humans Language: En Year of publication: 2023 Document type: Article
