Synthetically enhanced: unveiling synthetic data's potential in medical imaging research.
Khosravi, Bardia; Li, Frank; Dapamede, Theo; Rouzrokh, Pouria; Gamble, Cooper U; Trivedi, Hari M; Wyles, Cody C; Sellergren, Andrew B; Purkayastha, Saptarshi; Erickson, Bradley J; Gichoya, Judy W.
Affiliations
  • Khosravi B; Department of Radiology, Mayo Clinic, Rochester, MN, USA; Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA.
  • Li F; Department of Radiology, Emory University, Atlanta, GA, USA.
  • Dapamede T; Department of Radiology, Emory University, Atlanta, GA, USA.
  • Rouzrokh P; Department of Radiology, Mayo Clinic, Rochester, MN, USA; Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA.
  • Gamble CU; Department of Radiology, Mayo Clinic, Rochester, MN, USA.
  • Trivedi HM; Department of Radiology, Emory University, Atlanta, GA, USA.
  • Wyles CC; Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA.
  • Sellergren AB; Google Health, Google, Palo Alto, CA, USA.
  • Purkayastha S; School of Informatics and Computing, Indiana University-Purdue University, Indianapolis, IN, USA.
  • Erickson BJ; Department of Radiology, Mayo Clinic, Rochester, MN, USA. Electronic address: bje@mayo.edu.
  • Gichoya JW; Department of Radiology, Emory University, Atlanta, GA, USA. Electronic address: judywawira@emory.edu.
EBioMedicine; 104: 105174, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38821021
ABSTRACT

BACKGROUND:

Chest X-rays (CXRs) are essential for diagnosing a variety of conditions, but when models trained on them are applied to new populations, limited generalizability reduces their efficacy. Generative AI, particularly denoising diffusion probabilistic models (DDPMs), offers a promising approach to generating synthetic images that enhance dataset diversity. This study investigates the impact of synthetic data supplementation on the performance and generalizability of pathology classifiers in medical imaging research.

METHODS:

The study employed DDPMs to create synthetic CXRs conditioned on demographic and pathological characteristics from the CheXpert dataset. These synthetic images were used to supplement the training datasets of pathology classifiers, with the aim of improving their performance. The evaluation involved three datasets (CheXpert, MIMIC-CXR, and Emory Chest X-ray) and several experiments: supplementing real data with synthetic data, training on purely synthetic data, and mixing synthetic data with external datasets. Performance was assessed using the area under the receiver operating characteristic curve (AUROC).
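AUROC, the metric used above, can be read as the probability that a classifier assigns a higher score to a randomly chosen positive case than to a randomly chosen negative one. The following is a minimal Python sketch of that rank-based (Mann-Whitney) definition for a single binary pathology label. The function name `auroc` and the toy scores are illustrative assumptions, not the study's evaluation code or data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC for one binary label.

    Counts, over every (positive, negative) pair, how often the positive
    case gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up classifier scores (not results from the study)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score))  # 0.75
```

The pairwise loop is quadratic and meant only to make the definition explicit; for large test sets a sorting-based implementation such as scikit-learn's `roc_auc_score` gives the same value.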

FINDINGS:

Adding synthetic data to real datasets resulted in a notable increase in AUROC values (up to 0.02 in internal and external test sets with 1000% supplementation, p-value <0.01 in all instances). When classifiers were trained exclusively on synthetic data, they achieved performance levels comparable to those trained on real data with 200%-300% data supplementation. The combination of real and synthetic data from different sources demonstrated enhanced model generalizability, increasing model AUROC from 0.76 to 0.80 on the internal test set (p-value <0.01).
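The supplementation levels above are expressed as percentages. Assuming that an N% level means adding synthetic images equal to N% of the real image count (so 1000% adds ten synthetic images per real one), a supplemented training set could be assembled as in the hypothetical Python sketch below. The `supplement` helper and the string placeholders standing in for images are illustrative assumptions, not the authors' pipeline.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def supplement(real: Sequence[T], synthetic_pool: Sequence[T],
               percent: int, seed: int = 0) -> List[T]:
    """Return the real samples plus synthetic samples equal to `percent`%
    of the real count (assumed definition of the supplementation level)."""
    n_synth = len(real) * percent // 100
    if n_synth > len(synthetic_pool):
        raise ValueError(
            f"need {n_synth} synthetic samples, pool has {len(synthetic_pool)}"
        )
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    chosen = rng.sample(list(synthetic_pool), n_synth)
    return list(real) + chosen


# Hypothetical placeholders: 100 real images, 2000 generated candidates
real = [f"real_{i}" for i in range(100)]
pool = [f"synth_{i}" for i in range(2000)]
train = supplement(real, pool, percent=1000)
print(len(train))  # 1100: 100 real + 1000 synthetic
```

Under this reading, 1000% supplementation enlarges the training set elevenfold, which is why the generator must be able to produce many more images than the real cohort contains.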

INTERPRETATION:

Synthetic data supplementation significantly improves the performance and generalizability of pathology classifiers in medical imaging.

FUNDING:

Dr. Gichoya is a 2022 Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program scholar and declares support from RSNA Health Disparities grant (#EIHD2204), Lacuna Fund (#67), Gordon and Betty Moore Foundation, NIH (NIBIB) MIDRC grant under contracts 75N92020C00008 and 75N92020C00021, and NHLBI Award Number R01HL167811.

Database: MEDLINE | Main subject: Diagnostic Imaging / ROC Curve | Limits: Humans | Language: English | Publication year: 2024 | Document type: Article