Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models.

Carrillo-Perez, Francisco; Pizurica, Marija; Zheng, Yuanning; Nandi, Tarak Nath; Madduri, Ravi; Shen, Jeanne; Gevaert, Olivier

Affiliation

Carrillo-Perez F; Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA.
Pizurica M; Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA.
Zheng Y; Internet technology and Data science Lab (IDLab), Ghent University, Ghent, Belgium.
Nandi TN; Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA.
Madduri R; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA.
Shen J; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA.
Gevaert O; Department of Pathology, Stanford University, School of Medicine, Palo Alto, CA, USA.

Nat Biomed Eng; 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38514775

ABSTRACT

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Language: English | Journal: Nat Biomed Eng | Year: 2024 | Document type: Article | Country of affiliation: United States
