Robust evaluation of deep learning-based representation methods for survival and gene essentiality prediction on bulk RNA-seq data.

Gross, Baptiste; Dauvin, Antonin; Cabeli, Vincent; Kmetzsch, Virgilio; El Khoury, Jean; Dissez, Gaëtan; Ouardini, Khalil; Grouard, Simon; Davi, Alec; Loeb, Regis; Esposito, Christian; Hulot, Louis; Ghermi, Ridouane; Blum, Michael; Darhi, Yannis; Durand, Eric Y; Romagnoni, Alberto

Affiliation

Gross B; Owkin, Inc., New York, NY, USA. baptiste.gross@owkin.com.
Dauvin A; Owkin, Inc., New York, NY, USA.
Cabeli V; Owkin, Inc., New York, NY, USA.
Kmetzsch V; Owkin, Inc., New York, NY, USA.
El Khoury J; Owkin, Inc., New York, NY, USA.
Dissez G; Owkin, Inc., New York, NY, USA.
Ouardini K; Owkin, Inc., New York, NY, USA.
Grouard S; Owkin, Inc., New York, NY, USA.
Davi A; Owkin, Inc., New York, NY, USA.
Loeb R; Owkin, Inc., New York, NY, USA.
Esposito C; Owkin, Inc., New York, NY, USA.
Hulot L; Owkin, Inc., New York, NY, USA.
Ghermi R; Owkin, Inc., New York, NY, USA.
Blum M; Owkin, Inc., New York, NY, USA.
Darhi Y; Owkin, Inc., New York, NY, USA.
Durand EY; Owkin, Inc., New York, NY, USA.
Romagnoni A; Owkin, Inc., New York, NY, USA.

Sci Rep; 14(1): 17064, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39048590

ABSTRACT

Deep learning (DL) has shown potential to provide powerful representations of bulk RNA-seq data in cancer research. However, there is no consensus regarding the impact of the design choices of DL approaches on the performance of the learned representation, including the model architecture, the training methodology and the various hyperparameters. To address this problem, we evaluate the performance of various design choices of DL representation learning methods using the TCGA and DepMap pan-cancer datasets and assess their predictive power for survival and gene essentiality prediction. We demonstrate that baseline methods achieve comparable or superior performance compared to more complex models on survival prediction tasks. DL representation methods, however, are the most efficient at predicting the gene essentiality of cell lines. We show that auto-encoders (AEs) are consistently improved by techniques such as masking and multi-head training. Our results suggest that the impact of DL representations and of pretraining is highly task- and architecture-dependent, highlighting the need for adopting rigorous evaluation guidelines. These guidelines for robust evaluation are implemented in a pipeline made available to the research community.
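The abstract names masking as one technique that improves auto-encoders but does not describe how it is applied. As a rough illustration only, the sketch below shows one common form of input masking: a random fraction of expression values is zeroed out, and the reconstruction error is scored on the hidden entries. The function names, the masking rate, the zero fill value and the toy matrix are all assumptions made for this example and are not taken from the article or its pipeline.

```python
import numpy as np


def mask_expression(x, mask_rate=0.3, rng=None):
    """Hide a random subset of entries by setting them to zero.

    Hypothetical helper: returns the corrupted matrix and a boolean
    mask that is True wherever a value was hidden.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < mask_rate
    x_masked = np.where(mask, 0.0, x)
    return x_masked, mask


def masked_mse(x_true, x_pred, mask):
    """Mean squared error computed over the hidden entries only."""
    n_hidden = mask.sum()
    if n_hidden == 0:
        return 0.0
    return float((((x_true - x_pred) ** 2) * mask).sum() / n_hidden)


# Toy samples-by-genes matrix (random numbers, not real expression data).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
x_masked, mask = mask_expression(x, mask_rate=0.5, rng=rng)

# A trained auto-encoder would reconstruct x from x_masked. Here the
# corrupted input stands in for the prediction, only to exercise the loss.
loss = masked_mse(x, x_masked, mask)
```

In an actual training loop, `x_masked` would be fed to the encoder and `masked_mse` (or a similar loss) would compare the decoder output with the uncorrupted `x`.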

Subject(s)

Deep Learning; Genes, Essential; RNA-Seq; Humans; RNA-Seq/methods; Neoplasms/genetics; Neoplasms/mortality; Computational Biology/methods

Keywords

Benchmarking; Deep learning; Gene essentiality; RNAseq; Representation learning; Survival prediction

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Genes, Essential / Deep Learning / RNA-Seq Limits: Humans Language: En Journal: Sci Rep Year: 2024 Document type: Article Affiliation country: United States
