Rank-based Bayesian variable selection for genome-wide transcriptomic analyses.

Eliseussen, Emilie; Fleischer, Thomas; Vitelli, Valeria

Affiliation

Eliseussen E; Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway.
Fleischer T; Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.
Vitelli V; Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway.

Stat Med; 41(23): 4532-4553, 2022 Oct 15.

Article in English | MEDLINE | ID: mdl-35844145

ABSTRACT

Variable selection is crucial in high-dimensional omics-based analyses, since it is biologically reasonable to assume only a subset of non-noisy features contributes to the data structures. However, the task is particularly hard in an unsupervised setting, and a priori ad hoc variable selection is still a very frequent approach, despite the evident drawbacks and lack of reproducibility. We propose a Bayesian variable selection approach for rank-based unsupervised transcriptomic analysis. Making use of data rankings instead of the actual continuous measurements increases the robustness of conclusions when compared to classical statistical methods, and embedding variable selection into the inferential tasks allows complete reproducibility. Specifically, we develop a novel extension of the Bayesian Mallows model for variable selection that allows for a full probabilistic analysis, leading to coherent quantification of uncertainties. Simulation studies demonstrate the versatility and robustness of the proposed method in a variety of scenarios, as well as its superiority with respect to several competitors when varying the data dimension or data generating process. We use the novel approach to analyze genome-wide RNAseq gene expression data from ovarian cancer patients: several genes that affect cancer development are correctly detected in a completely unsupervised fashion, showing the usefulness of the method in the context of signature discovery for cancer genomics. Moreover, the possibility to also perform uncertainty quantification plays a key role in the subsequent biological investigation.
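As a minimal illustration of why rankings can be more robust than continuous measurements, the sketch below ranks a handful of hypothetical expression values and shows that the ranking is unchanged by a monotone transformation or by one extreme value. The data, the function names, and the use of the footrule distance (one distance commonly paired with Mallows-type ranking models) are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def rank_features(x):
    """Rank a 1-D array of measurements (1 = largest value). Assumes no ties."""
    order = np.argsort(-x)                      # indices from largest to smallest
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(1, len(x) + 1)     # assign rank by position in that order
    return ranks

def footrule(r1, r2):
    """Footrule distance: sum of absolute rank differences between two rankings."""
    return int(np.abs(r1 - r2).sum())

# Hypothetical expression values for five genes in one sample.
expr = np.array([5.2, 0.3, 12.7, 3.1, 8.8])

ranks_raw = rank_features(expr)
ranks_log = rank_features(np.log2(expr + 1.0))  # monotone transform of the same data

expr_outlier = expr.copy()
expr_outlier[2] = 1e6                           # inflate the already-largest value
ranks_outlier = rank_features(expr_outlier)

print(ranks_raw)                                # ranking of the raw values
print(footrule(ranks_raw, ranks_log))           # distance after log transform
print(footrule(ranks_raw, ranks_outlier))       # distance after adding an outlier
```

Both distances are zero here because ranks depend only on the ordering of the values, not on their scale.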

Subjects

Neoplasms; Transcriptome; Bayes Theorem; Genomics/methods; Humans; Reproducibility of Results; Transcriptome/genetics

Keywords

Bayesian inference; Mallows ranking model; high-dimensional data; unsupervised learning; variable selection

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Transcriptome / Neoplasms Type of study: Prognostic_studies Limits: Humans Language: En Year of publication: 2022 Document type: Article
