Sampling uncertainty versus method uncertainty: A general framework with applications to omics biomarker selection.

Klau, Simon; Martin-Magniette, Marie-Laure; Boulesteix, Anne-Laure; Hoffmann, Sabine

Affiliation

Klau S; Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Munich, Germany.
Martin-Magniette ML; Institute of Plant Sciences Paris Saclay IPS2, CNRS, INRA, Université Paris-Sud, Université Evry, Université Paris-Saclay, Orsay, France.
Boulesteix AL; Institute of Plant Sciences Paris-Saclay IPS2, Paris Diderot, Sorbonne Paris-Cité, Orsay, France.
Hoffmann S; UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France.

Biom J; 62(3): 670-687, 2020 May.

Article in English | MEDLINE | ID: mdl-31099917

ABSTRACT

Uncertainty is a crucial issue in statistics which can be considered from different points of view. One type of uncertainty, typically referred to as sampling uncertainty, arises through the variability of results obtained when the same analysis strategy is applied to different samples. Another type of uncertainty arises through the variability of results obtained when using the same sample but different analysis strategies addressing the same research question. We denote this latter type of uncertainty as method uncertainty. It results from all the choices to be made for an analysis, for example, decisions related to data preparation, method choice, or model selection. In medical sciences, a large part of omics research is focused on the identification of molecular biomarkers, which can either be performed through ranking or by selection from among a large number of candidates. In this paper, we introduce a general resampling-based framework to quantify and compare sampling and method uncertainty. For illustration, we apply this framework to different scenarios related to the selection and ranking of omics biomarkers in the context of acute myeloid leukemia: variable selection in multivariable regression using different types of omics markers, the ranking of biomarkers according to their predictive performance, and the identification of differentially expressed genes from RNA-seq data. For all three scenarios, our findings suggest highly unstable results when the same analysis strategy is applied to two independent samples, indicating high sampling uncertainty and a comparatively smaller, but non-negligible method uncertainty, which strongly depends on the methods being compared.
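The distinction drawn in the abstract can be illustrated with a small sketch. This is not the authors' framework: the synthetic data, the two toy selectors (top-k markers by absolute Pearson or Spearman correlation), the repeated splitting into two disjoint halves, and the Jaccard distance as instability measure are all assumptions chosen for illustration. Sampling uncertainty is approximated by applying the same selector to two halves; method uncertainty by applying different selectors to the same half.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def ranks(values):
    """0-based ranks; ties are broken by position (fine for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def select_top_k(X, y, k, score):
    """Indices of the k columns of X with the largest |score(column, y)|."""
    p = len(X[0])
    scores = [abs(score([row[j] for row in X], y)) for j in range(p)]
    return set(sorted(range(p), key=lambda j: -scores[j])[:k])

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0 means identical selections, 1 means disjoint."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def compare_uncertainty(X, y, methods, k=5, n_splits=20, seed=0):
    """Mean Jaccard distance across halves (sampling) and across methods (method)."""
    rng = random.Random(seed)
    names = list(methods)
    sampling, method = [], []
    for _ in range(n_splits):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        halves = (idx[: len(idx) // 2], idx[len(idx) // 2:])
        # Selected marker set for every (method, half) combination.
        sel = {
            name: [methods[name]([X[i] for i in h], [y[i] for i in h], k)
                   for h in halves]
            for name in names
        }
        # Sampling uncertainty: same method, two disjoint halves.
        for name in names:
            sampling.append(jaccard_distance(sel[name][0], sel[name][1]))
        # Method uncertainty: same half, two different methods.
        for a in range(len(names)):
            for b in range(a + 1, len(names)):
                for h in (0, 1):
                    method.append(
                        jaccard_distance(sel[names[a]][h], sel[names[b]][h]))
    return sum(sampling) / len(sampling), sum(method) / len(method)

# Synthetic data (assumption): 60 observations, 30 candidate markers,
# of which only the first three truly influence the outcome.
data_rng = random.Random(1)
n_obs, n_markers = 60, 30
X = [[data_rng.gauss(0, 1) for _ in range(n_markers)] for _ in range(n_obs)]
y = [row[0] + row[1] + row[2] + data_rng.gauss(0, 1) for row in X]

methods = {
    "pearson": lambda X_, y_, k: select_top_k(X_, y_, k, pearson),
    "spearman": lambda X_, y_, k: select_top_k(X_, y_, k, spearman),
}
sampling_u, method_u = compare_uncertainty(X, y, methods)
print(f"sampling uncertainty (mean Jaccard distance): {sampling_u:.2f}")
print(f"method uncertainty (mean Jaccard distance):   {method_u:.2f}")
```

Because both quantities are expressed on the same 0 to 1 scale, they can be compared directly. How the two compare depends heavily on how similar the chosen methods are, which is consistent with the caveat in the abstract's final sentence.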

Subjects

Biometry/methods; Computational Biology; Uncertainty; Biomarkers/metabolism; Gene Expression Profiling; Humans; Leukemia, Myeloid, Acute/genetics; Leukemia, Myeloid, Acute/metabolism

Keywords

high-dimensional data; resampling; stability; variable ranking; variable selection


Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Biometry / Computational Biology / Uncertainty | Type of study: Prognostic_studies | Limits: Humans | Language: En | Year of publication: 2020 | Document type: Article
