SelectBoost: a general algorithm to enhance the performance of variable selection methods.

Bertrand, Frédéric; Aouadi, Ismaïl; Jung, Nicolas; Carapito, Raphael; Vallat, Laurent; Bahram, Seiamak; Maumy-Bertrand, Myriam

Affiliations

Bertrand F; Institut de Recherche Mathématique Avancée, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France.
Aouadi I; Université de Technologie de Troyes, ICD, ROSAS, M2S, Troyes, France.
Jung N; ImmunoRhumatologie Moléculaire, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Centre de Recherche d'Immunologie et d'Hématologie, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France.
Carapito R; Laboratoire International Associé (LIA) INSERM, Strasbourg (France) - Nagano (Japan), Strasbourg, France.
Vallat L; Fédération Hospitalo-Universitaire (FHU) OMICARE, Laboratoire Central d'Immunologie, Pôle de Biologie, Nouvel Hôpital Civil, Hôpitaux Universitaires de Strasbourg, Strasbourg, France.
Bahram S; Institut de Recherche Mathématique Avancée, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France.
Maumy-Bertrand M; ImmunoRhumatologie Moléculaire, INSERM UMR_S 1109, LabEx TRANSPLANTEX, Centre de Recherche d'Immunologie et d'Hématologie, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France.

Bioinformatics; 37(5): 659-668, 2021 05 05.

Article in English | MEDLINE | ID: mdl-33016991

ABSTRACT

MOTIVATION: With the growth of big data, variable selection has become one of the critical challenges in statistics. Although many methods have been proposed in the literature, their performance in terms of recall (sensitivity) and precision (positive predictive value) is limited when the number of variables far exceeds the number of observations or when the variables are highly correlated.

RESULTS: In this article, we propose a general algorithm that improves the precision of any existing variable selection method. The algorithm is based on highly intensive simulations and takes into account the correlation structure of the data. It can either produce a confidence index for variable selection or be used for experimental design planning. We demonstrate its performance on both simulated and real data, and then apply it in two different ways to improve biological network reverse-engineering.

AVAILABILITY AND IMPLEMENTATION: Code is available as the SelectBoost package on CRAN, https://cran.r-project.org/package=SelectBoost. Some network reverse-engineering functionalities are available in the Patterns CRAN package, https://cran.r-project.org/package=Patterns.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
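The abstract describes the method only at a high level: rerun an arbitrary base selector many times on simulated data that respects the correlation structure, and treat each variable's selection frequency as a confidence index. The Python sketch below illustrates that general idea under loudly stated assumptions; it is not the SelectBoost R package's API or the authors' simulation scheme. In particular, the perturbation step (replacing each standardized column by a randomly chosen, sign-aligned member of its correlation group {k : |corr(j, k)| >= c0}) is a hypothetical simplification, and `top_s_selector`, `confidence_index`, `c0`, `B` and `seed` are illustrative names. Precision and recall follow the abstract's definitions: precision (positive predictive value) = TP / (TP + FP) and recall (sensitivity) = TP / (TP + FN).

```python
import math
import random


def standardize(col):
    """Center a column and scale it to unit Euclidean norm."""
    m = sum(col) / len(col)
    centered = [v - m for v in col]
    norm = math.sqrt(sum(v * v for v in centered)) or 1.0  # guard constant columns
    return [v / norm for v in centered]


def dot(a, b):
    """For standardized columns, the dot product equals the Pearson correlation."""
    return sum(x * y for x, y in zip(a, b))


def top_s_selector(s):
    """Hypothetical toy base method: keep the s columns most correlated with y."""
    def select(cols, y):
        ys = standardize(y)
        scores = [abs(dot(c, ys)) for c in cols]
        # sorted() is stable, so ties go to the lower column index
        ranked = sorted(range(len(cols)), key=lambda j: -scores[j])
        return set(ranked[:s])
    return select


def confidence_index(X, y, selector, c0=0.9, B=200, seed=0):
    """Selection frequency of each variable over B correlation-aware perturbations.

    Assumed perturbation (a simplification, not the package's simulation step):
    column j is replaced by a random, sign-aligned member of its correlation
    group {k : |corr(j, k)| >= c0}; variables with no close neighbour stay fixed.
    """
    rng = random.Random(seed)
    p = len(X[0])
    cols = [standardize([row[j] for row in X]) for j in range(p)]
    # j always belongs to its own group, even for a constant column
    groups = [[j] + [k for k in range(p) if k != j and abs(dot(cols[j], cols[k])) >= c0]
              for j in range(p)]
    counts = [0] * p
    for _ in range(B):
        perturbed = []
        for j in range(p):
            k = rng.choice(groups[j])
            sign = 1.0 if dot(cols[j], cols[k]) >= 0 else -1.0
            perturbed.append([sign * v for v in cols[k]])
        for j in selector(perturbed, y):
            counts[j] += 1
    return [c / B for c in counts]


def precision_recall(selected, truth):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = len(selected & truth)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


# Toy data: y depends on columns 0 and 1; column 2 is a near-copy of column 1.
a = [1, -1, 1, -1, 1, -1]
b = [1, 1, -1, -1, 0, 0]
b_copy = [1, 1, -1, -1, 0.1, 0]
noise = [0, 1, -1, 0, 0, 0]
X = [list(row) for row in zip(a, b, b_copy, noise)]
y = [2 * u + 2 * v for u, v in zip(a, b)]
ci = confidence_index(X, y, top_s_selector(2))
```

On this toy data the base selector alone picks columns {0, 2}, i.e. precision 0.5 and recall 0.5 against the true set {0, 1}. Under the perturbation, column 0 is selected in every replicate, column 3 in none, and the correlated pair (columns 1 and 2) splits the remaining selections between them, so a threshold on the index flags that pair as ambiguous rather than confidently choosing the wrong member.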

Subjects

Algorithms; Software; Big Data; Research Design

Database: MEDLINE | Main subject: Algorithms / Software | Study type: Prognostic_studies | Language: English | Journal: Bioinformatics | Journal subject: Medical Informatics | Publication year: 2021 | Document type: Article | Country of affiliation: France
