The use of vector bootstrapping to improve variable selection precision in Lasso models.
Stat Appl Genet Mol Biol; 15(4): 305-20, 2016 08 01.
Article in En | MEDLINE | ID: mdl-27248122
ABSTRACT
The Lasso is a shrinkage regression method that is widely used for variable selection in statistical genetics. Commonly, K-fold cross-validation is used to fit a Lasso model. This is sometimes followed by using bootstrap confidence intervals to improve precision in the resulting variable selections. Nesting cross-validation within bootstrapping could provide further improvements in precision, but this has not been investigated systematically. We performed simulation studies of Lasso variable selection precision (VSP) with and without nesting cross-validation within bootstrapping. Data were simulated to represent genomic data under a polygenic model as well as under a model with effect sizes representative of typical GWAS results. We compared these approaches to each other as well as to software defaults for the Lasso. Nested cross-validation had the most precise variable selection at small effect sizes. At larger effect sizes, there was no advantage to nesting. We illustrated the nested approach with empirical data comprising SNPs and SNP-SNP interactions from the most significant SNPs in a GWAS of borderline personality symptoms. In the empirical example, we found that the default Lasso selected low-reliability SNPs and interactions which were excluded by bootstrapping.
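The nested procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes centered and scaled predictors, a small fixed grid of penalty values, a plain coordinate-descent Lasso, and a percentile bootstrap interval as the selection rule. It also reads "vector bootstrapping" in the title as resampling whole observation vectors (rows), which is an assumption. The names `lasso_cd`, `cv_lambda`, `nested_bootstrap_select`, and `simulate` are hypothetical.

```python
import random


def lasso_cd(X, y, lam, n_iter=30):
    """Coordinate-descent Lasso for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X*beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j's effect added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-thresholding
            new = new if rho > 0 else -new
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def cv_lambda(X, y, lambdas, k, rng):
    """Pick the penalty with the lowest K-fold squared prediction error."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            err += sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in fold)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


def nested_bootstrap_select(X, y, lambdas, n_boot=30, k=3, alpha=0.05, seed=0):
    """Keep variables whose percentile bootstrap interval excludes zero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    draws = []
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]  # resample whole (x_i, y_i) rows
        Xb, yb = [X[i] for i in rows], [y[i] for i in rows]
        lam = cv_lambda(Xb, yb, lambdas, k, rng)  # cross-validation nested in each resample
        draws.append(lasso_cd(Xb, yb, lam))
    lo_i = int((alpha / 2) * (n_boot - 1))
    hi_i = (n_boot - 1) - lo_i
    selected = []
    for j in range(p):
        col = sorted(d[j] for d in draws)
        if col[lo_i] > 0 or col[hi_i] < 0:
            selected.append(j)
    return selected


def simulate(n=80, p=6, seed=1):
    """Toy data (illustrative only): two true effects, the remaining predictors are null."""
    rng = random.Random(seed)
    true_beta = [2.0, -1.5] + [0.0] * (p - 2)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]
    return X, y
```

A call such as `nested_bootstrap_select(*simulate(), lambdas=[0.02, 0.1, 0.3])` returns the indices of the retained predictors. The non-nested alternative would choose the penalty once by cross-validation on the full data and reuse it in every resample. In this sketch, duplicated rows in a resample can fall in both the training and held-out folds, which a careful implementation may want to handle.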
Full text: 1
Collection: 01-international
Database: MEDLINE
Main subject: Regression Analysis / Genome-Wide Association Study / Models, Genetic
Study type: Diagnostic_studies / Prognostic_studies
Limits: Humans
Language: En
Journal: Stat Appl Genet Mol Biol
Journal subject: MOLECULAR BIOLOGY / GENETICS
Year: 2016
Document type: Article