Search | BVS (Virtual Health Library) Infectious and Parasitic Diseases

Ridge regression in prediction problems: automatic choice of the ridge parameter.

Cule, Erika; De Iorio, Maria.

Genet Epidemiol ; 37(7): 704-14, 2013 Nov.

Article in English | MEDLINE | ID: mdl-23893343

ABSTRACT

To date, numerous genetic variants have been identified as associated with diverse phenotypic traits. However, identified associations generally explain only a small proportion of trait heritability and the predictive power of models incorporating only known-associated variants has been small. Multiple regression is a popular framework in which to consider the joint effect of many genetic variants simultaneously. Ordinary multiple regression is seldom appropriate in the context of genetic data, due to the high dimensionality of the data and the correlation structure among the predictors. There has been a resurgence of interest in the use of penalised regression techniques to circumvent these difficulties. In this paper, we focus on ridge regression, a penalised regression approach that has been shown to offer good performance in multivariate prediction problems. One challenge in the application of ridge regression is the choice of the ridge parameter that controls the amount of shrinkage of the regression coefficients. We present a method to determine the ridge parameter based on the data, with the aim of good performance in high-dimensional prediction problems. We establish a theoretical justification for our approach, and demonstrate its performance on simulated genetic data and on a real data example. Fitting a ridge regression model to hundreds of thousands to millions of genetic variants simultaneously presents computational challenges. We have developed an R package, ridge, which addresses these issues. Ridge implements the automatic choice of ridge parameter presented in this paper, and is freely available from CRAN.
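
The role of the ridge parameter described in this abstract can be illustrated with a minimal Python sketch. It uses the textbook closed-form ridge estimate on simulated data and shows that the coefficients shrink as the parameter grows. It does not implement the automatic choice of the ridge parameter presented in the paper, and it is not the API of the R package ridge: the function name `ridge_fit`, the simulated data, and the grid of values in `lambdas` are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Textbook closed-form ridge estimate: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
n, p = 50, 200  # more predictors than samples, as in genetic data

# Simulated correlated predictors: every column shares one latent factor.
latent = rng.normal(size=(n, 1))
X = 0.7 * latent + rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise each column

# Hypothetical truth: only the first five predictors affect the outcome.
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + rng.normal(size=n)
y = y - y.mean()

# An arbitrary grid of ridge parameters (not a data-driven choice).
lambdas = [0.1, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>6}: ||beta|| = {norm:.4f}")
```

The printed norms decrease as the parameter increases, which is the shrinkage the abstract refers to. A data-driven rule such as the one the authors propose would replace the fixed grid.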

Subjects

Genetic Variation/genetics; Models, Genetic; Phenotype; Algorithms; Bipolar Disorder/genetics; Genetic Predisposition to Disease/genetics; Genotype; Humans; Polymorphism, Single Nucleotide/genetics; ROC Curve; Regression Analysis; Software

Seven ages of the PhD.

Gosling, Raymond; Tickle, Cheryll; Running, Steve W; Tandong, Yao; Dinnyes, Andras; Osowole, A A; Cule, Erika.

Nature ; 472(7343): 283-6, 2011 Apr 21.

Article in English | MEDLINE | ID: mdl-21512550

Subjects

Education, Graduate/history; Research Personnel/history; Research/history; Blogging; History, 20th Century; History, 21st Century; Internationality; Public Policy; Research/education; Research Personnel/education

Significance testing in ridge regression for genetic data.

Cule, Erika; Vineis, Paolo; De Iorio, Maria.

BMC Bioinformatics ; 12: 372, 2011 Sep 19.

Article in English | MEDLINE | ID: mdl-21929786

ABSTRACT

BACKGROUND: Technological developments have increased the feasibility of large scale genetic association studies. Densely typed genetic markers are obtained using SNP arrays, next-generation sequencing technologies and imputation. However, SNPs typed using these methods can be highly correlated due to linkage disequilibrium among them, and standard multiple regression techniques fail with these data sets due to their high dimensionality and correlation structure. There has been increasing interest in using penalised regression in the analysis of high dimensional data. Ridge regression is one such penalised regression technique which does not perform variable selection, instead estimating a regression coefficient for each predictor variable. It is therefore desirable to obtain an estimate of the significance of each ridge regression coefficient. RESULTS: We develop and evaluate a test of significance for ridge regression coefficients. Using simulation studies, we demonstrate that the performance of the test is comparable to that of a permutation test, with the advantage of a much-reduced computational cost. We introduce the p-value trace, a plot of the negative logarithm of the p-values of ridge regression coefficients with increasing shrinkage parameter, which enables the visualisation of the change in p-value of the regression coefficients with increasing penalisation. We apply the proposed method to a lung cancer case-control data set from EPIC, the European Prospective Investigation into Cancer and Nutrition. CONCLUSIONS: The proposed test is a useful alternative to a permutation test for the estimation of the significance of ridge regression coefficients, at a much-reduced computational cost. The p-value trace is an informative graphical tool for evaluating the results of a test of significance of ridge regression coefficients as the shrinkage parameter increases, and the proposed test makes its production computationally feasible.
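
The idea of a p-value trace can be sketched in Python as follows. The abstract does not give the form of the proposed test, so this example substitutes a generic Wald-type z-test with a normal approximation. The test statistic, the variance estimate, the function name `ridge_wald_pvalues`, and the simulated data are all assumptions for illustration and should not be read as the authors' method.

```python
import math

import numpy as np


def ridge_wald_pvalues(X, y, lam):
    """Ridge coefficients with approximate two-sided Wald-type p-values.

    Assumption: a generic normal approximation, not the published test.
    """
    n, p = X.shape
    XtX = X.T @ X
    A = np.linalg.inv(XtX + lam * np.eye(p))
    beta = A @ X.T @ y
    resid = y - X @ beta
    # Residual variance using the effective degrees of freedom trace(H).
    eff_df = np.trace(X @ A @ X.T)
    sigma2 = float(resid @ resid) / (n - eff_df)
    cov = sigma2 * A @ XtX @ A  # sampling covariance of the ridge estimate
    z = beta / np.sqrt(np.diag(cov))
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, pvals


rng = np.random.default_rng(0)
n, p = 200, 10
latent = rng.normal(size=(n, 1))
X = 0.5 * latent + rng.normal(size=(n, p))  # correlated predictors
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 1.0 * X[:, 0] + rng.normal(size=n)  # only the first predictor matters
y = y - y.mean()

# p-value trace: -log10(p) for every coefficient over increasing shrinkage.
lambdas = [0.1, 1.0, 10.0, 100.0]
trace = np.array([
    -np.log10(np.maximum(ridge_wald_pvalues(X, y, lam)[1], 1e-300))
    for lam in lambdas
])
print(trace.round(2))  # rows: shrinkage values, columns: coefficients
```

Plotting each column of `trace` against the shrinkage values gives one line per coefficient, which is the kind of plot the abstract calls a p-value trace.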

Subjects

Disease/genetics; Genetic Association Studies/methods; Regression Analysis; Case-Control Studies; Chromosome Mapping; Genetic Markers; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Prospective Studies

ABC-SysBio--approximate Bayesian computation in Python with GPU support.

Liepe, Juliane; Barnes, Chris; Cule, Erika; Erguler, Kamil; Kirk, Paul; Toni, Tina; Stumpf, Michael P H.

Bioinformatics ; 26(14): 1797-9, 2010 Jul 15.

Article in English | MEDLINE | ID: mdl-20591907

ABSTRACT

MOTIVATION: The growing field of systems biology has driven demand for flexible tools to model and simulate biological systems. Two established problems in the modeling of biological processes are model selection and the estimation of associated parameters. A number of statistical approaches, both frequentist and Bayesian, have been proposed to answer these questions. RESULTS: Here we present a Python package, ABC-SysBio, that implements parameter inference and model selection for dynamical systems in an approximate Bayesian computation (ABC) framework. ABC-SysBio combines three algorithms: ABC rejection sampler, ABC SMC for parameter inference and ABC SMC for model selection. It is designed to work with models written in Systems Biology Markup Language (SBML). Deterministic and stochastic models can be analyzed in ABC-SysBio. AVAILABILITY: http://abc-sysbio.sourceforge.net
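
Of the three algorithms named in this abstract, the ABC rejection sampler is the simplest, and a minimal Python sketch of the general technique is given here. It does not use the ABC-SysBio package or SBML, and it does not cover the ABC SMC variants. The exponential-decay model, the uniform prior, the tolerance `eps`, and the function names `simulate` and `abc_rejection` are assumptions chosen for illustration.

```python
import numpy as np


def simulate(k, t, x0=10.0):
    """Hypothetical deterministic model: exponential decay x(t) = x0 * exp(-k t)."""
    return x0 * np.exp(-k * t)


def abc_rejection(data, t, n_draws, eps, rng):
    """Keep prior draws whose simulated trajectory lies within eps of the data."""
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(0.0, 2.0)  # draw a candidate from the assumed prior
        distance = np.linalg.norm(simulate(k, t) - data)
        if distance < eps:
            accepted.append(k)
    return np.array(accepted)


rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 20)
true_k = 0.5

# Synthetic observations: the model output plus measurement noise.
data = simulate(true_k, t) + rng.normal(scale=0.2, size=t.size)

posterior = abc_rejection(data, t, n_draws=20000, eps=1.5, rng=rng)
print(f"accepted {posterior.size} draws, posterior mean k = {posterior.mean():.3f}")
```

The accepted draws approximate the posterior distribution of the decay rate. A smaller tolerance gives a closer approximation at the cost of fewer accepted draws, which is the inefficiency that sequential (SMC) variants aim to reduce.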

Subjects

Software; Systems Biology/methods; Bayes Theorem
