1.
Biom J ; 66(2): e2300063, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38519877

ABSTRACT

Variable selection is usually performed to increase interpretability, as sparser models are easier to understand than full models. However, a focus on sparsity is not always suitable, for example, when features are related due to contextual similarities or high correlations. Here, it may be more appropriate to identify groups and their predictive members, a task that can be accomplished with bi-level selection procedures. To investigate whether such techniques lead to increased interpretability, group exponential LASSO (GEL), sparse group LASSO (SGL), composite minimax concave penalty (cMCP), and, as a reference method, the least absolute shrinkage and selection operator (LASSO) were used to select predictors in time-to-event, regression, and classification tasks in bootstrap samples from a cohort of 1001 patients. Different groupings based on prior knowledge, correlation structure, and random assignment were compared in terms of selection relevance, group consistency, and collinearity tolerance. The results show that bi-level selection methods are superior to LASSO in all criteria. The cMCP demonstrated superiority in selection relevance, while SGL was convincing in group consistency. An all-round capacity was achieved by GEL: the approach jointly selected correlated and content-related predictors while maintaining high selection relevance. This method seems recommendable when variables are grouped and interpretation is of primary interest.

2.
Biom J ; 66(4): e2200334, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38747086

ABSTRACT

Many data sets exhibit a natural group structure due to contextual similarities or high correlations of variables, such as lipid markers that are interrelated based on biochemical principles. Knowledge of such groupings can be used through bi-level selection methods to identify relevant feature groups and highlight their predictive members. One of the best known approaches of this kind combines the classical Least Absolute Shrinkage and Selection Operator (LASSO) with the Group LASSO, resulting in the Sparse Group LASSO. We propose the Sparse Group Penalty (SGP) framework, which allows for a flexible combination of different SGL-style shrinkage conditions. Analogous to SGL, we investigated the combination of the Smoothly Clipped Absolute Deviation (SCAD), the Minimax Concave Penalty (MCP) and the Exponential Penalty (EP) with their group versions, resulting in the Sparse Group SCAD, the Sparse Group MCP, and the novel Sparse Group EP (SGE). Those shrinkage operators provide refined control of the effect of group formation on the selection process through a tuning parameter. In simulation studies, SGPs were compared with other bi-level selection methods (Group Bridge, composite MCP, and Group Exponential LASSO) for variable and group selection evaluated with the Matthews correlation coefficient. We demonstrated the advantages of the new SGE in identifying parsimonious models, but also identified scenarios that highlight the limitations of the approach. The performance of the techniques was further investigated in a real-world use case for the selection of regulated lipids in a randomized clinical trial.


Subjects
Biometry, Biometry/methods, Humans
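
The combination of LASSO and Group LASSO mentioned in the abstract above can be illustrated with a minimal sketch of the commonly used Sparse Group LASSO penalty, lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2). The function name `sgl_penalty`, the parameter names `lam` and `alpha`, and the sqrt(p_g) group weights are assumptions made for illustration; they are not taken from the paper, and the SGP framework it proposes is not reproduced here.

```python
import numpy as np

def sgl_penalty(beta, groups, lam, alpha):
    """Sparse Group LASSO penalty (illustrative sketch, not the paper's code).

    beta   : 1-D array of coefficients
    groups : 1-D array of group labels, one per coefficient
    lam    : overall penalty strength (assumed name)
    alpha  : mix between LASSO (alpha=1) and Group LASSO (alpha=0)
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    l1_part = np.sum(np.abs(beta))          # LASSO term
    group_part = 0.0
    for g in np.unique(groups):
        b_g = beta[groups == g]
        # Group LASSO term, weighted by sqrt(group size) (assumed weighting)
        group_part += np.sqrt(b_g.size) * np.linalg.norm(b_g)
    return lam * (alpha * l1_part + (1.0 - alpha) * group_part)

beta = np.array([3.0, 4.0, 0.0, 0.0])
groups = np.array([0, 0, 1, 1])
lasso_only = sgl_penalty(beta, groups, lam=1.0, alpha=1.0)
group_only = sgl_penalty(beta, groups, lam=1.0, alpha=0.0)
```

Intermediate values of `alpha` trade off sparsity within groups against sparsity across groups.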
3.
Biometrics ; 79(4): 3082-3095, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37211860

ABSTRACT

Group variable selection is required in many areas, and many methods have been developed for it under various situations. Unlike individual variable selection, group variable selection selects variables in groups, and it identifies important and unimportant variables or factors more efficiently by taking into account the existing group structure. In this paper, we consider the situation where one only observes interval-censored failure time data arising from the Cox model, for which there does not seem to exist an established method. More specifically, a penalized sieve maximum likelihood variable selection and estimation procedure is proposed and the oracle property of the proposed method is established. Also, an extensive simulation study is performed and suggests that the proposed approach works well in practical situations. An application of the method to a set of real data is provided.


Subjects
Proportional Hazards Models, Likelihood Functions, Regression Analysis, Computer Simulation
4.
Stat Med ; 42(3): 331-352, 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36546512

ABSTRACT

This review condenses the knowledge on variable selection methods implemented in R and appropriate for datasets with grouped features. The focus is on regularized regressions identified through a systematic review of the literature, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A total of 14 methods are discussed, most of which use penalty terms to perform group variable selection. Depending on how the methods account for the group structure, they can be classified into knowledge and data-driven approaches. The first encompass group-level and bi-level selection methods, while two-step approaches and collinearity-tolerant methods constitute the second category. The identified methods are briefly explained and their performance compared in a simulation study. This comparison demonstrated that group-level selection methods, such as the group minimax concave penalty, are superior to other methods in selecting relevant variable groups but are inferior in identifying important individual variables in scenarios where not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as group bridge. Two-step and collinearity-tolerant approaches such as elastic net and ordered homogeneity pursuit least absolute shrinkage and selection operator are inferior to knowledge-driven methods but provide results without requiring prior knowledge. Possible applications in proteomics are considered, leading to suggestions on which method to use depending on existing prior knowledge and research question.


Subjects
Computer Simulation, Humans
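
The minimax concave penalty (MCP) named in the abstract above can be sketched for a single coefficient as follows. This shows only the standard scalar MCP; group versions are usually described as applying such a function to the norm of each group's coefficients. The function name `mcp_penalty` and the parameter names `lam` and `gamma` are illustrative assumptions, not identifiers from any of the reviewed R packages.

```python
import numpy as np

def mcp_penalty(beta, lam, gamma):
    """Scalar minimax concave penalty (illustrative sketch).

    For |b| <= gamma * lam : lam * |b| - b^2 / (2 * gamma)
    For |b| >  gamma * lam : gamma * lam^2 / 2  (constant, so no further shrinkage)
    """
    beta_abs = np.abs(np.asarray(beta, dtype=float))
    inner = lam * beta_abs - beta_abs ** 2 / (2.0 * gamma)
    flat = 0.5 * gamma * lam ** 2
    return np.where(beta_abs <= gamma * lam, inner, flat)

values = mcp_penalty([0.0, 1.0, 3.0, 10.0], lam=1.0, gamma=3.0)
```

The penalty grows like the LASSO penalty near zero and flattens for large coefficients, which is why it penalizes strong effects less than LASSO does.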
5.
Stat Med ; 40(6): 1498-1518, 2021 Mar 15.
Article in English | MEDLINE | ID: mdl-33368447

ABSTRACT

An increasing number of genome-wide association studies (GWAS) summary statistics is made available to the scientific community. Exploiting these results from multiple phenotypes would permit identification of novel pleiotropic associations. In addition, incorporating prior biological information in GWAS such as group structure information (gene or pathway) has shown some success in classical GWAS approaches. However, this has not been widely explored in the context of pleiotropy. We propose a Bayesian meta-analysis approach (termed GCPBayes) that uses summary-level GWAS data across multiple phenotypes to detect pleiotropy at both group-level (gene or pathway) and within group (eg, at the SNP level). We consider both continuous and Dirac spike and slab priors for group selection. We also use a Bayesian sparse group selection approach with hierarchical spike and slab priors that enables us to select important variables both at the group level and within group. GCPBayes uses a Bayesian statistical framework based on Markov chain Monte Carlo (MCMC) Gibbs sampling. It can be applied to multiple types of phenotypes for studies with overlapping or nonoverlapping subjects, and takes into account heterogeneity in the effect size and allows for the opposite direction of the genetic effects across traits. Simulations show that the proposed methods outperform benchmark approaches such as ASSET and CPBayes in the ability to retrieve pleiotropic associations at both SNP and gene-levels. To illustrate the GCPBayes method, we investigate the shared genetic effects between thyroid cancer and breast cancer in candidate pathways.


Subjects
Genome-Wide Association Study, Neoplasms, Bayes Theorem, Genomics, Group Structure, Humans, Genetic Models, Single Nucleotide Polymorphism
6.
Stat Med ; 37(23): 3338-3356, 2018 Oct 15.
Article in English | MEDLINE | ID: mdl-29888397

ABSTRACT

Integrative analysis of high dimensional omics datasets has been studied by many authors in recent years. By incorporating prior known relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial. We proposed a flexible partial least squares technique, which incorporates group and subgroup structure in the modelling process. Our new method accounts for both grouping of genetic markers (eg, gene sets) and temporal effects. The method generalises existing sparse modelling techniques in the partial least squares methodology and establishes theoretical connections to variable selection methods for supervised and unsupervised problems. Simulation studies are performed to investigate the performance of our methods over alternative sparse approaches. Our R package sgspls is available at https://github.com/matt-sutton/sgspls.


Subjects
Least-Squares Analysis, Statistical Models, AIDS Vaccines/therapeutic use, Algorithms, Biostatistics, Clinical Trials as Topic/statistics & numerical data, Computer Simulation, Genomics/methods, Humans, Likelihood Functions, Multivariate Analysis, Regression Analysis
7.
Comput Stat Data Anal ; 110: 115-133, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28943688

ABSTRACT

A Bayesian bi-level variable selection method (BAGB: Bayesian Analysis of Group Bridge) is developed for regularized regression and classification. This new development is motivated by grouped data, where generic variables can be divided into multiple groups, with variables in the same group being mechanistically related or statistically correlated. As an alternative to frequentist group variable selection methods, BAGB incorporates structural information among predictors through a group-wise shrinkage prior. Posterior computation proceeds via an efficient MCMC algorithm. In addition to the usual ease-of-interpretation of hierarchical linear models, the Bayesian formulation produces valid standard errors, a feature that is notably absent in the frequentist framework. Empirical evidence of the attractiveness of the method is illustrated by extensive Monte Carlo simulations and real data analysis. Finally, several extensions of this new approach are presented, providing a unified framework for bi-level variable selection in general models with flexible penalties.

8.
Lifetime Data Anal ; 23(3): 353-376, 2017 Jul.
Article in English | MEDLINE | ID: mdl-27016934

ABSTRACT

Penalized variable selection methods have been extensively studied for standard time-to-event data. Such methods cannot be directly applied when subjects are at risk of multiple mutually exclusive events, known as competing risks. The proportional subdistribution hazard (PSH) model proposed by Fine and Gray (J Am Stat Assoc 94:496-509, 1999) has become a popular semi-parametric model for time-to-event data with competing risks. It allows for direct assessment of covariate effects on the cumulative incidence function. In this paper, we propose a general penalized variable selection strategy that simultaneously handles variable selection and parameter estimation in the PSH model. We rigorously establish the asymptotic properties of the proposed penalized estimators and modify the coordinate descent algorithm for implementation. Simulation studies are conducted to demonstrate the good performance of the proposed method. Data from deceased donor kidney transplants from the United Network of Organ Sharing illustrate the utility of the proposed method.


Subjects
Proportional Hazards Models, Risk, Statistical Data Interpretation, Humans, Incidence
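
The abstract above mentions a modified coordinate descent algorithm. As background only, the following is a minimal sketch of plain cyclic coordinate descent for the LASSO with squared-error loss. It is not the authors' algorithm for the PSH model; the function names, the objective scaling (1/(2n)), and the simulated data are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (illustrative sketch)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column curvature
    r = y - X @ b                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b[j]
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)   # keep the residual in sync
            b[j] = new_bj
    return b

# Simulated data (hypothetical): only the first predictor is truly active
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(200)
b_hat = lasso_coordinate_descent(X, y, lam=0.2)
```

Each coordinate update has a closed form, which is what makes coordinate descent attractive for penalized likelihoods more generally.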
9.
Biometrics ; 71(3): 731-40, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25773593

ABSTRACT

In many applications, covariates possess a grouping structure that can be incorporated into the analysis to select important groups as well as important members of those groups. One important example arises in genetic association studies, where genes may have several variants capable of contributing to disease. An ideal penalized regression approach would select variables by balancing both the direct evidence of a feature's importance as well as the indirect evidence offered by the grouping structure. This work proposes a new approach we call the group exponential lasso (GEL) which features a decay parameter controlling the degree to which feature selection is coupled together within groups. We demonstrate that the GEL has a number of statistical and computational advantages over previously proposed group penalties such as the group lasso, group bridge, and composite MCP. Finally, we apply these methods to the problem of detecting rare variants in a genetic association study.


Subjects
Genetic Association Studies/methods, Genetic Predisposition to Disease/epidemiology, Genetic Predisposition to Disease/genetics, Statistical Models, Single Nucleotide Polymorphism/genetics, Regression Analysis, Computer Simulation, Statistical Data Interpretation, Epidemiologic Research Design, Genetic Variation/genetics, Humans, Reproducibility of Results, Sensitivity and Specificity
10.
Biometrics ; 71(1): 53-62, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25257196

ABSTRACT

In many scientific and engineering applications, covariates are naturally grouped. When the group structures are available among covariates, people are usually interested in identifying both important groups and important variables within the selected groups. Among existing successful group variable selection methods, some methods fail to conduct the within group selection. Some methods are able to conduct both group and within group selection, but the corresponding objective functions are non-convex. Such a non-convexity may require extra numerical effort. In this article, we propose a novel Log-Exp-Sum (LES) penalty for group variable selection. The LES penalty is strictly convex. It can identify important groups as well as select important variables within the group. We develop an efficient group-level coordinate descent algorithm to fit the model. We also derive non-asymptotic error bounds and asymptotic group selection consistency for our method in the high-dimensional setting where the number of covariates can be much larger than the sample size. Numerical results demonstrate the good performance of our method in both variable selection and prediction. We applied the proposed method to an American Cancer Society breast cancer survivor dataset. The findings are clinically meaningful and may help design intervention programs to improve the quality of life for breast cancer survivors.


Subjects
Breast Neoplasms/epidemiology, Breast Neoplasms/therapy, Outcome Assessment in Health Care/methods, Quality of Life/psychology, Survivors/psychology, Survivors/statistics & numerical data, Adolescent, Adult, Breast Neoplasms/psychology, Computer Simulation, Female, Humans, Middle Aged, Statistical Models, Prevalence, Survival Rate, Treatment Outcome, United States/epidemiology, Young Adult
11.
Stat Med ; 32(28): 4938-53, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-23824835

ABSTRACT

We are interested in developing integrative approaches for variable selection problems that incorporate external knowledge on a set of predictors of interest. In particular, we have developed an integrative Bayesian model uncertainty (iBMU) method, which formally incorporates multiple sources of data via a second-stage probit model on the probability that any predictor is associated with the outcome of interest. Using simulations, we demonstrate that iBMU leads to an increase in power to detect true marginal associations over more commonly used variable selection techniques, such as least absolute shrinkage and selection operator and elastic net. In addition, iBMU leads to a more efficient model search algorithm over the basic BMU method even when the predictor-level covariates are only modestly informative. The increase in power and efficiency of our method becomes more substantial as the predictor-level covariates become more informative. Finally, we demonstrate the power and flexibility of iBMU for integrating both gene structure and functional biomarker information into a candidate gene study investigating over 50 genes in the brain reward system and their role with smoking cessation from the Pharmacogenetics of Nicotine Addiction and Treatment Consortium.


Subjects
Algorithms, Bayes Theorem, Statistical Models, Uncertainty, Computer Simulation, Humans, Nicotine/metabolism, Single Nucleotide Polymorphism/genetics, Nicotinic Receptors/genetics, Nicotinic Receptors/metabolism, Smoking Cessation/methods
12.
J Appl Stat ; 50(2): 247-263, 2023.
Article in English | MEDLINE | ID: mdl-36698544

ABSTRACT

In this paper, we propose a novel group variance inflation factor (VIF) regression model for tackling large data sets where data follows a grouped structure. Unlike classical penalized methods, this approach can perform group variable selection in a sparse model. We further adapt the proposed method, combined with a two-stage procedure, to detect multiple change-points in linear models. We carry out extensive simulation studies to show that the proposed group variable selection and change-point detection methods are stable and efficient. Finally, we provide two real data examples, including a body fat data set and an air pollution data set, to illustrate the performance of our algorithms in group selection and change-point detection.
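
As background for the variance inflation factor named in the abstract above, here is a minimal sketch of the classical per-variable VIF, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all other predictors. This is not the authors' group VIF regression procedure; the function name `classical_vif` and the simulated data are assumptions for illustration.

```python
import numpy as np

def classical_vif(X):
    """Classical variance inflation factors, one per column (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Simulated data (hypothetical): b is nearly collinear with a, c is independent
rng = np.random.default_rng(1)
a = rng.standard_normal(500)
b = a + 0.1 * rng.standard_normal(500)
c = rng.standard_normal(500)
vifs = classical_vif(np.column_stack([a, b, c]))
```

Large values for the first two columns flag the near-collinearity, while the independent column stays close to 1.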

13.
Ann Stat ; 40(4): 2043-2068, 2012 Aug 01.
Article in English | MEDLINE | ID: mdl-24850975

ABSTRACT

This paper is concerned with the selection and estimation of fixed and random effects in linear mixed effects models. We propose a class of nonconcave penalized profile likelihood methods for selecting and estimating important fixed effects. To overcome the difficulty of unknown covariance matrix of random effects, we propose to use a proxy matrix in the penalized profile likelihood. We establish conditions on the choice of the proxy matrix and show that the proposed procedure enjoys the model selection consistency where the number of fixed effects is allowed to grow exponentially with the sample size. We further propose a group variable selection strategy to simultaneously select and estimate important random effects, where the unknown covariance matrix of random effects is replaced with a proxy matrix. We prove that, with the proxy matrix appropriately chosen, the proposed procedure can identify all true random effects with asymptotic probability one, where the dimension of random effects vector is allowed to increase exponentially with the sample size. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. We further illustrate the proposed procedures via a real data example.

14.
Neural Netw ; 118: 220-234, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31319320

ABSTRACT

The need to select groups of variables arises in many statistical modeling problems and applications. In this paper, we consider the ℓp,0-norm regularization for enforcing group sparsity and investigate a DC (Difference of Convex functions) approximation approach for solving the ℓp,0-norm regularization problem. We show that, with suitable parameters, the original and approximate problems are equivalent. Considering two equivalent formulations of the approximate problem, we develop DC programming and DCA (DC Algorithm) for solving them. As an application, we implement the proposed algorithms for group variable selection in the optimal scoring problem. The sparsity is obtained by using the ℓp,0-regularization that selects the same features in all discriminant vectors. The resulting sparse discriminant vectors provide a more interpretable low-dimensional representation of data. The experimental results on both simulated datasets and real datasets indicate the efficiency of the proposed algorithms.


Subjects
Algorithms, Factual Databases/standards
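
The ℓp,0 quantity in the abstract above is usually understood as the number of groups whose ℓp norm is nonzero. A minimal sketch under that reading follows; the function name `lp0_group_count`, the generic `groups` labelling, and the tolerance argument are assumptions for illustration, and the DC approximation studied in the paper is not shown.

```python
import numpy as np

def lp0_group_count(beta, groups, p=2, tol=0.0):
    """Count groups whose l_p norm exceeds tol (illustrative sketch)."""
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    count = 0
    for g in np.unique(groups):
        # A group is "active" if any of its coefficients is nonzero
        if np.linalg.norm(beta[groups == g], ord=p) > tol:
            count += 1
    return count

beta = np.array([0.0, 0.0, 1.5, -0.2, 0.0, 3.0])
groups = np.array([0, 0, 1, 1, 2, 2])
n_active = lp0_group_count(beta, groups)
```

Because this count is discontinuous in `beta`, it cannot be minimised directly with gradient methods, which is the motivation for a continuous approximation.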
15.
Stat Methods Med Res ; 26(5): 2319-2332, 2017 Oct.
Article in English | MEDLINE | ID: mdl-26265764

ABSTRACT

Current assessment of gene-gene interactions is typically based on separate parallel analysis, where each interaction term is tested separately, while less attention has been paid to simultaneous estimation of interaction terms in a prediction model. As the number of interaction terms grows fast, sparse estimation is desirable for statistical and interpretability reasons. There is a large literature on sparse estimation, but there is a natural hierarchy between the interaction and its corresponding main effects that requires special considerations. We describe random-effect models that impose sparse estimation of interactions under both strong and weak-hierarchy constraints. We develop an estimation procedure based on the hierarchical-likelihood argument and show that the modelling approach is equivalent to a penalty-based method, with the advantage of the models being more transparent and flexible. We compare the procedure with some standard methods in a simulation study and illustrate its application in an analysis of gene-gene interaction model to predict body-mass index.


Subjects
Genetic Epistasis, Statistical Models, Algorithms, Body Mass Index, Genetic Predisposition to Disease/genetics, Humans, Likelihood Functions
16.
Ann Appl Stat ; 9(2): 640-664, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26478762

ABSTRACT

Although genome-wide association studies (GWAS) have proven powerful for comprehending the genetic architecture of complex traits, they are challenged by a high dimension of single-nucleotide polymorphisms (SNPs) as predictors, the presence of complex environmental factors, and longitudinal or functional natures of many complex traits or diseases. To address these challenges, we propose a high-dimensional varying-coefficient model for incorporating functional aspects of phenotypic traits into GWAS to formulate a so-called functional GWAS or fGWAS. Bayesian group lasso and the associated MCMC algorithms are developed to identify significant SNPs and estimate how they affect longitudinal traits through time-varying genetic actions. The model is generalized to analyze the genetic control of complex traits using subject-specific sparse longitudinal data. The statistical properties of the new model are investigated through simulation studies. We use the new model to analyze a real GWAS data set from the Framingham Heart Study, leading to the identification of several significant SNPs associated with age-specific changes of body mass index. The fGWAS model, equipped with Bayesian group lasso, will provide a useful tool for genetic and developmental analysis of complex traits or diseases.

17.
Stat Biosci ; 4(1): 27-46, 2012 May 01.
Article in English | MEDLINE | ID: mdl-23795219

ABSTRACT

Penalized regression incorporating prior dependency structure of predictors can be effective in high-dimensional data analysis (Li and Li 2008). Pan, Xie and Shen (2010) proposed a penalized regression method for better outcome prediction and variable selection by smoothing parameters over a given predictor network, which can be applied to analysis of microarray data with a given gene network. In this paper, we develop two modifications to their method for further performance enhancement. First, we employ convex programming and show its improved performance over an approximate optimization algorithm implemented in their original proposal. Second, we perform bias reduction after initial variable selection through a new penalty, leading to better parameter estimates and outcome prediction. Simulations have demonstrated substantial performance improvement of the proposed modifications over the original method.
