ABSTRACT
We propose a model selection criterion for correlated survival data when the cluster size is informative about the outcome. This approach, called the Resampling Cluster Survival Information Criterion (RCSIC), uses a Cox proportional hazards model weighted by the inverse of the cluster size. Based on the within-cluster resampling idea, the RCSIC takes into account both the variability of within-cluster subsampling and the possible informativeness of cluster sizes. The RCSIC makes the within-cluster resampling idea easy to carry out, without requiring a large number of resamples of the data. In contrast with traditional model selection methods in survival analysis, the RCSIC includes an additional penalty for within-cluster subsampling variability. Our simulations show that the RCSIC provides more robust power for variable selection in clustered survival analysis, regardless of whether the cluster size is informative. Applying the RCSIC method to a periodontal disease study, we identify Age, Filled Tooth, Molar, Crown, Decayed Tooth, and Smoking Status as risk factors associated with tooth loss in patients.
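The inverse-cluster-size weighting mentioned above can be illustrated with a minimal sketch. It only computes the per-observation weights; it does not fit a Cox model or compute the RCSIC. The function name `inverse_cluster_size_weights` and the toy patient labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def inverse_cluster_size_weights(cluster_ids):
    """Return one weight per observation equal to 1 / (size of its cluster).

    Hypothetical helper for illustration only.
    """
    sizes = Counter(cluster_ids)
    return [1.0 / sizes[c] for c in cluster_ids]


# Assumed toy data: three patients contributing 1, 2, and 4 teeth.
cluster_ids = ["p1", "p2", "p2", "p3", "p3", "p3", "p3"]
weights = inverse_cluster_size_weights(cluster_ids)
```

With these weights every cluster contributes a total weight of one, so patients with many teeth do not dominate the fit.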
Subjects
Cluster Analysis, Humans, Proportional Hazards Models, Survival Analysis, Risk Factors, Computer Simulation
ABSTRACT
In many medical and social science studies, count responses with excess zeros are very common and are often the primary outcome of interest. Such count responses are usually generated under a clustered correlation structure arising from longitudinal observation of subjects. To model such longitudinal count data with excess zeros, the zero-inflated binomial (ZIB) model for bounded outcomes and the zero-inflated negative binomial (ZINB) and zero-inflated Poisson (ZIP) models for unbounded outcomes are all popular methods. To alleviate the effects of deviations from model assumptions, a semiparametric (or distribution-free) weighted generalized estimating equations approach has been proposed to estimate model parameters when data are subject to missingness. In this article, we further explore which covariates are important for the response variable. Without assumptions on the data distribution, we propose a model selection criterion based on the expected weighted quadratic loss to select an appropriate subset of covariates, especially when count responses have excess zeros and data are subject to nonmonotone missingness in both responses and covariates. To understand how the percentages of excess zeros and of missingness affect selection, we design various scenarios for covariate selection in the mean model via simulation studies. A real data example from a study of cardiovascular disease is also presented for illustration.
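For readers unfamiliar with zero inflation, the following minimal sketch shows the standard ZIP probability mass function, in which a structural-zero probability `pi` is mixed with a Poisson(`lam`) count. The function name `zip_pmf` and the parameter values are assumptions for illustration; the sketch is not the distribution-free criterion described in the abstract.

```python
import math


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf: P(Y = k) for rate lam and zero-inflation pi."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # A structural zero occurs with probability pi; otherwise Y ~ Poisson(lam).
    return (pi if k == 0 else 0.0) + (1.0 - pi) * poisson


# Assumed example values: 30% structural zeros on top of a Poisson(2) count.
p_zero = zip_pmf(0, lam=2.0, pi=0.3)
zip_mean = sum(k * zip_pmf(k, 2.0, 0.3) for k in range(60))
```

The mean of the mixture is `(1 - pi) * lam`, which is lower than the Poisson rate because of the extra zeros.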
Subjects
Cardiovascular Diseases, Statistical Models, Computer Simulation, Humans, Poisson Distribution, Weight Loss
ABSTRACT
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly.
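The within-cluster resampling idea referred to above can be sketched as follows: draw one observation at random from each cluster, compute an estimate on that reduced data set, and average the estimates over many resamples. The sketch uses a plain mean as the estimator and explicit resampling purely for illustration; the abstract states that RCIC itself avoids actual resampling. The names `within_cluster_resample` and `wcr_mean` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def within_cluster_resample(cluster_ids, values, rng):
    """Draw one observation at random from each cluster."""
    by_cluster = defaultdict(list)
    for c, v in zip(cluster_ids, values):
        by_cluster[c].append(v)
    return {c: rng.choice(vs) for c, vs in by_cluster.items()}


def wcr_mean(cluster_ids, values, n_resamples=2000, seed=0):
    """Average the one-per-cluster sample mean over many resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = within_cluster_resample(cluster_ids, values, rng)
        estimates.append(sum(sample.values()) / len(sample))
    return sum(estimates) / len(estimates)


# Assumed toy data: clusters of size 1, 2, and 4.
ids = ["a", "b", "b", "c", "c", "c", "c"]
vals = [1.0, 2.0, 4.0, 3.0, 3.0, 3.0, 9.0]
estimate = wcr_mean(ids, vals)
```

For this simple estimator the resampling average approaches the mean of the cluster means (here 8.5 / 3), which is the same target as weighting each observation by the inverse of its cluster size.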
Subjects
Cluster Analysis, Statistical Models, Regression Analysis, Aged, Frailty, Humans, Longitudinal Studies, Risk Factors, Sample Size
ABSTRACT
In medical and health studies, longitudinal and clustered longitudinal data are often collected, where the response variable of interest is observed repeatedly over time along with a set of covariates. Model selection has become an active research topic but has not been fully explored, largely because of the complex correlation structure of such data. To address this issue, we concentrate on model selection for clustered longitudinal data, especially when the data are subject to missingness. Motivated by the expected weighted quadratic loss of a given model, we use data perturbation and bootstrapping methods to estimate the loss, and the model with the smallest expected loss is selected as the best model. To justify the proposed model selection method, we provide various numerical assessments and also analyze an asthma data set for illustration.
Subjects
Longitudinal Studies, Statistical Models, Cluster Analysis, Computer Simulation, Humans
ABSTRACT
This work develops a joint model selection criterion for simultaneously selecting the marginal mean regression and the correlation/covariance structure in longitudinal data analysis, where both the outcome and the covariates may be subject to general intermittent patterns of missingness under the missing at random mechanism. The new proposal, termed the "joint longitudinal information criterion" (JLIC), is based on the expected quadratic error for assessing model adequacy and on second-order weighted generalized estimating equation (WGEE) estimation for the mean and covariance models. Simulation results reveal that JLIC outperforms existing methods that select the mean regression and the correlation structure separately, in a two-stage manner. We apply the proposal to a longitudinal study to identify factors associated with life satisfaction in the elderly of Taiwan.
Subjects
Biometry/methods, Humans, Longitudinal Studies, Theoretical Models, Multivariate Analysis, Regression Analysis
ABSTRACT
Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator of the expected quadratic error of a candidate marginal model, accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs well in the presence of missing data and covariate measurement error. In contrast, naive procedures that ignore these complexities may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.
Subjects
Statistical Data Interpretation, Statistical Models, Regression Analysis, Research Design, Aging, Humans, Longitudinal Studies
ABSTRACT
Longitudinal data are often subject to missingness with monotone and/or intermittent missing patterns. Multiple imputation (MI) is widely used for the analysis of incomplete longitudinal data. In particular, the MI-GEE method has been proposed for inference with generalized estimating equations (GEE) when missing data are imputed via MI. However, little is known about how to perform model selection with multiply imputed longitudinal data. In this work, we extend the existing GEE model selection criteria, including the "quasi-likelihood under the independence model criterion" (QIC) and the "missing longitudinal information criterion" (MLIC), to accommodate multiply imputed datasets for selection of the MI-GEE mean model. Based on real data analyses from a schizophrenia study and an AIDS study, as well as simulations under nonmonotone missingness with a moderate proportion of missing observations, we conclude that: (i) more than a few imputed datasets are required for stable and reliable model selection in MI-GEE analysis; (ii) the MI-based GEE model selection methods with a suitable number of imputations generally perform well, while the naive application of existing model selection methods that simply ignores missing observations may lead to very poor performance; (iii) the model selection criteria based on improper (frequentist) multiple imputation generally perform better than their analogues based on proper (Bayesian) multiple imputation.
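As background on how results from multiply imputed datasets are usually combined, the sketch below applies Rubin's rules to a scalar estimate: the pooled estimate is the average across imputations, and the total variance adds the between-imputation variance, inflated by `1 + 1/m`, to the average within-imputation variance. This is standard MI pooling, not the model selection criteria proposed in the abstract; the function name `pool_rubin` and the numbers are assumptions.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # sample variance across imputations (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Assumed example: one coefficient estimated on m = 3 imputed datasets.
pooled_estimate, pooled_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The `1 + 1/m` factor shrinks toward one as the number of imputations grows, which is one reason a very small number of imputations gives unstable results.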
Subjects
Biometry/methods, Statistical Models, Acquired Immunodeficiency Syndrome/drug therapy, Factual Databases, Female, Humans, Longitudinal Studies, Male, Schizophrenia/drug therapy, Time Factors
ABSTRACT
The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and for selecting the correlation structure, respectively. We also demonstrate the substantial drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of the proposed method is further illustrated by two real applications involving missing longitudinal outcome data.
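The weighting used by weighted GEE under monotone dropout can be sketched as follows: each observed visit is weighted by the inverse of the cumulative probability of still being in the study at that visit. The sketch assumes the conditional probabilities of remaining are already available (in practice they would be estimated, for example by a logistic regression on the observed history); the function name `dropout_weights` and the numbers are hypothetical, and the sketch does not compute the MLIC or MLICC.

```python
def dropout_weights(stay_probs, observed):
    """Inverse-probability weights for one subject under monotone dropout.

    stay_probs[j] is the assumed probability of being observed at visit j
    given the subject was observed at visit j - 1; observed[j] indicates
    whether visit j was actually observed.
    """
    weights = []
    cumulative = 1.0
    for lam, seen in zip(stay_probs, observed):
        cumulative *= lam  # probability of remaining in the study through visit j
        weights.append(1.0 / cumulative if seen else 0.0)
    return weights


# Assumed example: always observed at baseline, dropout before the third visit.
w = dropout_weights([1.0, 0.8, 0.5], [True, True, False])
```

Observed visits that were unlikely to be reached get larger weights, so they stand in for similar subjects who dropped out.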
Subjects
Algorithms, Biometry/methods, Statistical Data Interpretation, Epidemiologic Methods, Longitudinal Studies, Statistical Models, Regression Analysis, Sample Size
ABSTRACT
The aim of this article is to provide asymptotically valid likelihood inferences about regression parameters for correlated ordinal response variables. The legitimacy of this novel approach requires no knowledge of the underlying joint distributions so long as their second moments exist. The efficacy of the proposed parametric approach is demonstrated via simulations and the analyses of two real data sets.