ABSTRACT
In health and clinical research, medical indices (eg, BMI) are commonly used for monitoring and/or predicting health outcomes of interest. While single-index modeling can be used to construct such indices, methods to use single-index models for analyzing longitudinal data with multiple correlated binary responses are underdeveloped, although there are abundant applications with such data (eg, prediction of multiple medical conditions based on longitudinally observed disease risk factors). This article aims to fill the gap by proposing a generalized single-index model that can incorporate multiple single indices and mixed effects for describing observed longitudinal data of multiple binary responses. Compared to the existing methods focusing on constructing marginal models for each response, the proposed method can make use of the correlation information in the observed data about different responses when estimating different single indices for predicting response variables. Estimation of the proposed model is achieved by using a local linear kernel smoothing procedure, together with methods designed specifically for estimating single-index models and traditional methods for estimating generalized linear mixed models. Numerical studies show that the proposed method is effective in various cases considered. It is also demonstrated using a dataset from the English Longitudinal Study of Aging project.
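The local linear kernel smoothing step named in this abstract can be illustrated with a generic smoother evaluated at a single point. This is only a minimal sketch under stated assumptions: a Gaussian kernel, a scalar index `u`, and a fixed bandwidth `h`. The function name `local_linear` and all argument names are illustrative and do not come from the article, and the sketch covers neither the estimation of the index coefficients nor the mixed effects.

```python
import math

def local_linear(u0, u, y, h):
    """Local linear estimate of E[y | u = u0] using a Gaussian kernel.

    Hypothetical helper: `u` stands for an already-computed single index
    and `h` for a bandwidth chosen by the analyst.
    """
    # Kernel weights centred at the evaluation point u0.
    w = [math.exp(-0.5 * ((ui - u0) / h) ** 2) for ui in u]
    # Weighted moments of the centred design (1, u - u0).
    s0 = sum(w)
    s1 = sum(wi * (ui - u0) for wi, ui in zip(w, u))
    s2 = sum(wi * (ui - u0) ** 2 for wi, ui in zip(w, u))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * (ui - u0) * yi for wi, ui, yi in zip(w, u, y))
    # Solve the 2x2 weighted least-squares system; the intercept is the fit.
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det

# Toy data on an exactly linear curve, which a local linear fit reproduces.
u_grid = [i / 49 for i in range(50)]
y_line = [2.0 + 3.0 * ui for ui in u_grid]
fit_at_04 = local_linear(0.4, u_grid, y_line, h=0.1)
```

Fitting a line locally rather than a constant is what lets the smoother reproduce linear trends without bias, including near the boundary of the index range.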
Subjects
Statistical Models, Longitudinal Studies, Humans, Linear Models, Computer Simulation, Statistical Data Interpretation
ABSTRACT
When there are resource constraints, it may be necessary to rank individualized treatment benefits to facilitate the prioritization of assigning different treatments. Most existing literature on individualized treatment rules targets absolute conditional treatment effect differences as a metric for the benefit. However, there can be settings where relative differences may better represent such benefit. In this paper, we consider modeling such relative differences formed as scale-invariant contrasts between the conditional treatment effects. By showing that all scale-invariant contrasts are monotonic transformations of each other, we posit a single index model for a particular relative contrast. We then characterize semiparametric estimating equations, including the efficient score, to estimate index parameters. To achieve semiparametric efficiency, we propose a two-step approach that minimizes a doubly robust loss function for initial estimation and then performs a one-step efficiency augmentation procedure. Careful theoretical and numerical studies are provided to show the superiority of our proposed approach.
Subjects
Statistical Models, Precision Medicine, Precision Medicine/methods
ABSTRACT
In medical studies, composite indices and/or scores are routinely used for predicting medical conditions of patients. These indices are usually developed from observed data of certain disease risk factors, and it has been demonstrated in the literature that single index models can provide a powerful tool for this purpose. In practice, the observed data of disease risk factors are often longitudinal in the sense that they are collected at multiple time points for individual patients, and there are often multiple aspects of a patient's medical condition that are of our concern. However, most existing single-index models are developed for cases with independent data and a single response variable, which are inappropriate for the problem just described in which within-subject observations are usually correlated and there are multiple mutually correlated response variables involved. This paper aims to fill this methodological gap by developing a single index model for analyzing longitudinal data with multiple responses. Both theoretical and numerical justifications show that the proposed new method provides an effective solution to the related research problem. It is also demonstrated using a dataset from the English Longitudinal Study of Aging.
Subjects
Longitudinal Studies, Humans, Statistics as Topic
ABSTRACT
There has been a lot of interest in sufficient dimension reduction (SDR) methodologies, as well as nonlinear extensions in the statistics literature. The SDR methodology has previously been motivated by several considerations: (a) finding data-driven subspaces that capture the essential facets of regression relationships; (b) analyzing data in a 'model-free' manner. In this article, we develop an approach to interpreting SDR techniques using information theory. Such a framework leads to a more assumption-lean understanding of what SDR methods do and also allows for some connections to results in the information theory literature.
ABSTRACT
Panel current status data arise frequently in biomedical studies when the occurrence of a particular clinical condition is only examined at several prescheduled visit times. Existing methods for analyzing current status data have largely focused on regression modeling based on commonly used survival models such as the proportional hazards model and the accelerated failure time model. However, these procedures have the limitations of being difficult to implement and performing sub-optimally in relatively small sample sizes. The performance of these procedures is also unclear under model misspecification. In addition, no methods currently exist to evaluate the prediction performance of estimated risk models with panel current status data. In this paper, we propose a simple estimator under a general class of nonparametric transformation (NPT) models by fitting a logistic regression working model and demonstrate that our proposed estimator is consistent for the NPT model parameter up to a scale multiplier. Furthermore, we propose nonparametric estimators for evaluating the prediction performance of the risk score derived from model fitting, which is valid regardless of the adequacy of the fitted model. Extensive simulation results suggest that our proposed estimators perform well in finite samples and the regression parameter estimators outperform existing estimators under various scenarios. We illustrate the proposed procedures using data from the Framingham Offspring Study.
Subjects
Proportional Hazards Models, Computer Simulation, Logistic Models, Sample Size
ABSTRACT
As ultra high-dimensional longitudinal data are becoming ever more common in fields such as public health and bioinformatics, developing flexible methods with a sparse model is of high interest. In this setting, the dimension of the covariates can potentially grow exponentially as exp(n^{1/2}) with respect to the number of clusters n. We consider a flexible semiparametric approach, namely, partially linear single-index models, for ultra high-dimensional longitudinal data. Most importantly, we allow not only the partially linear covariates but also the single-index covariates within the unknown flexible function estimated nonparametrically to be ultra high dimensional. Using penalized generalized estimating equations, this approach can capture correlation within subjects, can perform simultaneous variable selection and estimation with a smoothly clipped absolute deviation penalty, and can capture nonlinearity and potentially some interactions among predictors. We establish asymptotic theory for the estimators including the oracle property in ultra high dimension for both the partially linear and nonparametric components, and we present an efficient algorithm to handle the computational challenges. We show the effectiveness of our method and algorithm via a simulation study and a yeast cell cycle gene expression dataset.
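For orientation, the smoothly clipped absolute deviation (SCAD) penalty mentioned in this abstract has a standard closed form, sketched below. The function name `scad_penalty` is illustrative, and the default `a = 3.7` is the value commonly suggested in the SCAD literature rather than something stated in the abstract. The sketch shows only the penalty itself, not the penalized generalized estimating equations or the article's algorithm.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for one coefficient (requires lam > 0, a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Behaves like the lasso penalty near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that tapers the penalty growth.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant for large coefficients, so they are not shrunk further.
    return lam * lam * (a + 1) / 2
```

The flat region for large coefficients is what reduces the bias that a lasso penalty would place on strong signals, while the lasso-like region near zero still yields sparsity.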
Subjects
Algorithms, Data Analysis, Computational Biology, Computer Simulation, Humans, Linear Models
ABSTRACT
We consider a single-index regression model, uniquely constrained to estimate interactions between a set of pretreatment covariates and a treatment variable on their effects on a response variable, in the context of analyzing data from randomized clinical trials. We represent interaction effect terms of the model through a set of treatment-specific flexible link functions on a linear combination of the covariates (a single index), subject to the constraint that the expected value given the covariates equals 0, while leaving the main effects of the covariates unspecified. We show that the proposed semiparametric estimator is consistent for the interaction term of the model, and that the efficiency of the estimator can be improved with an augmentation procedure. The proposed single-index regression provides a flexible and interpretable modeling approach to optimizing individualized treatment rules based on patients' data measured at baseline, as illustrated by simulation examples and an application to data from a depression clinical trial.
Subjects
Computer Simulation, Humans
ABSTRACT
The augmented inverse weighting method is one of the most popular methods for estimating the mean of the response in causal inference and missing data problems. An important component of this method is the propensity score. Popular parametric models for the propensity score include the logistic, probit, and complementary log-log models. A common feature of these models is that the propensity score is a monotonic function of a linear combination of the explanatory variables. To avoid the need to choose a model, we model the propensity score via a semiparametric single-index model, in which the score is an unknown monotonic nondecreasing function of the given single index. Under this new model, the augmented inverse weighting estimator (AIWE) of the mean of the response is asymptotically linear, semiparametrically efficient, and more robust than existing estimators. Moreover, we have made a surprising observation. The inverse probability weighting and AIWEs based on a correctly specified parametric model may have worse performance than their counterparts based on a nonparametric model. A heuristic explanation of this phenomenon is provided. A real-data example is used to illustrate the proposed methods.
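The augmented inverse weighting estimator of a response mean can be written in a few lines once propensity scores and outcome-model predictions are available. The sketch below assumes both are supplied as inputs. The names `aipw_mean`, `observed`, `propensity` and `outcome_pred` are illustrative, and the monotone single-index estimation of the propensity score proposed in the article is not shown.

```python
def aipw_mean(y, observed, propensity, outcome_pred):
    """Augmented inverse weighting estimate of E[Y] under missing responses.

    y[i] may be None when observed[i] is falsy; propensity[i] is the
    (assumed known or previously estimated) probability of observing y[i].
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, propensity, outcome_pred):
        # Outcome-model prediction plus an inverse-weighted residual
        # correction that is applied only to observed responses.
        correction = (yi - mi) / pi if ri else 0.0
        total += mi + correction
    return total / len(y)

# Toy example: three subjects, the second response is missing.
estimate = aipw_mean(
    y=[1.0, None, 3.0],
    observed=[1, 0, 1],
    propensity=[0.5, 0.5, 0.5],
    outcome_pred=[2.0, 2.0, 2.0],
)
```

When every response is observed with propensity 1, the correction term restores each observed value and the estimator reduces to the sample mean.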
Subjects
Bias, Statistical Models, Propensity Score, Statistical Data Interpretation, Heuristics, Research Design
ABSTRACT
Inference for the state occupation probabilities, given a set of baseline covariates, is an important problem in survival analysis and time to event multistate data. We introduce an inverse censoring probability re-weighted semi-parametric single index model based approach to estimate conditional state occupation probabilities of a given individual in a multistate model under right-censoring. Besides obtaining a temporal regression function, we also test the potential time varying effect of a baseline covariate on future state occupation. We show that the proposed technique has desirable finite sample performances and its performance is competitive when compared with three other existing approaches. We illustrate the proposed methodology using two different data sets. First, we re-examine a well-known data set dealing with leukemia patients undergoing bone marrow transplant with various state transitions. Our second illustration is based on data from a study involving functional status of a set of spinal cord injured patients undergoing a rehabilitation program.
Subjects
Probability, Survival Analysis, Bone Marrow Transplantation, Humans, Leukemia/surgery, Markov Chains, Statistical Models, Regression Analysis, Spinal Cord Injuries/rehabilitation, Spinal Cord Injuries/therapy
ABSTRACT
For portfolios with a large number of assets, the single index model allows for expressing the large number of covariances between individual asset returns through a significantly smaller number of parameters. This avoids the constraint of having very large samples to estimate the mean and the covariance matrix of the asset returns, which practically would be unrealistic given the dynamic of market conditions. The traditional way to estimate the regression parameters in the single index model is the maximum likelihood method. Although the maximum likelihood estimators have desirable theoretical properties when the model is exactly satisfied, they may give completely erroneous results when outliers are present in the data set. In this paper, we define minimum pseudodistance estimators for the parameters of the single index model and using them we construct new robust optimal portfolios. We prove theoretical properties of the estimators, such as consistency, asymptotic normality, equivariance, robustness, and illustrate the benefits of the new portfolio optimization method for real financial data.
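The parameter reduction described in this abstract can be made concrete with the textbook form of the single index model, in which each asset return is a linear function of one market return plus an asset-specific error. The sketch below assumes uncorrelated errors across assets. The function names and the example numbers are illustrative, and the sketch does not implement the minimum pseudodistance estimators proposed in the paper.

```python
def single_index_covariance(betas, resid_vars, market_var):
    """Covariance matrix implied by r_i = alpha_i + beta_i * r_m + eps_i.

    Assumes the errors eps_i are uncorrelated across assets and with r_m.
    """
    n = len(betas)
    # Off-diagonal entries come entirely from the shared market factor.
    cov = [[betas[i] * betas[j] * market_var for j in range(n)] for i in range(n)]
    # Each asset adds its own residual variance on the diagonal.
    for i in range(n):
        cov[i][i] += resid_vars[i]
    return cov

def parameter_counts(n_assets):
    """Return (unrestricted count, single index model count) of parameters."""
    # n means plus n(n+1)/2 distinct variances and covariances.
    full = n_assets * (n_assets + 3) // 2
    # alpha, beta and residual variance per asset, plus market mean and variance.
    single_index = 3 * n_assets + 2
    return full, single_index

cov = single_index_covariance([1.0, 0.5], [0.04, 0.09], market_var=0.02)
counts_100 = parameter_counts(100)
```

For 100 assets this gives 5150 unrestricted parameters against 302 under the single index structure.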
ABSTRACT
Although single-index models have been extensively studied, the monotonicity of the link function f in the single-index model is rarely studied. In many situations, it is desirable that f is monotonic, which results in a monotonic single-index model that can be very useful in economics and biometrics. In this article, we propose a monotonic single-index model in which the link function is constructed using penalized I-splines along with constraints on coefficients to achieve monotonicity of the link function f. An algorithm to estimate the single-index parameters and the link function is developed, and the sandwich estimate of the variance of the index parameters is provided. We propose to apply this monotonic single-index model to estimate the dose-response surface and assess drug interactions while considering the variability of the observed data. An extensive simulation study was carried out to evaluate the performance of the proposed monotonic single-index model. A case study is provided to illustrate the application of the proposed model to estimate the dose-response surface and assess drug interactions. Both the simulation and case study show that the proposed monotonic single-index model works very well. Copyright © 2016 John Wiley & Sons, Ltd.
Subjects
Drug Interactions, Statistical Models, Algorithms, Drug Dose-Response Relationship, Humans, Statistics as Topic
ABSTRACT
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.
ABSTRACT
Gene-environment (G×E) interactions play key roles in many complex diseases. An increasing number of epidemiological studies have shown the combined effect of multiple environmental exposures on disease risk. However, no appropriate statistical models have been developed to conduct a rigorous assessment of such combined effects when G×E interactions are considered. In this paper, we propose a partial linear varying multi-index coefficient model (PLVMICM) to assess how multiple environmental factors act jointly to modify individual genetic risk on complex disease. Our model includes the varying-index coefficient model as a special case, where discrete variables are admitted as the linear part. Thus PLVMICM allows one to study nonlinear interaction effects between genes and continuous environments as well as linear interactions between genes and discrete environments, simultaneously. We derive a profile method to estimate the parametric coefficients and a B-spline backfitted kernel method to estimate nonlinear interaction functions. Consistency and asymptotic normality of the parametric and nonparametric estimates are established under some regularity conditions. Hypothesis tests for the parametric coefficients and nonparametric functions are conducted. Results show that the statistics for testing the parametric coefficients and the nonparametric functions asymptotically follow a χ²-distribution with different degrees of freedom. The utility of the method is demonstrated through extensive simulations and a case study.
ABSTRACT
Count data often arise in biomedical studies, while there could be a special feature with excessive zeros in the observed counts. The zero-inflated Poisson model provides a natural approach to accounting for the excessive zero counts. In the semiparametric framework, we propose a generalized partially linear single-index model for the mean of the Poisson component, the probability of zero, or both. We develop the estimation and inference procedure via a profile maximum likelihood method. Under some mild conditions, we establish the asymptotic properties of the profile likelihood estimators. The finite sample performance of the proposed method is demonstrated by simulation studies, and the new model is illustrated with a medical care dataset.
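As a reminder of the likelihood underlying this model class, the zero-inflated Poisson distribution mixes a point mass at zero with an ordinary Poisson count. The sketch below takes the Poisson mean `mu` and the zero-inflation probability `p_zero` as plain numbers. In the article these would be linked to partially linear single-index predictors, which are not modeled here. The function names are illustrative.

```python
import math

def zip_pmf(k, mu, p_zero):
    """P(Y = k) for a zero-inflated Poisson with mean mu > 0 in the Poisson part."""
    # Poisson probability computed on the log scale for numerical stability.
    poisson = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    if k == 0:
        # Zeros arise from the point mass or from the Poisson component.
        return p_zero + (1 - p_zero) * poisson
    return (1 - p_zero) * poisson

def zip_loglik(counts, mus, p_zeros):
    """Log-likelihood with observation-specific mu and p_zero."""
    return sum(math.log(zip_pmf(k, m, p)) for k, m, p in zip(counts, mus, p_zeros))
```

A profile likelihood approach would maximize a function like `zip_loglik` over the parametric part while the nonparametric link is re-estimated at each step; that iteration is beyond this sketch.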
Subjects
Linear Models, Biostatistics/methods, Computer Simulation, Statistical Data Interpretation, Health Expenditures/statistics & numerical data, Humans, Likelihood Functions, Statistical Models, Poisson Distribution, Regression Analysis, Nonparametric Statistics
ABSTRACT
We propose a generalized partially linear functional single index risk score model for repeatedly measured outcomes where the index itself is a function of time. We fuse the nonparametric kernel method and regression spline method, and modify the generalized estimating equation to facilitate estimation and inference. We use a local smoothing kernel to estimate the unspecified coefficient functions of time, and use B-splines to estimate the unspecified function of the single index component. The covariance structure is taken into account via a working model, which provides a valid estimation and inference procedure whether or not it captures the true covariance. The estimation method is applicable to both continuous and discrete outcomes. We derive large sample properties of the estimation procedure and show that each component of the model has a different convergence rate. The asymptotic properties when the kernel and regression spline methods are combined in a nested fashion have not been studied prior to this work, even in the independent data case.
ABSTRACT
The counting process with a Cox-type intensity function has been extensively applied to analyze recurrent event data; this approach assumes that the underlying counting process is a time-transformed Poisson process and that the covariates have multiplicative or additive effects on the mean and rate functions of the counting process. The existing statistical inference, however, often encounters difficulties due to high-dimensional covariates, such as in gene expression and single nucleotide polymorphism data that have revolutionized our understanding of cancer recurrence and other diseases. In this paper, a technique of sufficient dimension reduction is applied to the mean and rate function for the number of occurrences of events over time. A two-step procedure is proposed to estimate the model components: first, a nonparametric estimator is proposed for the baseline, and then the basis of the central subspace and its dimension are estimated through a modified sliced inverse regression. Given the estimated structural dimension and the estimated basis of the central subspace, the regression function can be estimated by local linear regression. A simulation is performed to confirm and assess the theoretical findings, and an application is demonstrated on a set of chronic granulomatous disease data.
Subjects
Biometry/methods, Statistical Data Interpretation, Recurrence, Bacterial Infections/prevention & control, Computer Simulation, Chronic Granulomatous Disease/complications, Humans, Interferon-gamma/therapeutic use, Statistical Models
ABSTRACT
Studies/trials assessing status and progression of periodontal disease (PD) usually focus on quantifying the relationship between the clustered (tooth within subjects) bivariate endpoints, such as probed pocket depth (PPD) and clinical attachment level (CAL), with the covariates. Although assumptions of multivariate normality can be invoked for the random terms (random effects and errors) under a linear mixed model (LMM) framework, violations of those assumptions may lead to imprecise inference. Furthermore, the response-covariate relationship may not be linear, as assumed under an LMM fit, and the regression estimates obtained therein do not provide an overall summary of the risk of PD, as obtained from the covariates. Motivated by a PD study on Gullah-speaking African-American Type-2 diabetics, we cast the asymmetric clustered bivariate (PPD and CAL) responses into a non-linear mixed model framework, where both random terms follow the multivariate asymmetric Laplace distribution (ALD). In order to provide a one-number risk summary, the possible non-linearity in the relationship is modeled via a single-index model, powered by polynomial spline approximations for index functions, and the normal mixture expression for ALD. To proceed with a maximum-likelihood inferential setup, we devise an elegant EM-type algorithm. Moreover, the large sample theoretical properties are established under some mild conditions. Simulation studies using synthetic data generated under a variety of scenarios were used to study the finite-sample properties of our estimators, and demonstrate that our proposed model and estimation algorithm can efficiently handle asymmetric, heavy-tailed data, with outliers. Finally, we illustrate our proposed methodology via application to the motivating PD study.
ABSTRACT
Beta distributions are commonly used to model proportion valued response variables, often encountered in longitudinal studies. In this article, we develop semi-parametric Beta regression models for proportion valued responses, where the aggregate covariate effect is summarized and flexibly modeled, using an interpretable monotone time-varying single index transform of a linear combination of the potential covariates. We utilize the potential of single index models, which are effective dimension reduction tools and accommodate link function misspecification in generalized linear mixed models. Our Bayesian methodology incorporates the missing-at-random feature of the proportion response and utilizes Hamiltonian Monte Carlo sampling to conduct inference. We explore finite-sample frequentist properties of our estimates and assess the robustness via detailed simulation studies. Finally, we illustrate our methodology via application to a motivating longitudinal dataset on obesity research recording proportion body fat.
ABSTRACT
The nested case-control (NCC) design is widely used in epidemiologic studies as a cost-effective subcohort sampling method to study the association between a disease and its potential risk factors. NCC data are commonly analyzed using Thomas' partial likelihood approach under the Cox proportional hazards model assumption. However, the linear modeling form in the Cox model may be insufficient for practical applications, especially when there are a large number of risk factors under investigation. In this paper, we consider a partially linear single index proportional hazard model, which includes a linear component for covariates of interest to yield easily interpretable results and a nonparametric single index component to adjust for multiple confounders effectively. We propose to approximate the nonparametric single index function by polynomial splines and estimate the parameters of interest using an iterative algorithm based on the partial likelihood. Asymptotic properties of the resulting estimators are established. The proposed methods are evaluated using simulations and applied to an NCC study of ovarian cancer.
ABSTRACT
We consider a single-index structure to study heteroscedasticity in regression with high-dimensional predictors. A general class of estimating equations is introduced; the resulting estimators remain consistent even when the structure of the variance function is misspecified. The proposed estimators also possess an adaptive property in an asymptotic sense. That is, they estimate the conditional variance function asymptotically as well as if the conditional mean function were given a priori. Numerical studies confirm our theoretical observations and demonstrate that our proposed estimator is superior to existing estimators, with less bias and smaller standard deviation.