ABSTRACT
Analyzing longitudinal data in health studies is challenging because of sparse and error-prone measurements, strong within-individual correlation, missing data, and varied trajectory shapes. Mixed-effects models (MM) address these challenges effectively, but they remain parametric and can be computationally costly. In contrast, functional principal component analysis (FPCA) is a non-parametric approach, developed for regular and dense functional data, that flexibly describes temporal trajectories at a potentially lower computational cost. This article presents an empirical simulation study evaluating the behavior of FPCA with sparse and error-prone repeated measures, and its robustness under different missing-data schemes, in comparison with MM. The results show that FPCA is well suited to data missing at random because of dropout, except in scenarios involving the most frequent and systematic dropout. Like MM, FPCA fails under a missing-not-at-random mechanism. FPCA was applied to describe the trajectories of four cognitive functions before clinical dementia and to contrast them with those of matched controls in a case-control study nested in a population-based aging cohort. The average cognitive declines of future dementia cases diverged suddenly from those of their matched controls, with a sharp acceleration 5 to 2.5 years before diagnosis.
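As a rough illustration of what FPCA computes, the sketch below handles only the simplest case: curves observed without gaps on a common, dense grid. It is not the sparse-data estimator evaluated in the study. The function name `fpca_dense`, the simulated curves, and all parameter values are assumptions made for this example.

```python
import numpy as np

def fpca_dense(curves, n_components=2):
    """FPCA for curves on a common grid; curves has shape (n_subjects, n_timepoints)."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Sample covariance between every pair of grid points.
    cov = centered.T @ centered / (curves.shape[0] - 1)
    # eigh returns eigenvalues in ascending order, so sort them descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Subject-specific scores: projections of centered curves on the components.
    scores = centered @ eigvecs
    explained = eigvals / np.trace(cov)
    return mean_curve, eigvals, eigvecs, scores, explained

# Simulated example (hypothetical data): two modes of variation plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
phi1 = np.sqrt(2) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2) * np.cos(2 * np.pi * t)
true_scores = rng.normal(size=(200, 2)) * np.array([2.0, 0.5])
curves = 1.0 + t + true_scores @ np.vstack([phi1, phi2])
curves = curves + rng.normal(scale=0.1, size=curves.shape)

mean_curve, eigvals, eigvecs, scores, explained = fpca_dense(curves)
print("share of variance explained:", np.round(explained, 3))
```

With sparse and irregular visits, the covariance surface cannot be computed this way and is usually smoothed from pooled pairs of observations instead; that step is deliberately left out here.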
Subjects
Computer Simulation, Statistical Models, Principal Component Analysis, Humans, Longitudinal Studies, Dementia, Case-Control Studies, Statistical Data Interpretation
ABSTRACT
In many longitudinal settings, time-varying covariates may not be measured at the same time as responses and are often prone to measurement error. Naive last-observation-carried-forward methods incur estimation biases, and existing kernel-based methods suffer from slow convergence rates and large variations. To address these challenges, we propose a new functional calibration approach to efficiently learn longitudinal covariate processes based on sparse functional data with measurement error. Our approach, stemming from functional principal component analysis, calibrates the unobserved synchronized covariate values from the observed asynchronous and error-prone covariate values, and is broadly applicable to asynchronous longitudinal regression with time-invariant or time-varying coefficients. For regression with time-invariant coefficients, our estimator is asymptotically unbiased, root-n consistent, and asymptotically normal; for time-varying coefficient models, our estimator has the optimal varying coefficient model convergence rate with inflated asymptotic variance from the calibration. In both cases, our estimators present asymptotic properties superior to the existing methods. The feasibility and usability of the proposed methods are verified by simulations and an application to the Study of Women's Health Across the Nation, a large-scale multisite longitudinal study on women's health during midlife.
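For reference, the naive last-observation-carried-forward (LOCF) baseline that the abstract criticizes can be sketched as follows. This is only the baseline, not the proposed functional calibration; the function name `locf` and the toy visit times are assumptions made for illustration.

```python
import numpy as np

def locf(cov_times, cov_values, resp_times):
    """Carry the most recent covariate value forward to each response time."""
    cov_times = np.asarray(cov_times, dtype=float)
    cov_values = np.asarray(cov_values, dtype=float)
    resp_times = np.asarray(resp_times, dtype=float)
    order = np.argsort(cov_times)
    cov_times, cov_values = cov_times[order], cov_values[order]
    # Index of the last covariate time that is <= each response time.
    idx = np.searchsorted(cov_times, resp_times, side="right") - 1
    out = np.full(resp_times.shape, np.nan)
    has_prior = idx >= 0  # response times before the first covariate stay NaN
    out[has_prior] = cov_values[idx[has_prior]]
    return out

# Hypothetical subject: covariate measured at 0, 1 and 2.5; responses elsewhere.
carried = locf([0.0, 1.0, 2.5], [10.0, 12.0, 15.0], [-0.5, 0.5, 1.0, 2.0, 3.0])
print(carried)
```

The carried value ignores both how much the covariate has changed since it was last measured and the measurement error in that last value, which is the source of the bias the abstract refers to.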
Subjects
Statistical Models, Female, Humans, Longitudinal Studies, Regression Analysis, Calibration, Bias
ABSTRACT
Testing the homogeneity between two samples of functional data is an important task. While this is feasible for intensively measured functional data, we explain why it is challenging for sparsely measured functional data and show what can be done for such data. In particular, we show that testing the marginal homogeneity based on point-wise distributions is feasible under some mild constraints and propose a new two-sample statistic that works well with both intensively and sparsely measured functional data. The proposed test statistic is formulated upon energy distance, and the convergence rate of the test statistic to its population version is derived along with the consistency of the associated permutation test. The aptness of our method is demonstrated on both synthetic and real data sets.
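To make the two ingredients concrete, the sketch below implements the generic energy distance between two samples of vectors and a permutation test built on it. It is not the statistic proposed for sparsely measured functional data; the function names, sample sizes, and the number of permutations are assumptions chosen for illustration.

```python
import numpy as np

def _mean_pairwise_dist(a, b):
    """Average Euclidean distance over all pairs (row of a, row of b)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()

def energy_distance(x, y):
    """Energy distance estimate 2 E|X-Y| - E|X-X'| - E|Y-Y'| for (n, d) arrays."""
    return (2.0 * _mean_pairwise_dist(x, y)
            - _mean_pairwise_dist(x, x)
            - _mean_pairwise_dist(y, y))

def permutation_test(x, y, n_perm=500, seed=0):
    """Permutation p-value for the null hypothesis of equal distributions."""
    rng = np.random.default_rng(seed)
    observed = energy_distance(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))  # reshuffle the group labels
        stat = energy_distance(pooled[perm[:n]], pooled[perm[n:]])
        exceed += stat >= observed
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical samples: the second one is shifted by one unit in every coordinate.
rng = np.random.default_rng(42)
x = rng.normal(size=(60, 3))
y = rng.normal(loc=1.0, size=(60, 3))
observed, p_shift = permutation_test(x, y)
print(f"energy distance = {observed:.3f}, permutation p-value = {p_shift:.4f}")
```

The permutation step relies only on exchangeability of the pooled observations under the null hypothesis, so no distributional form has to be assumed for the curves.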
ABSTRACT
In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject-specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject-specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open source software that uses heavily tested smoothing techniques.
Subjects
Forecasting/methods, Regression Analysis, Anthropometry, Body Height, Body Weight, Child Development, Preschool Child, Computer Simulation, Statistical Data Interpretation, Female, Growth, Growth Charts, Humans, Infant, Newborn Infant, Male, Peru
ABSTRACT
In this work we propose a functional concurrent regression model to estimate labor supply elasticities over the years 1988 through 2014 using Current Population Survey data. Assuming, as is common, that individuals' wages are endogenous, we introduce instrumental variables in a two-stage least squares approach to estimate the desired labor supply elasticities. Furthermore, we tailor our estimation method to sparse functional data. Though recent work has incorporated instrumental variables into other functional regression models, to our knowledge this has not yet been done in the functional concurrent regression model, and most existing literature is not suited for sparse functional data. We show through simulations that this two-stage least squares approach largely eliminates the bias introduced by a naive model (i.e., one that does not acknowledge endogeneity) and produces accurate coefficient estimates for moderate sample sizes.
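The two-stage least squares idea can be illustrated in the ordinary scalar setting, which is much simpler than the functional concurrent model of the abstract. In the sketch below, the data-generating process, the coefficient values, and the function name `two_stage_least_squares` are all assumptions made for the example.

```python
import numpy as np

def two_stage_least_squares(y, x_endog, z):
    """2SLS with one endogenous regressor, one instrument, and an intercept."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: project the endogenous regressor on the instrument.
    gamma, *_ = np.linalg.lstsq(Z, x_endog, rcond=None)
    x_hat = Z @ gamma
    # Stage 2: regress the outcome on the fitted values from stage 1.
    X_hat = np.column_stack([np.ones(n), x_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta

# Hypothetical data: u is unobserved and affects both x and y (endogeneity).
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                       # instrument, independent of u
u = rng.normal(size=n)                       # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)         # endogenous regressor
y = 1.0 + 2.0 * x + 3.0 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # naive fit ignores endogeneity
beta_iv = two_stage_least_squares(y, x, z)
print("naive OLS slope:", round(beta_ols[1], 3))
print("2SLS slope:     ", round(beta_iv[1], 3))
```

In this simulated setting the naive slope is pulled away from the true value of 2 because `x` and the error share the component `u`, whereas the second-stage slope uses only the variation in `x` that is driven by the instrument.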
ABSTRACT
We consider estimation of mean and covariance functions of functional snippets, which are short segments of functions possibly observed irregularly on an individual specific subinterval that is much shorter than the entire study interval. Estimation of the covariance function for functional snippets is challenging since information for the far off-diagonal regions of the covariance structure is completely missing. We address this difficulty by decomposing the covariance function into a variance function component and a correlation function component. The variance function can be effectively estimated nonparametrically, while the correlation part is modeled parametrically, possibly with an increasing number of parameters, to handle the missing information in the far off-diagonal regions. Both theoretical analysis and numerical simulations suggest that this hybrid strategy is effective. In addition, we propose a new estimator for the variance of measurement errors and analyze its asymptotic properties. This estimator is required for the estimation of the variance function from noisy measurements.
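The decomposition described above can be written out as follows. The notation is an assumption introduced here for illustration, since the abstract does not fix any symbols: $X$ denotes the underlying process, $Y$ a noisy measurement, $\sigma_X^2(t)$ the variance function, $\rho_\theta$ the parametric correlation function, and $\sigma_\varepsilon^2$ the measurement-error variance. The exponential correlation shown last is only one example of a parametric family, not necessarily the one used by the authors.

```latex
\begin{align*}
  C(s,t) &= \operatorname{Cov}\{X(s), X(t)\}
          = \sigma_X(s)\,\sigma_X(t)\,\rho_\theta(s,t), \\
  \operatorname{Var}\{Y(t)\} &= \sigma_X^2(t) + \sigma_\varepsilon^2, \\
  \rho_\theta(s,t) &= \exp\!\left(-\frac{|s-t|}{\theta}\right)
  \quad \text{(illustrative choice, } \theta > 0\text{)}.
\end{align*}
```

The second line shows why an estimate of $\sigma_\varepsilon^2$ is needed: the variance of the noisy measurements overstates $\sigma_X^2(t)$ by exactly that amount. Because $\rho_\theta$ is parametric, its value at pairs $(s,t)$ far from the diagonal follows from $\theta$, which can be estimated from the near-diagonal pairs that the snippets do provide.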
ABSTRACT
We propose a method of effective dimension reduction for functional data, emphasizing the sparse design where one observes only a few noisy and irregular measurements for some or all of the subjects. The proposed method borrows strength across the entire sample and provides a way to characterize the effective dimension reduction space, via functional cumulative slicing. Our theoretical study reveals a bias-variance trade-off associated with the regularizing truncation and decaying structures of the predictor process and the effective dimension reduction space. A simulation study and an application illustrate the superior finite-sample performance of the method.