ABSTRACT
BACKGROUND: Collection of intensive longitudinal health outcomes allows joint modeling of their mean (location) and variability (scale). Focusing on the location of the outcome, measures to detect influential subjects in longitudinal data using standard mixed-effects regression models (MRMs) have been widely discussed. However, no existing approach enables the detection of subjects that heavily influence the scale of the outcome. METHODS: We propose applying mixed-effects location scale (MELS) modeling combined with commonly used influence measures such as Cook's distance and DFBETAS to fill this gap. In this paper, we provide a framework for researchers to follow when trying to detect influential subjects for both the scale and location of the outcome. The framework allows detailed examination of each subject's influence on model fit as well as on the point estimates and precision of coefficients in different components of a MELS model. RESULTS: We simulated two common scenarios in longitudinal healthcare studies and found that the influence measures in our framework successfully capture influential subjects over 99% of the time. We also re-analyzed data from a health behavior study and found four particularly influential subjects, two of which cannot be detected by influence analyses via regular MRMs. CONCLUSION: The proposed framework can help researchers detect influential subjects that would otherwise be overlooked by influence analyses using regular MRMs, and analyze all data in one model despite influential subjects.
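The subject-deletion idea behind Cook's distance and DFBETAS can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: it swaps in ordinary least squares for the MELS model, so it only covers the location (mean) coefficients, and the function names and simulated data are hypothetical. For each subject, the model is refitted without that subject and the shift in the coefficient estimates is summarized.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficient estimates and their estimated covariance matrix."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    return beta, sigma2 * xtx_inv

def subject_influence(X, y, subject_ids):
    """Leave-one-subject-out Cook's distance and DFBETAS (OLS stand-in, not MELS)."""
    beta_full, cov_full = ols_fit(X, y)
    p = X.shape[1]
    cov_inv = np.linalg.inv(cov_full)
    result = {}
    for s in np.unique(subject_ids):
        keep = subject_ids != s
        beta_del, cov_del = ols_fit(X[keep], y[keep])
        diff = beta_full - beta_del
        # Cook's distance: scaled squared shift of the whole coefficient vector.
        cooks = float(diff @ cov_inv @ diff) / p
        # DFBETAS: per-coefficient shift in units of the deletion-fit standard error.
        dfbetas = diff / np.sqrt(np.diag(cov_del))
        result[s.item()] = (cooks, dfbetas)
    return result

# Hypothetical data: 20 subjects with 10 observations each; subject 0 is
# planted with high-leverage covariate values and a different slope.
rng = np.random.default_rng(1)
n_subjects, n_obs = 20, 10
ids = np.repeat(np.arange(n_subjects), n_obs)
x = rng.normal(0.0, 1.0, size=ids.size)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=ids.size)
planted = ids == 0
x[planted] = rng.normal(3.0, 1.0, size=planted.sum())
y[planted] = 1.0 - 2.0 * x[planted] + rng.normal(0.0, 1.0, size=planted.sum())

X = np.column_stack([np.ones_like(x), x])
influence = subject_influence(X, y, ids)
most_influential = max(influence, key=lambda s: influence[s][0])
print("Subject with the largest Cook's distance:", most_influential)
```

Extending the same deletion logic to the scale submodel would require refitting a location scale model for each deleted subject, which this sketch does not attempt.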
Subject(s)
Longitudinal Studies, Humans
ABSTRACT
Ecological Momentary Assessment data present some new modeling opportunities. Typically, there are sufficient data to explicitly model the within-subject (WS) variance, and in many applications, it is of interest to allow the WS variance to depend on covariates as well as random subject effects. We describe a model that allows multiple random effects per subject in the mean model (eg, random location intercept and slopes), as well as random scale in the error variance model. We present an example of the use of this model on a real dataset and a simulation study that shows the benefit of this model, relative to simpler approaches.
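A model of this kind is commonly written as below. The notation is an assumption made here for illustration, since the abstract does not give formulas: $y_{ij}$ is observation $j$ of subject $i$, $\mathbf{x}_{ij}$, $\mathbf{z}_{ij}$ and $\mathbf{w}_{ij}$ are covariate vectors, $\boldsymbol{\upsilon}_i$ collects the random location effects (intercept and slopes), and $\omega_i$ is the random scale effect.

```latex
\begin{align}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
          + \mathbf{z}_{ij}^{\top}\boldsymbol{\upsilon}_i
          + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N\!\left(0, \sigma^2_{\varepsilon_{ij}}\right), \\
\log \sigma^2_{\varepsilon_{ij}} &= \mathbf{w}_{ij}^{\top}\boldsymbol{\tau} + \omega_i, \\
\left(\boldsymbol{\upsilon}_i^{\top}, \omega_i\right)^{\top} &\sim N(\mathbf{0}, \boldsymbol{\Sigma}).
\end{align}
```

The log link keeps the within-subject variance positive, and the off-diagonal elements of $\boldsymbol{\Sigma}$ allow the random location and scale effects to be correlated.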
Subject(s)
Ecological Momentary Assessment, Computer Simulation, Humans
ABSTRACT
The use of intensive sampling methods, such as ecological momentary assessment (EMA), is increasingly prominent in medical research. However, inferences from such data are often limited to the subject-specific mean of the outcome and between-subject variance (i.e., random intercept), despite the capability to examine within-subject variance (i.e., random scale) and associations between covariates and subject-specific mean (i.e., random slope). MixWILD (Mixed model analysis With Intensive Longitudinal Data) is statistical software that tests the effects of subject-level parameters (variance and slope) of time-varying variables, specifically in the context of studies using intensive sampling methods such as EMA. MixWILD combines estimation of a stage 1 mixed-effects location-scale (MELS) model, including estimation of the subject-specific random effects, with a subsequent stage 2 linear or binary/ordinal logistic regression in which values sampled from each subject's random effect distributions can be used as regressors (and then the results are aggregated across replications). Computations within MixWILD were written in FORTRAN and use maximum likelihood estimation, utilizing both the expectation-maximization (EM) algorithm and a Newton-Raphson solution. The mean and variance of each individual's random effects used in the sampling are estimated using empirical Bayes equations. This manuscript details the underlying procedures and provides examples illustrating standalone usage and features of MixWILD and its GUI. MixWILD is generalizable to a variety of data collection strategies (e.g., EMA, sensors) as a robust and reproducible method to test predictors of variability in level 1 outcomes and the associations between subject-level parameters (variances and slopes) and level 2 outcomes.
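The stage 2 resampling step can be illustrated with a short sketch. It is not MixWILD code and makes several assumptions: the empirical Bayes means and variances are taken as given inputs, stage 2 is a plain linear regression with one random effect as the regressor, and results are aggregated by a simple average of the coefficient estimates. The function name and arguments are hypothetical.

```python
import numpy as np

def stage2_with_resampling(eb_mean, eb_var, outcome, n_reps=200, seed=0):
    """Average stage 2 OLS coefficients over draws of the subject random effects.

    eb_mean, eb_var: per-subject empirical Bayes mean and variance of one
    random effect (assumed to come from a stage 1 model fitted elsewhere).
    outcome: subject-level (level 2) outcome.
    """
    rng = np.random.default_rng(seed)
    n = len(outcome)
    estimates = np.empty((n_reps, 2))
    for r in range(n_reps):
        # Draw one plausible value of the random effect for every subject.
        draw = rng.normal(eb_mean, np.sqrt(eb_var))
        X = np.column_stack([np.ones(n), draw])
        estimates[r] = np.linalg.lstsq(X, outcome, rcond=None)[0]
    # Aggregate across replications (simple average, an assumption here).
    return estimates.mean(axis=0)

# Hypothetical inputs for 50 subjects.
rng = np.random.default_rng(42)
eb_mean = rng.normal(0.0, 1.0, size=50)
eb_var = np.full(50, 0.25)
outcome = 2.0 + 1.5 * eb_mean + rng.normal(0.0, 1.0, size=50)

coefs = stage2_with_resampling(eb_mean, eb_var, outcome)
print("Averaged intercept and slope:", coefs)
```

Sampling from each subject's random effect distribution, rather than plugging in the point estimates, carries the uncertainty of the stage 1 estimates into the stage 2 regression.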
Subject(s)
Biometry, Software, Bayes Theorem, Biomedical Research, Logistic Models, Longitudinal Studies, Research Design
ABSTRACT
Ecological momentary assessment studies usually produce intensively measured longitudinal data with large numbers of observations per unit, and research interest is often centered around understanding the changes in variation of people's thoughts, emotions and behaviors. Hedeker et al developed a 2-level mixed effects location scale model that allows observed covariates as well as unobserved variables to influence both the mean and the within-subjects variance, for a 2-level data structure where observations are nested within subjects. In some ecological momentary assessment studies, subjects are measured at multiple waves, and within each wave, subjects are measured over time. Li and Hedeker extended the original 2-level model to a 3-level data structure where observations are nested within days and days are then nested within subjects, by including a random location and scale intercept at the intermediate wave level. However, the 3-level random intercept model assumes constant response change rate for both the mean and variance. To account for changes in variance across waves, as well as clustering attributable to waves, we propose a more comprehensive location scale model that allows subject heterogeneity at baseline as well as across different waves, for a 3-level data structure where observations are nested within waves and waves are then further nested within subjects. The model parameters are estimated using Markov chain Monte Carlo methods. We provide details on the Bayesian estimation approach and demonstrate how the Stan statistical software can be used to sample from the desired distributions and achieve consistent estimates. The proposed model is validated via a series of simulation studies. Data from an adolescent smoking study are analyzed to demonstrate this approach. The analyses clearly favor the proposed model and show significant subject heterogeneity at baseline as well as change over time, for both mood mean and variance. 
The proposed 3-level location scale model can be widely applied to areas of research where the interest lies in the consistency in addition to the mean level of the responses.
Subject(s)
Bayes Theorem, Ecological Momentary Assessment, Adolescent, Humans, Statistical Models, Smoking/epidemiology
ABSTRACT
MIXREGLS is a program that provides estimates for a mixed-effects location scale model assuming a (conditionally) normally distributed dependent variable. This model can be used for analysis of data in which subjects may be measured at many observations and interest is in modeling the mean and variance structure. In terms of the variance structure, covariates can be specified to have effects on both the between-subject and within-subject variances. Another use is for clustered data in which subjects are nested within clusters (e.g., clinics, hospitals, schools) and interest is in modeling the between-cluster and within-cluster variances in terms of covariates. MIXREGLS was written in Fortran and uses maximum likelihood estimation, utilizing both the EM algorithm and a Newton-Raphson solution. Estimation of the random effects is accomplished using empirical Bayes methods. Examples illustrating stand-alone usage and features of MIXREGLS are provided, as well as use via the SAS and R software packages.
ABSTRACT
INTRODUCTION: Conventional Z-scores are generated by subtracting the mean and dividing by the standard deviation. More recent methods linearly correct for age, sex, and education, so that these "adjusted" Z-scores better represent whether an individual's cognitive performance is abnormal. Extreme negative Z-scores for individuals relative to this normative distribution are considered indicative of cognitive deficiency. METHODS: In this article, we consider nonlinear shape constrained additive models accounting for age, sex, and education (correcting for nonlinearity). Additional shape constrained additive models account for varying standard deviation of the cognitive scores with age (correcting for heterogeneity of variance). RESULTS: Corrected Z-scores based on nonlinear shape constrained additive models provide improved adjustment for age, sex, and education, as indicated by higher adjusted R². DISCUSSION: Nonlinearly corrected Z-scores with respect to age, sex, and education with age-varying residual standard deviation allow for improved detection of non-normative extreme cognitive scores.
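The difference between a conventional Z-score and a linearly adjusted one can be sketched as follows. This covers only the two baseline approaches named in the introduction, not the shape constrained additive models the article proposes; the function names and the simulated normative sample are hypothetical.

```python
import numpy as np

def conventional_z(scores):
    """Subtract the sample mean and divide by the sample standard deviation."""
    return (scores - scores.mean()) / scores.std(ddof=1)

def adjusted_z(scores, covariates):
    """Z-scores from the residuals of a linear regression on the covariates."""
    X = np.column_stack([np.ones(len(scores)), covariates])
    beta = np.linalg.lstsq(X, scores, rcond=None)[0]
    resid = scores - X @ beta
    # Residual standard deviation, assumed constant across covariate values.
    resid_sd = np.sqrt(resid @ resid / (len(scores) - X.shape[1]))
    return resid / resid_sd

# Hypothetical normative sample: scores decline with age and rise with education.
rng = np.random.default_rng(7)
n = 500
age = rng.uniform(50, 90, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
education = rng.uniform(8, 20, size=n)
scores = 30 - 0.2 * age + 0.5 * education + 1.0 * sex + rng.normal(0, 3, size=n)

z_raw = conventional_z(scores)
z_adj = adjusted_z(scores, np.column_stack([age, sex, education]))
print("Correlation of raw Z with age:", np.corrcoef(z_raw, age)[0, 1])
print("Correlation of adjusted Z with age:", np.corrcoef(z_adj, age)[0, 1])
```

The linear adjustment removes the linear association with the covariates but assumes a straight-line relationship and a constant residual standard deviation, which are the two assumptions the article's models relax.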
ABSTRACT
We propose a novel method for estimating population-level and subject-specific effects of covariates on the variability of functional data. We extend the functional principal components analysis framework by modeling the variance of principal component scores as a function of covariates and subject-specific random effects. In a setting where principal components are largely invariant across subjects and covariate values, modeling the variance of these scores provides a flexible and interpretable way to explore factors that affect the variability of functional data. Our work is motivated by a novel dataset from an experiment assessing upper extremity motor control, and quantifies the reduction in motion variance associated with skill learning.
ABSTRACT
In health studies, questionnaire items are often scored on an ordinal scale, for example on a Likert scale. For such questionnaires, item response theory (IRT) models provide a useful approach for obtaining summary scores for subjects (i.e., the model's random subject effect) and characteristics of the items (e.g., item difficulty and discrimination). In this article, we describe a model that allows the items to additionally exhibit different within-subject variance, and also includes a subject-level random effect to the within-subject variance specification. This permits subjects to be characterized in terms of their mean level, or location, and their variability, or scale, and the model allows item difficulty and discrimination in terms of both random subject effects (location and scale). We illustrate application of this location-scale mixed model using data from the Social Subscale of the Drinking Motives Questionnaire (SS-DMQ) assessed in an adolescent study. We show that the proposed model fits the data significantly better than simpler IRT models, and is able to identify items and subjects that are not well-fit by the simpler models. The proposed model has useful applications in many areas where questionnaires are often rated on an ordinal scale, and there is interest in characterizing subjects in terms of both their mean and variability.
ABSTRACT
A bivariate mixed-effects location-scale model is proposed for estimation of means, variances, and covariances of two continuous outcomes measured concurrently in time and repeatedly over subjects. Modeling the two outcomes jointly allows examination of the between-subject (BS) and within-subject (WS) associations between the outcomes and whether the associations are related to covariates. The variance-covariance matrices of the BS and WS effects are modeled in terms of covariates, explaining BS and WS heterogeneity. The proposed model relaxes assumptions on the homogeneity of the WS and BS variances. Furthermore, the WS variance models are extended by including random scale effects. Data from a natural history study on adolescent smoking are used for illustration. A total of 461 students from 9th and 10th grades reported on their mood at random prompts during seven consecutive days. This resulted in 14,105 prompts with an average of 30 responses per student. The two outcomes considered were a subject's positive affect and a measure of how tired and bored they were feeling. Results showed that the WS association of the outcomes was negative and significantly associated with several covariates. The BS and WS variances were heterogeneous for both outcomes, and the variances of the random scale effects were significantly different from zero.