ABSTRACT
Variability in treatment effects is common in intervention studies using cluster randomized controlled trial (C-RCT) designs. Such variability is often examined in multilevel modeling (MLM) to understand how treatment effects (TRT) differ based on the level of a covariate (COV), called TRT × COV. In detecting TRT × COV effects using MLM, relationships between covariates and outcomes are assumed to vary across clusters linearly. However, this linearity assumption may not hold in all applications and an incorrect assumption may lead to biased statistical inference about TRT × COV effects. In this study, we present generalized additive mixed model (GAMM) specifications in which cluster-specific functional relationships between covariates and outcomes can be modeled using by-variable smooth functions. In addition, the implementation for GAMM specifications is explained using the mgcv R package (Wood, 2021). The usefulness of the GAMM specifications is illustrated using intervention data from a C-RCT. Results of simulation studies showed that parameters and by-variable smooth functions were recovered well in various multilevel designs and the misspecification of the relationship between covariates and outcomes led to biased estimates of TRT × COV effects. Furthermore, this study evaluated the extent to which the GAMM can be treated as an alternative model to MLM in the presence of a linear relationship.
Subject(s)
Randomized Controlled Trials as Topic; Humans; Computer Simulation; Cluster Analysis
ABSTRACT
OBJECTIVES: Listening-related fatigue can be a significant problem for adults who struggle to hear and understand, particularly adults with hearing loss. However, valid, sensitive, and clinically useful measures for listening-related fatigue do not currently exist. The purpose of this study was to develop and validate a brief clinical tool for measuring listening-related fatigue in adults. DESIGN: The clinical scale was derived from the 40-item version of the Vanderbilt Fatigue Scale for Adults (VFS-A-40), an existing, reliable, and valid research tool for measuring listening-related fatigue. The study consisted of two phases. Phase 1 (N = 580) and Phase 2 (N = 607) participants consisted of convenience samples of adults recruited via online advertisements, clinical records review, and a pool of prior research participants. In Phase 1, results from item response theory (IRT) analyses of VFS-A-40 items were used to identify high-quality items for the brief (10-item) clinical scale: the VFS-A-10. In Phase 2, the characteristics and quality of the VFS-A-10 were evaluated in a separate sample of respondents. Dimensionality was evaluated using exploratory factor analyses (EFAs) and item quality and characteristics were evaluated using IRT. VFS-A-10 reliability and validity were assessed in multiple ways. IRT reliability analysis was used to examine VFS-A-10 measurement fidelity. In addition, test-retest reliability was assessed in a subset of Phase 2 participants (n = 145) who completed the VFS-A-10 a second time approximately one month after their initial measure (range 5 to 90 days). IRT differential item functioning (DIF) was used to assess item bias across different age, gender, and hearing loss subgroups. Convergent construct validity was evaluated by comparing VFS-A-10 responses to two other generic fatigue scales and a measure of hearing disability.
Known-groups validity was assessed by comparing VFS-A-10 scores between adults with and without self-reported hearing loss. RESULTS: EFA suggested a unidimensional structure for the VFS-A-10. IRT analyses confirmed all test items were high quality. IRT reliability analysis revealed good measurement fidelity over a wide range of fatigue severities. Test-retest reliability was excellent (rs = 0.88, collapsed across participants). IRT DIF analyses confirmed the VFS-A-10 provided a valid measure of listening-related fatigue regardless of respondent age, gender, or hearing status. An examination of associations between VFS-A-10 scores and generic fatigue/vigor measures revealed only weak-to-moderate correlations (Spearman's correlation coefficient, rs = -0.36 to 0.57). Stronger associations were seen between VFS-A-10 scores and a measure of perceived hearing difficulties (rs = 0.79 to 0.81) providing evidence of convergent construct validity. In addition, the VFS-A-10 was more sensitive to fatigue associated with self-reported hearing difficulties than generic measures. It was also more sensitive than generic measures to variations in fatigue as a function of degree of hearing impairment. CONCLUSIONS: These findings suggest that the VFS-A-10 is a reliable, valid, and sensitive tool for measuring listening-related fatigue in adults. Its brevity, high sensitivity, and good reliability make it appropriate for clinical use. The scale will be useful for identifying those most affected by listening-related fatigue and for assessing benefits of interventions designed to reduce its negative effects.
Subject(s)
Deafness; Hearing Loss; Adult; Humans; Fatigue/diagnosis; Hearing; Hearing Loss/diagnosis; Psychometrics; Reproducibility of Results; Surveys and Questionnaires; Male; Female
ABSTRACT
In education and psychology, single-case designs (SCDs) have been used to detect treatment effects using time series data in the presence or absence of intervention. One popular design variant of SCDs is a multiple-baseline design for multiple outcomes, which often collects outcomes with some form of a count. A Poisson model is a natural choice for the count outcome. However, the assumption of the Poisson model that the outcome variable's mean is equal to its variance is often violated in SCDs, as the variance is often larger than the mean (called overdispersion). In addition, when multiple outcomes are from the same participant, it is likely that they are correlated. In this paper, we present a vector Poisson log-normal additive (V-PLN-A) model to deal with (a) change processes (auto- and cross-correlations and data-driven trend) and (b) correlation and overdispersion in multivariate count time series. A multivariate normal distribution was adapted to account for correlation among multiple outcomes as well as possible overdispersion. The V-PLN-A model was applied to an educational intervention study to test treatment effects. Simulation study results showed that parameter recovery of the V-PLN-A model was satisfactory in a large number of timepoints using Bayesian analysis, and that ignoring change processes and overdispersion led to biased estimates of the treatment effects.
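The overdispersion described in this abstract can be illustrated with a small simulation. The sketch below is not the V-PLN-A model itself: it draws a single count series whose log-rate receives a normal random term (a univariate Poisson log-normal mixture) and compares its variance-to-mean ratio with that of a plain Poisson series. The function names, parameter values, and seed are hypothetical choices made for illustration only.

```python
import math
import random
from statistics import mean, pvariance


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n, mu, sigma, seed=1):
    """Simulate n counts whose log-rate is mu + sigma * z, with z ~ N(0, 1).

    sigma = 0 gives a plain Poisson series; sigma > 0 gives a
    Poisson log-normal mixture (illustrative values, not from the paper).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        log_rate = mu + sigma * rng.gauss(0.0, 1.0)
        counts.append(poisson_draw(rng, math.exp(log_rate)))
    return counts


def dispersion_ratio(counts):
    """Variance divided by mean; close to 1 under a Poisson model."""
    return pvariance(counts) / mean(counts)


plain = simulate_counts(20000, mu=1.0, sigma=0.0)
mixed = simulate_counts(20000, mu=1.0, sigma=0.5)
print("Poisson ratio:", round(dispersion_ratio(plain), 2))
print("Poisson log-normal ratio:", round(dispersion_ratio(mixed), 2))
```

Under these assumed settings the plain series should show a ratio near 1, while the mixture shows a ratio clearly above 1, which is the pattern a Poisson model with equal mean and variance cannot absorb.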
Subject(s)
Models, Statistical; Bayes Theorem; Humans; Poisson Distribution; Time Factors
ABSTRACT
Multilevel data structures are often found in multiple substantive research areas, and multilevel models (MLMs) have been widely used to allow for such multilevel data structures. One important step when applying MLM is the selection of an optimal set of random effects to account for variability and heteroscedasticity in multilevel data. Literature reviews on current practices in applying MLM showed that diagnostic plots are only rarely used for model selection and for model checking. In this study, possible random effects and a generic description of the random effects were provided to guide researchers to select necessary random effects. In addition, based on extensive literature reviews, level-specific diagnostic plots were presented using various kinds of level-specific residuals, and diagnostic measures and statistical tests were suggested to select a set of random effects. Existing and newly proposed methods were illustrated using two data sets: a cross-sectional data set and a longitudinal data set. Along with the illustration, we discuss the methods and provide guidelines to select necessary random effects in model-building steps. R code was provided for the analyses.
Subject(s)
Models, Statistical; Humans; Cross-Sectional Studies; Multilevel Analysis
ABSTRACT
Syntactic priming effects have been investigated for several decades in psycholinguistics and the cognitive sciences to understand the cognitive mechanisms that support language production and comprehension. The question of whether speakers prime themselves is central to adjudicating between two theories of syntactic priming, activation-based theories and expectation-based theories. However, there is a lack of a statistical model to investigate the two different theories when nominal repeated measures are obtained from multiple participants and items. This paper presents a Markov mixed-effect multinomial logistic regression model in which there are fixed and random effects for own-category lags and cross-category lags in a multivariate structure and there are category-specific crossed random effects (random person and item effects). The model is illustrated with experimental data that investigates the average and participant-specific deviations in syntactic self-priming effects. Results of the model suggest that evidence of self-priming is consistent with the predictions of activation-based theories. Accuracy of parameter estimates and precision is evaluated via a simulation study using Bayesian analysis.
Subject(s)
Comprehension; Psycholinguistics; Bayes Theorem; Cognitive Science; Humans; Logistic Models
ABSTRACT
This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study was implemented to investigate the behavior of the multilevel reliability measures and the uncertainty associated with the measures in various multilevel designs regarding the number of clusters, cluster sizes, and intraclass correlations (ICCs), and in different test lengths, for two parameterizations of multilevel item response models with separate item discriminations or the same item discrimination over levels. Marginal maximum likelihood estimation (MMLE)-multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering the accuracy of the multilevel reliability measures and the empirical coverage rate of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations over levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage rate of the MC confidence intervals were found in a limited condition, 200 clusters, 30 cluster size, .2 ICC, and 40 items, in MMLE-multiple imputation. In the model with the same item discrimination over levels, the accuracy of the multilevel reliability measures and the empirical coverage rate of the MC confidence intervals were acceptable in all multilevel designs we considered with 40 items under MMLE-multiple imputation. We discuss these findings and provide guidelines for reporting multilevel reliability measures.
Subject(s)
Likelihood Functions; Multilevel Analysis; Reproducibility of Results; Bayes Theorem; Humans; Monte Carlo Method; Psychological Theory; Surveys and Questionnaires
ABSTRACT
The Vanderbilt Expertise Test for cars (VETcar) is a test of visual learning for contemporary car models. We used item response theory to assess the VETcar and in particular used differential item functioning (DIF) analysis to ask if the test functions the same way in laboratory versus online settings and for different groups based on age and gender. An exploratory factor analysis found evidence of multidimensionality in the VETcar, although a single dimension was deemed sufficient to capture the recognition ability measured by the test. We selected a unidimensional three-parameter logistic item response model to examine item characteristics and subject abilities. The VETcar had satisfactory internal consistency. A substantial number of items showed DIF at a medium effect size for test setting and for age group, whereas gender DIF was negligible. Because online subjects were on average older than those tested in the lab, we focused on the age groups to conduct a multigroup item response theory analysis. This revealed that most items on the test favored the younger group. DIF could be more the rule than the exception when measuring performance with familiar object categories, therefore posing a challenge for the measurement of either domain-general visual abilities or category-specific knowledge.
Subject(s)
Automobiles; Learning/physiology; Visual Perception/physiology; Adult; Female; Humans; Logistic Models; Male; Middle Aged; Neuropsychological Tests; Psychometrics/methods; Young Adult
ABSTRACT
Popular statistical software provides the Bayesian information criterion (BIC) for multi-level models or linear mixed models. However, it has been observed that the combination of statistical literature and software documentation has led to discrepancies in the formulas of the BIC and uncertainties as to the proper use of the BIC in selecting a multi-level model with respect to level-specific fixed and random effects. These discrepancies and uncertainties result from different specifications of sample size in the BIC's penalty term for multi-level models. In this study, we derive the BIC's penalty term for level-specific fixed- and random-effect selection in a two-level nested design. In this new version of BIC, called BIC_E1, this penalty term is decomposed into two parts if the random-effect variance-covariance matrix has full rank: (a) a term with the log of average sample size per cluster and (b) the total number of parameters times the log of the total number of clusters. Furthermore, we derive the new version of BIC, called BIC_E2, in the presence of redundant random effects. We show that the derived formulae, BIC_E1 and BIC_E2, adhere to empirical values via numerical demonstration and that BIC_E (E indicating either E1 or E2) is the best global selection criterion, as it performs at least as well as BIC with the total sample size and BIC with the number of clusters across various multi-level conditions through a simulation study. In addition, the use of BIC_E1 is illustrated with a textbook example dataset.
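The sample-size discrepancy this abstract refers to can be made concrete with the conventional BIC, -2 log L + p log(n), evaluated once with the total sample size and once with the number of clusters. The sketch below does not implement the criteria derived in the paper, whose exact formulas are not given in the abstract. The log-likelihoods, parameter counts, and design sizes are hypothetical numbers chosen only to show that the two conventions can select different models.

```python
import math


def bic(loglik, n_params, n):
    """Conventional BIC: -2 * log-likelihood + n_params * log(n)."""
    return -2.0 * loglik + n_params * math.log(n)


# Hypothetical two-level design: 40 clusters, 2000 observations in total.
TOTAL_N = 2000
N_CLUSTERS = 40

# Hypothetical fitted models (illustrative values, not from the paper).
MODELS = {
    "A": {"loglik": -3000.0, "n_params": 5},  # fewer random effects
    "B": {"loglik": -2990.0, "n_params": 9},  # more random effects
}


def preferred(n):
    """Return the name of the model with the smallest BIC for sample size n."""
    return min(
        MODELS,
        key=lambda name: bic(MODELS[name]["loglik"], MODELS[name]["n_params"], n),
    )


print("n = total sample size  ->", preferred(TOTAL_N))
print("n = number of clusters ->", preferred(N_CLUSTERS))
```

With these assumed values the heavier penalty from the total sample size favors the smaller model, while the lighter penalty from the number of clusters favors the larger one, so the choice of n alone changes the selected model.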
Subject(s)
Software; Sample Size; Bayes Theorem; Linear Models; Computer Simulation
ABSTRACT
This paper presents a model specification for group comparisons regarding a functional trend over time within a trial and learning across a series of trials in intensive binary longitudinal eye-tracking data. The functional trend and learning effects are modeled using by-variable smooth functions. This model specification is formulated as a generalized additive mixed model, which allowed for the use of the freely available mgcv package (Wood, Package 'mgcv', https://cran.r-project.org/web/packages/mgcv/mgcv.pdf, 2023) in R. The model specification was applied to intensive binary longitudinal eye-tracking data, where the questions of interest concern differences between individuals with and without brain injury in their real-time language comprehension and how this affects their learning over time. The results of the simulation study show that the model parameters are recovered well and the by-variable smooth functions are adequately predicted in the same conditions as those found in the application.
ABSTRACT
The Center for Epidemiologic Studies Depression Scale - Revised (CESD-R) is a popular self-report screening measure for depression. A 20-item questionnaire with scores ranging from 0 to 4 for each item, the CESD-R can produce total scores ranging from 0 to 80. However, the typical scoring protocol for the CESD-R restricts the range of possible scores to between 0 and 60 to retain the same range and clinical cutoff scores as the original CES-D. Despite the widespread adoption of this scoring approach, the psychometric impact has never been systematically examined. In an undergraduate and community adult sample (n = 869), item response theory analyses indicated that scoring the CESD-R with all 5 response options (CESD-R5opt) provided nearly twice as much information about a person's latent depression for individuals with high levels of depression than did scoring the CESD-R with 4 response options per item (CESD-R4opt). The CESD-R5opt retained the strong reliability and factor structure of the CESD-R4opt and was more sensitive to individual differences for participants at high levels of depression compared to the CESD-R4opt. Results provide preliminary evidence that researchers and clinicians should score the CESD-R using the full 0-to-80 scale and a clinical cutoff score of 29. Supplementary Information: The online version contains supplementary material available at 10.1007/s10862-024-10155-y.
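The two scoring ranges contrasted in this abstract can be sketched as follows. The abstract does not state how the typical protocol restricts scores to 0-60; the sketch assumes it does so by recoding the top response option (4) to 3 on every item, and that assumption is marked in the code. Function names are hypothetical.

```python
N_ITEMS = 20          # number of CESD-R items, as stated in the abstract
MAX_RESPONSE = 4      # items are scored 0 to 4


def _validate(responses):
    """Check that there are 20 integer responses, each between 0 and 4."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in range(MAX_RESPONSE + 1) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")


def score_full(responses):
    """Total score using all five response options (range 0 to 80)."""
    _validate(responses)
    return sum(responses)


def score_restricted(responses):
    """Total score restricted to 0 to 60.

    ASSUMPTION: the restriction is implemented by recoding a response
    of 4 to 3, which is not spelled out in the abstract.
    """
    _validate(responses)
    return sum(min(r, 3) for r in responses)


# A hypothetical respondent who endorses the top option on half of the items.
example = [4] * 10 + [2] * 10
print(score_full(example), score_restricted(example))
```

Under the stated assumption, two respondents who differ only in how often they choose the top option receive the same restricted score, which is one way to see why the full 0-to-80 scoring can carry more information at high levels of depression.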
ABSTRACT
Marginal maximum likelihood estimation (MMLE) is commonly used for item response theory item parameter estimation. However, sufficiently large sample sizes are not always possible when studying rare populations. In this paper, empirical Bayes and hierarchical Bayes are presented as alternatives to MMLE in small sample sizes, using auxiliary item information to estimate the item parameters of a graded response model with higher accuracy. Empirical Bayes and hierarchical Bayes methods are compared with MMLE to determine under what conditions these Bayes methods can outperform MMLE, and to determine if hierarchical Bayes can act as an acceptable alternative to MMLE in conditions where MMLE is unable to converge. In addition, empirical Bayes and hierarchical Bayes methods are compared to show how hierarchical Bayes can result in estimates of posterior variance with greater accuracy than empirical Bayes by acknowledging the uncertainty of item parameter estimates. The proposed methods were evaluated via a simulation study. Simulation results showed that hierarchical Bayes methods can be acceptable alternatives to MMLE under various testing conditions, and we provide a guideline to indicate which methods would be recommended in different research situations. R functions are provided to implement these proposed methods.
ABSTRACT
Signal detection theory (SDT; Tanner & Swets in Psychological Review 61:401-409, 1954) is a dominant modeling framework used for evaluating the accuracy of diagnostic systems that seek to distinguish signal from noise in psychology. Although the use of response time data in psychometric models has increased in recent years, the incorporation of response time data into SDT models remains a relatively underexplored approach to distinguishing signal from noise. Functional response time effects are hypothesized in SDT models, based on findings from other related psychometric models with response time data. In this study, an SDT model is extended to incorporate functional response time effects using smooth functions and to include all sources of variability in SDT model parameters across trials, participants, and items in the experimental data. The extended SDT model with smooth functions is formulated as a generalized linear mixed-effects model and implemented in the gamm4 R package. The extended model is illustrated using recognition memory data to understand how conversational language is remembered. Accuracy of parameter estimates and the importance of modeling variability in detecting the experimental condition effects and functional response time effects are shown in conditions similar to the empirical data set via a simulation study. In addition, the type 1 error rate of the test for a smooth function of response time is evaluated.
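As background for the SDT parameters mentioned in this abstract, the classical equal-variance Gaussian SDT indices can be computed from a hit rate and a false-alarm rate. The sketch below covers only that textbook case (sensitivity d' and criterion c) and is not the extended model with smooth response time effects or random effects. The function name and the example rates are hypothetical.

```python
from statistics import NormalDist


def sdt_indices(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian SDT model.

    d' = z(H) - z(F) and c = -(z(H) + z(F)) / 2, where z is the inverse
    standard normal CDF. Rates must lie strictly between 0 and 1.
    """
    for rate in (hit_rate, false_alarm_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("rates must be strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion


# Hypothetical recognition memory rates for illustration.
d_prime, criterion = sdt_indices(hit_rate=0.8, false_alarm_rate=0.2)
print(round(d_prime, 3), round(criterion, 3))
```

In the mixed-effects formulation described in the abstract, quantities like these are modeled as parameters that can vary across trials, participants, and items rather than being computed once from pooled rates.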
Subject(s)
Recognition, Psychology; Signal Detection, Psychological; Humans; Signal Detection, Psychological/physiology; Reaction Time; Psychometrics; Computer Simulation
ABSTRACT
Eye-tracking has emerged as a popular method for empirical studies of cognitive processes across multiple substantive research areas. Eye-tracking systems are capable of automatically generating fixation-location data over time at high temporal resolution. Often, the researcher obtains a binary measure of whether or not, at each point in time, the participant is fixating on a critical interest area or object in the real world or in a computerized display. Eye-tracking data are characterized by spatial-temporal correlations and random variability, driven by multiple fine-grained observations taken over small time intervals (e.g., every 10 ms). Ignoring these data complexities leads to biased inferences for the covariates of interest such as experimental condition effects. This article presents a novel application of a generalized additive logistic regression model for intensive binary time series eye-tracking data from a between- and within-subjects experimental design. The model is formulated as a generalized additive mixed model (GAMM) and implemented in the mgcv R package. The generalized additive logistic regression model was illustrated using an empirical data set aimed at understanding the accommodation of regional accents in spoken language processing. Accuracy of parameter estimates and the importance of modeling the spatial-temporal correlations in detecting the experimental condition effects were shown in conditions similar to our empirical data set via a simulation study. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Subject(s)
Eye-Tracking Technology; Computer Simulation; Humans; Logistic Models; Time Factors
ABSTRACT
There is recent evidence for a domain-general object recognition ability, called O, which is distinct from general intelligence and other cognitive and personality constructs. We extend the study of O by characterizing how it generalizes to the ability to recognize familiar objects and to the ability to make judgments of the average identity of ensembles of objects. We applied latent variable modeling to data collected from a sample of adults (N = 284) in three different tasks and for six different object domains (three novel and three familiar). The results replicated prior work in finding that on average 88% of the variance of lower-order factors could be accounted for by O for novel objects. The latent constructs recruited by the higher-order factor for novel objects and for familiar objects were almost perfectly correlated and therefore functionally identical. A latent factor for ensemble perception shared about 42% of the variance with O, suggesting at least strong overlap between abilities supporting judgments about individual objects and ensembles of objects. This work extends the theoretical reach of O by showing generalization across two dimensions (familiar vs. novel objects; individual vs. ensemble object perception). With respect to the structure of individual differences in high-level vision, researchers would benefit from accounting for the contribution of O when seeking to understand various domain-specific abilities. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
Subject(s)
Recognition, Psychology; Visual Perception; Adult; Humans; Individuality; Judgment; Vision, Ocular
ABSTRACT
PURPOSE: Growing evidence suggests that fatigue associated with listening difficulties is particularly problematic for children with hearing loss (CHL). However, sensitive, reliable, and valid measures of listening-related fatigue do not exist. To address this gap, this article describes the development, psychometric evaluation, and preliminary validation of a suite of scales designed to assess listening-related fatigue in CHL: the pediatric versions of the Vanderbilt Fatigue Scale (VFS-Peds). METHOD: Test development employed best practices, including operationalizing the construct of listening-related fatigue from the perspective of target respondents (i.e., children, their parents, and teachers). Test items were developed based on input from these groups. Dimensionality was evaluated using exploratory factor analyses (EFAs). Item response theory (IRT) and differential item functioning (DIF) analyses were used to identify high-quality items, which were further evaluated and refined to create the final versions of the VFS-Peds. RESULTS: The VFS-Peds is appropriate for use with children aged 6-17 years and consists of child self-report (VFS-C), parent proxy-report (VFS-P), and teacher proxy-report (VFS-T) scales. EFA of child self-report and teacher proxy data suggested that listening-related fatigue was unidimensional in nature. In contrast, parent data suggested a multidimensional construct, composed of mental (cognitive, social, and emotional) and physical domains. IRT analyses suggested that items were of good quality, with high information and good discriminability. DIF analyses revealed the scales provided a comparable measure of fatigue regardless of the child's gender, age, or hearing status. Test information was acceptable over a wide range of fatigue severities and all scales yielded acceptable reliability and validity. CONCLUSIONS: This article describes the development, psychometric evaluation, and validation of the VFS-Peds. 
Results suggest that the VFS-Peds provide a sensitive, reliable, and valid measure of listening-related fatigue in children that may be appropriate for clinical use. Such scales could be used to identify those children most affected by listening-related fatigue, and given their apparent sensitivity, the scales may also be useful for examining the effectiveness of potential interventions targeting listening-related fatigue in children. SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19836154.
Subject(s)
Auditory Perception; Hearing Loss; Mental Fatigue; Surveys and Questionnaires; Adolescent; Auditory Perception/physiology; Child; Hearing Loss/physiopathology; Humans; Mental Fatigue/diagnosis; Parents; Proxy; Psychometrics; Reproducibility of Results; School Teachers
ABSTRACT
A cluster randomized controlled trial (C-RCT) is common in educational intervention studies. Multilevel modelling (MLM) is a dominant analytic method to evaluate treatment effects in a C-RCT. In most MLM applications intended to detect an interaction effect, a single interaction effect (called a conflated effect) is considered instead of level-specific interaction effects in a multilevel design (called unconflated multilevel interaction effects), and the linear interaction effect is modelled. In this paper we present a generalized additive mixed model (GAMM) that allows an unconflated multilevel interaction to be estimated without assuming a prespecified form of the interaction. R code is provided to estimate the model parameters using maximum likelihood estimation and to visualize the nonlinear treatment-by-covariate interaction. The usefulness of the model is illustrated using instructional intervention data from a C-RCT. Results of simulation studies showed that the GAMM outperformed an alternative approach to recover an unconflated logistic multilevel interaction. In addition, the parameter recovery of the GAMM was relatively satisfactory in multilevel designs found in educational intervention studies, except when the number of clusters, cluster sizes, and intraclass correlations were small. When modelling a linear multilevel treatment-by-covariate interaction in the presence of a nonlinear effect, biased estimates (such as overestimated standard errors and overestimated random effect variances) and incorrect predictions of the unconflated multilevel interaction were found.
Subject(s)
Research Design; Cluster Analysis; Computer Simulation; Data Interpretation, Statistical; Randomized Controlled Trials as Topic
ABSTRACT
In response to the target article by Teresi et al. (2021), we explain why the article is useful and we also present a different approach. An alternative category of differential item functioning (DIF) is presented with a corresponding way of modeling DIF, based on random person and random item effects and explanatory covariates.
Subject(s)
Psychometrics; Humans
ABSTRACT
Recent findings point to a role for hippocampus in the moment-by-moment processing of language, including the use and generation of semantic features in certain contexts. What role the hippocampus might play in the processing of semantic relations in spoken language comprehension, however, is unknown. Here we test patients with bilateral hippocampal damage and dense amnesia in order to examine the necessity of hippocampus for lexico-semantic mapping processes in spoken language understanding. In two visual-world eye-tracking experiments, we monitor eye movements to images that are semantically related to spoken words and sentences. We find no impairment in amnesia, relative to matched healthy comparison participants. These findings suggest, at least for close semantic links and simple language comprehension tasks, a lack of necessity for hippocampus in lexico-semantic mapping between spoken words and simple pictures.
Subject(s)
Language; Semantics; Amnesia; Hippocampus/diagnostic imaging; Humans; Memory
ABSTRACT
Listening-related fatigue can be a significant burden for adults with hearing loss (AHL), and potentially those with other health or language-related issues (e.g., multiple sclerosis, traumatic brain injury, second language learners) who must allocate substantial cognitive resources to the process of listening. The 40-item Vanderbilt Fatigue Scale for Adults (VFS-A-40) was designed to measure listening-related fatigue in such populations. This article describes the development, and psychometric properties, of the VFS-A-40. Initial qualitative analyses in AHL suggested listening-related fatigue was multidimensional, with physical, mental, emotional, and social domains. However, exploratory factor analyses revealed a unidimensional structure. Item and test characteristics were evaluated using Item Response Theory (IRT). Results confirmed that all test items were of high quality. IRT analyses revealed high marginal reliability and an analysis of test-retest scores revealed adequate reliability. In addition, an analysis of differential item functioning provided evidence of good construct validity across age, gender, and hearing loss groups. In sum, the VFS-A-40 is a reliable and valid tool for quantifying listening-related fatigue in adults. We believe the VFS-A-40 will be useful for identifying those most at risk for severe listening-related fatigue and for assessing interventions to reduce its negative effects. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Subject(s)
Fatigue; Surveys and Questionnaires; Adult; Factor Analysis, Statistical; Fatigue/diagnosis; Humans; Psychometrics; Reproducibility of Results
ABSTRACT
This paper presents a dynamic tree-based item response (IRTree) model as a novel extension of the autoregressive generalized linear mixed effect model (dynamic GLMM). We illustrate the unique utility of the dynamic IRTree model in its capability of modeling differentiated processes indicated by intensive polytomous time-series eye-tracking data. The dynamic IRTree was inspired by but is distinct from the dynamic GLMM which was previously presented by Cho, Brown-Schmidt, and Lee (Psychometrika 83(3):751-771, 2018). Unlike the dynamic IRTree, the dynamic GLMM is suitable for modeling intensive binary time-series eye-tracking data to identify visual attention to a single interest area over all other possible fixation locations. The dynamic IRTree model is a general modeling framework which can be used to model change processes (trend and autocorrelation) and which allows for decomposing data into various sources of heterogeneity. The dynamic IRTree model was illustrated using an experimental study that employed the visual-world eye-tracking technique. The results of a simulation study showed that parameter recovery of the model was satisfactory and that ignoring trend and autoregressive effects resulted in biased estimates of experimental condition effects in the same conditions found in the empirical study.