Search | VHL Regional Portal

Detecting uniform differential item functioning for continuous response computerized adaptive testing.

Wang, Chun; Zhu, Ruoyi.

Appl Psychol Meas ; 48(1-2): 18-37, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38327608

ABSTRACT

Evaluating items for potential differential item functioning (DIF) is an essential step in ensuring measurement fairness. In this article, we focus on a specific scenario, namely, continuous-response, severely sparse computerized adaptive testing (CAT). Continuous-response items are increasingly used in performance-based tasks because they tend to generate more information than traditional dichotomous items. Severe sparsity arises when many items are automatically generated via machine learning algorithms. We propose two uniform DIF detection methods for this scenario. The first is a modified version of CAT-SIBTEST, a non-parametric method that does not depend on any specific item response theory model assumptions. The second is a regularization method, a parametric, model-based approach. Simulation studies show that both methods are effective in correctly identifying items with uniform DIF. A real data analysis is provided at the end to illustrate the utility and potential caveats of the two methods.
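The non-parametric idea in the abstract above can be illustrated with a simplified, SIBTEST-style statistic for continuous responses. This is only a sketch under stated assumptions, not the modified CAT-SIBTEST the authors propose: examinees are matched on a provisional ability estimate, the matching strata are equal-width bins, the weights are focal-group counts, and no regression correction is applied. The function name `uniform_dif_beta` and its parameters are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def uniform_dif_beta(records, n_strata=4, lo=-2.0, hi=2.0):
    """Weighted mean difference in item scores between matched groups.

    records: iterable of (group, theta_hat, response) tuples, where group is
    "ref" or "focal", theta_hat is an ability estimate used for matching,
    and response is a continuous item score.
    A value near 0 suggests no uniform DIF on this item.
    """
    width = (hi - lo) / n_strata
    strata = defaultdict(lambda: {"ref": [], "focal": []})
    for group, theta, resp in records:
        # Clamp abilities outside [lo, hi) into the first or last stratum.
        k = min(n_strata - 1, max(0, int((theta - lo) / width)))
        strata[k][group].append(resp)

    num, den = 0.0, 0
    for cell in strata.values():
        # Only strata containing both groups allow a matched comparison.
        if cell["ref"] and cell["focal"]:
            n_focal = len(cell["focal"])
            num += n_focal * (mean(cell["ref"]) - mean(cell["focal"]))
            den += n_focal
    return num / den if den else float("nan")
```

With severely sparse data, many strata may hold only one group for a given item and are skipped here, which is one reason a plain stratified comparison like this would need modification in the CAT setting.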

Using Bayesian item response theory for multicohort repeated measure design to estimate individual latent change scores.

Wang, Chun; Zhu, Ruoyi; Crane, Paul K; Choi, Seo-Eun; Jones, Richard N; Tommet, Douglas.

Psychol Methods ; 2023 Dec 14.

Article in English | MEDLINE | ID: mdl-38095987

ABSTRACT

Repeated measure data designs have been used extensively in a wide range of fields, such as brain aging and developmental psychology, to answer important research questions exploring relationships between trajectory of change and external variables. In many cases, such data may be collected from multiple study cohorts and harmonized, with the intention of gaining higher statistical power and enhanced external validity. When psychological constructs are measured using survey scales, a fundamental psychometric challenge for data harmonization is to create commensurate measures for the constructs of interest across studies. Traditional analysis may fit a unidimensional item response theory model to data from one time point and one cohort to obtain item parameters and fix the same parameters in subsequent analyses. Such a simplified approach ignores item residual dependencies in the repeated measure design, and it does not exploit accumulated information from different cohorts. Instead, two alternative approaches should serve such data designs much better: an integrative approach using a multiple-group two-tier model via concurrent calibration, and, if such calibration fails to converge, a Bayesian sequential calibration approach that uses informative priors on common items to establish the scale. Both approaches use a Markov chain Monte Carlo algorithm that handles computational complexity well. Through a simulation study and an empirical study using Alzheimer's Disease Neuroimaging Initiative cognitive battery data (i.e., language and executive functioning), we conclude that latent change scores obtained from these two alternative approaches are more precisely recovered. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Measurement precision across cognitive domains in the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set.

Crane, Paul K; Choi, Seo-Eun; Lee, Michael; Scollard, Phoebe; Sanders, R Elizabeth; Klinedinst, Brandon; Nakano, Connie; Trittschuh, Emily H; Mez, Jesse; Saykin, Andrew J; Gibbons, Laura E; Wang, Chun; Mungas, Dan; Zhu, Ruoyi; Foldi, Nancy S; Lamar, Melissa; Jutten, Roos; Sikkes, Sietske A M; Grandoit, Evan; Rabin, Laura A; Jones, Richard N; Tommet, Doug; Mukherjee, Shubhabrata.

Neuropsychology ; 37(4): 373-382, 2023 May.

Article in English | MEDLINE | ID: mdl-37276134

ABSTRACT

OBJECTIVE: To demonstrate measurement precision of cognitive domains in the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set. METHOD: Participants with normal cognition (NC), mild cognitive impairment (MCI), and Alzheimer's disease (AD) were included from all ADNI waves. We used data from each person's last study visit to calibrate scores for memory, executive function, language, and visuospatial functioning. We extracted item information functions for each domain and used these to calculate standard errors of measurement. We derived scores for each domain for each diagnostic group and plotted standard errors of measurement for the observed range of scores. RESULTS: Across all waves, there were 961 people with NC, 825 people with MCI, and 694 people with AD at their most recent study visit (data pulled February 25, 2019). Across ADNI's battery there were 34 memory items, 18 executive function items, 20 language items, and seven visuospatial items. Scores for each domain were highest on average for people with NC, intermediate for people with MCI, and lowest for people with AD, with most scores across all groups in the range of -1 to +1. Standard error of measurement in the range from -1 to +1 was highest for memory, intermediate for language and executive functioning, and lowest for visuospatial. CONCLUSION: Modern psychometric approaches provide tools to help understand measurement precision of the scales used in studies. In ADNI, there are important differences in measurement precision across cognitive domains. (PsycInfo Database Record (c) 2023 APA, all rights reserved).

Subject(s)

Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnostic imaging , Cognitive Dysfunction/diagnostic imaging , Executive Function , Cognition , Neuroimaging
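The METHOD section of the abstract above derives standard errors of measurement from item information functions. A minimal sketch of that relationship follows, assuming for illustration that every item follows a two-parameter logistic (2PL) model; the abstract does not state which item models were fitted, and the function names and parameter values here are hypothetical. Test information is the sum of item information, and the standard error at ability theta is the reciprocal of its square root.

```python
import math


def item_information(theta, a, b):
    """Fisher information of a 2PL item with discrimination a, difficulty b."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def standard_error(theta, items):
    """Standard error of measurement at theta for a list of (a, b) items."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)


# Hypothetical domains: more items at the same location give a smaller
# standard error, all else being equal.
short_domain = [(1.0, 0.0)] * 4
long_domain = [(1.0, 0.0)] * 16
```

Evaluating `standard_error` over a grid of theta values for each domain would give curves of the kind the abstract describes plotting across the observed score range.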

Using Lasso and Adaptive Lasso to Identify DIF in Multidimensional 2PL Models.

Wang, Chun; Zhu, Ruoyi; Xu, Gongjun.

Multivariate Behav Res ; 58(2): 387-407, 2023.

Article in English | MEDLINE | ID: mdl-35086405

ABSTRACT

Differential item functioning (DIF) analysis refers to procedures that evaluate whether an item's characteristics differ for different groups of persons after controlling for overall differences in performance. DIF is routinely evaluated as a screening step to ensure items behave the same across groups. Currently, the majority of DIF studies focus predominantly on unidimensional IRT models, although multidimensional IRT (MIRT) models provide a powerful tool for enriching the information gained in modern assessment. In this study, we explore regularization methods for DIF detection in MIRT models and compare their performance to the classic likelihood ratio test. Regularization methods have recently emerged as a new family of methods for DIF detection due to their advantages: (1) they bypass the tedious iterative purification procedure that is often needed in other methods for identifying anchor items, and (2) they can handle multiple covariates simultaneously. The specific regularization methods considered in the study are: lasso with expectation-maximization (EM), lasso with expectation-maximization-maximization (EMM) algorithm, and adaptive lasso with EM. Simulation results show that lasso EMM and adaptive lasso EM hold great promise when the sample size is large, and they both outperform lasso EM. A real data example from PROMIS depression and anxiety scales is presented at the end.

Subject(s)

Algorithms , Likelihood Functions
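The reason a lasso penalty can flag DIF items without a separate anchor-purification step is that it shrinks small group-difference parameters exactly to zero. The sketch below shows only that shrinkage step, using the soft-thresholding operator that solves a one-dimensional lasso problem. It is not the EM or EMM algorithm studied in the article, and the names `soft_threshold` and `flag_dif`, the penalty value, and the per-item estimates are all hypothetical.

```python
def soft_threshold(z, lam):
    """Closed-form minimizer of 0.5 * (g - z)**2 + lam * abs(g) over g."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def flag_dif(unpenalized, lam):
    """Shrink per-item DIF estimates and keep the items that stay nonzero.

    unpenalized: dict mapping item name to an unpenalized estimate of the
    group difference in the item intercept (a uniform DIF parameter).
    """
    shrunk = {item: soft_threshold(g, lam) for item, g in unpenalized.items()}
    return {item: g for item, g in shrunk.items() if g != 0.0}


# Hypothetical estimates: item1 is shrunk to zero and treated as DIF-free.
estimates = {"item1": 0.05, "item2": -0.60, "item3": 0.45}
flagged = flag_dif(estimates, lam=0.1)
```

In practice the penalty weight would be chosen by an information criterion or cross-validation rather than fixed in advance; an adaptive lasso would additionally use item-specific weights so that larger initial estimates are penalized less.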
