Pesquisa | Secretaria de Estado da Saúde

Identifying Disengaged Responding in Multiple-Choice Items: Extending a Latent Class Item Response Model With Novel Process Data Indicators.

Welling, Jana; Gnambs, Timo; Carstensen, Claus H.

Educ Psychol Meas ; 84(2): 314-339, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38898880

RESUMO

Disengaged responding poses a severe threat to the validity of educational large-scale assessments, because item responses from unmotivated test-takers do not reflect their actual ability. Existing identification approaches rely primarily on item response times, which bears the risk of misclassifying fast engaged or slow disengaged responses. Process data with its rich pool of additional information on the test-taking process could thus be used to improve existing identification approaches. In this study, three process data variables-text reread, item revisit, and answer change-were introduced as potential indicators of response engagement for multiple-choice items in a reading comprehension test. An extended latent class item response model for disengaged responding was developed by including the three new indicators as additional predictors of response engagement. In a sample of 1,932 German university students, the extended model indicated a better model fit than the baseline model, which included item response time as only indicator of response engagement. In the extended model, both item response time and text reread were significant predictors of response engagement. However, graphical analyses revealed no systematic differences in the item and person parameter estimation or item response classification between the models. These results suggest only a marginal improvement of the identification of disengaged responding by the new indicators. Implications of these results for future research on disengaged responding with process data are discussed.

Linking of Rasch-Scaled Tests: Consequences of Limited Item Pools and Model Misfit.

Fischer, Luise; Rohm, Theresa; Carstensen, Claus H; Gnambs, Timo.

Front Psychol ; 12: 633896, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34295279

RESUMO

In the context of item response theory (IRT), linking the scales of two measurement points is a prerequisite to examine a change in competence over time. In educational large-scale assessments, non-identical test forms sharing a number of anchor-items are frequently scaled and linked using two- or three-parametric item response models. However, if item pools are limited and/or sample sizes are small to medium, the sparser Rasch model is a suitable alternative regarding the precision of parameter estimation. As the Rasch model implies stricter assumptions about the response process, a violation of these assumptions may manifest as model misfit in form of item discrimination parameters empirically deviating from their fixed value of one. The present simulation study investigated the performance of four IRT linking methods-fixed parameter calibration, mean/mean linking, weighted mean/mean linking, and concurrent calibration-applied to Rasch-scaled data with a small item pool. Moreover, the number of anchor items required in the absence/presence of moderate model misfit was investigated in small to medium sample sizes. Effects on the link outcome were operationalized as bias, relative bias, and root mean square error of the estimated sample mean and variance of the latent variable. In the light of this limited context, concurrent calibration had substantial convergence issues, while the other methods resulted in an overall satisfying and similar parameter recovery-even in the presence of moderate model misfit. Our findings suggest that in case of model misfit, the share of anchor items should exceed 20% as is currently proposed in the literature. Future studies should further investigate the effects of anchor item composition regarding unbalanced model misfit.

Testing Students with Special Educational Needs in Large-Scale Assessments - Psychometric Properties of Test Scores and Associations with Test Taking Behavior.

Pohl, Steffi; Südkamp, Anna; Hardt, Katinka; Carstensen, Claus H; Weinert, Sabine.

Front Psychol ; 7: 154, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-26941665

RESUMO

Assessing competencies of students with special educational needs in learning (SEN-L) poses a challenge for large-scale assessments (LSAs). For students with SEN-L, the available competence tests may fail to yield test scores of high psychometric quality, which are-at the same time-measurement invariant to test scores of general education students. We investigated whether we can identify a subgroup of students with SEN-L, for which measurement invariant competence measures of adequate psychometric quality may be obtained with tests available in LSAs. We furthermore investigated whether differences in test-taking behavior may explain dissatisfying psychometric properties and measurement non-invariance of test scores within LSAs. We relied on person fit indices and mixture distribution models to identify students with SEN-L for whom test scores with satisfactory psychometric properties and measurement invariance may be obtained. We also captured differences in test-taking behavior related to guessing and missing responses. As a result we identified a subgroup of students with SEN-L for whom competence scores of adequate psychometric quality that are measurement invariant to those of general education students were obtained. Concerning test taking behavior, there was a small number of students who unsystematically picked response options. Removing these students from the sample slightly improved item fit. Furthermore, two different patterns of missing responses were identified that explain to some extent problems in the assessments of students with SEN-L.

Taking the Missing Propensity Into Account When Estimating Competence Scores: Evaluation of Item Response Theory Models for Nonignorable Omissions.

Köhler, Carmen; Pohl, Steffi; Carstensen, Claus H.

Educ Psychol Meas ; 75(5): 850-874, 2015 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-29795844

RESUMO

When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically made when using these models: (1) The missing propensity is unidimensional and (2) the missing propensity and the ability are bivariate normally distributed. These assumptions may, however, be violated in real data sets and could, thus, pose a threat to the validity of this approach. The present study focuses on modeling competencies in various domains, using data from a school sample (N = 15,396) and an adult sample (N = 7,256) from the National Educational Panel Study. Our interest was to investigate whether violations of unidimensionality and the normal distribution assumption severely affect the performance of the model-based approach in terms of differences in ability estimates. We propose a model with a competence dimension, a unidimensional missing propensity and a distributional assumption more flexible than a multivariate normal. Using this model for ability estimation results in different ability estimates compared with a model ignoring missing responses. Implications for ability estimation in large-scale assessments are discussed.

Reversed thresholds in partial credit models: a reason for collapsing categories?

Wetzel, Eunike; Carstensen, Claus H.

Assessment ; 21(6): 765-74, 2014 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-24789857

RESUMO

When questionnaire data with an ordered polytomous response format are analyzed in the framework of item response theory using the partial credit model or the generalized partial credit model, reversed thresholds may occur. This led to the discussion of whether reversed thresholds violate model assumptions and indicate disordering of the response categories. Adams, Wu, and Wilson showed that reversed thresholds are merely a consequence of low frequencies in the categories concerned and that they do not affect the order of the rating scale. This article applies an empirical approach to elucidate the topic of reversed thresholds using data from the Revised NEO Personality Inventory as well as a simulation study. It is shown that categories differentiate between participants with different trait levels despite reversed thresholds and that category disordering can be analyzed independently of the ordering of the thresholds. Furthermore, we show that reversed thresholds often only occur in subgroups of participants. Thus, researchers should think more carefully about collapsing categories due to reversed thresholds.

Assuntos

Modelos Psicológicos , Psicometria/métodos , Adulto , Humanos , Masculino , Personalidade

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa