Results 1 - 20 of 25
1.
Multivariate Behav Res ; 55(3): 425-453, 2020.
Article in English | MEDLINE | ID: mdl-31448968

ABSTRACT

For adequate modeling of missing responses, a thorough understanding of the nonresponse mechanisms is vital. As many major testing programs are moving, or have already moved, to computer-based assessment, a rich body of additional data on examinee behavior becomes easily accessible. These additional data may contain valuable information on the processes associated with nonresponse. Bringing together research on item omissions with approaches for modeling response time data, we propose a framework for simultaneously modeling response behavior and omission behavior, utilizing timing information for both. As such, the proposed model makes it possible (a) to gain a deeper understanding of response and nonresponse behavior in general and, in particular, of the processes underlying item omissions in large-scale assessments (LSAs), (b) to model the processes determining the time examinees require to generate a response or to omit an item, and (c) to account for nonignorable item omissions. Parameter recovery of the proposed model is studied in a simulation study. An illustration of the model by means of an application to real data is provided.
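As an editorial illustration of this model family (not the authors' exact specification), such a joint framework can be sketched by combining an IRT model for responses, a logistic model for the omission propensity, and lognormal response/omission times with correlated latent variables:

\[
P(Y_{pi}=1 \mid \theta_p) = \frac{\exp\{a_i(\theta_p - b_i)\}}{1 + \exp\{a_i(\theta_p - b_i)\}}, \qquad
P(D_{pi}=1 \mid \xi_p) = \frac{\exp(\xi_p - \beta_i)}{1 + \exp(\xi_p - \beta_i)},
\]
\[
\log T_{pi} \mid D_{pi}=d \;\sim\; N\!\big(\lambda_{i,d} - \tau_{p,d},\ \sigma_{i,d}^{2}\big),
\]

where D_{pi} indicates an omission, \xi_p is a latent omission propensity, and separate time-intensity (\lambda) and speed (\tau) parameters govern the time taken to respond versus to omit; letting (\theta_p, \xi_p, \tau_{p,0}, \tau_{p,1}) correlate is what renders the omissions nonignorable.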


Subjects
Algorithms, Computer Simulation, Statistical Models, Reaction Time/physiology, Statistical Data Interpretation, Humans
2.
Multivariate Behav Res ; 49(2): 161-77, 2014.
Article in English | MEDLINE | ID: mdl-26741175

ABSTRACT

This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions, administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test whether RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences were found in extreme responding. Moreover, it is shown how to score rating data to correct for RS once their presence in the data has been established.
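For readers unfamiliar with the processing-tree decomposition, a common three-process version for a 5-point scale (after Böckenholt, 2012) factors each observed category into midpoint (m), direction (d), and extremity (e) pseudo-items; the parameterization below is an illustrative sketch, not necessarily the one used in this study:

\[
P(Y=3)=m,\qquad P(Y=4)=(1-m)\,d\,(1-e),\qquad P(Y=5)=(1-m)\,d\,e,
\]
\[
P(Y=2)=(1-m)(1-d)(1-e),\qquad P(Y=1)=(1-m)(1-d)\,e,
\]

with each of m, d, and e governed by its own IRT model, e.g. \(\operatorname{logit}(d)=\theta_p-\beta_i\) for the trait-related direction process and analogous equations with separate latent variables for the midpoint and extremity (response-style) processes.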

3.
Educ Psychol Meas ; 83(4): 740-765, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37398841

ABSTRACT

Viable methods for the identification of item misfit or differential item functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population independence of item functions are present even in classical test theory but are stated more explicitly when item response theory or other latent variable models are used for the assessment of item fit. The work presented here provides a robust approach for DIF detection that does not assume perfect model-data fit but instead uses Tukey's concept of contaminated distributions. The approach uses robust outlier detection to flag items for which adequate model-data fit cannot be established.
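A minimal sketch of the flavor of such a procedure (illustrative only; the article's actual statistic and cutoff differ): treat the bulk of items as the uncontaminated distribution and flag items whose DIF statistic is a robust outlier, using the median and MAD rather than moments that contaminated items would distort.

    import numpy as np

    def robust_dif_flags(stat, c=3.5):
        """Flag items whose DIF statistic is an outlier relative to the bulk.

        `stat` holds one DIF statistic per item (e.g., a between-group
        difference in estimated item difficulty). Median and MAD are used
        instead of mean/SD so that contaminated (misfitting) items do not
        distort the reference distribution; c=3.5 is a common rule of thumb.
        """
        stat = np.asarray(stat, dtype=float)
        med = np.median(stat)
        mad = np.median(np.abs(stat - med)) * 1.4826  # consistency factor under normality
        z = (stat - med) / mad
        return np.abs(z) > c  # True = flagged as potential DIF/misfit

    # Toy example: the last item has a large between-group difficulty difference.
    diffs = np.r_[np.random.normal(0, 0.1, 9), 1.2]
    print(robust_dif_flags(diffs))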

4.
Educ Psychol Meas ; 83(3): 556-585, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37187689

ABSTRACT

Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our results show that convolutional neural networks (CNNs) outperform feed-forward neural networks in both loss and accuracy. The CNN models classified up to 97.53% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate than, typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace second human raters in international large-scale assessments (ILSAs), reducing workload and cost while improving the validity and comparability of scoring complex constructed-response items.
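To make the comparison concrete, a minimal sketch of the two architecture families in Keras is given below; the input size (64x64 grayscale), layer sizes, and number of scoring categories (4) are assumptions for illustration, not the configurations reported in the study.

    import tensorflow as tf
    from tensorflow.keras import layers

    # Hypothetical setup: 64x64 grayscale drawings, 4 scoring categories.
    cnn = tf.keras.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(4, activation="softmax"),
    ])

    ffn = tf.keras.Sequential([  # feed-forward baseline on flattened pixels
        layers.Input(shape=(64, 64, 1)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(4, activation="softmax"),
    ])

    for model in (cnn, ffn):
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
    # model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)

The convolutional layers share weights across image locations, which is why a CNN typically needs far fewer parameters than a dense network to pick up local stroke patterns in drawings.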

5.
Psychometrika ; 87(2): 593-619, 2022 06.
Article in English | MEDLINE | ID: mdl-34855118

ABSTRACT

Careless and insufficient effort responding (C/IER) can pose a major threat to data quality and, as such, to the validity of inferences drawn from questionnaire data. A rich body of methods aiming at its detection has been developed. Most of these methods can detect only specific types of C/IER patterns; typically, however, different types of C/IER patterns occur within one data set and need to be accounted for. We present a model-based approach for detecting manifold manifestations of C/IER at once. This is achieved by leveraging response time (RT) information available from computer-administered questionnaires and integrating theoretical considerations on C/IER with recent psychometric modeling approaches. The approach (a) takes the specifics of attentive response behavior on questionnaires into account by incorporating the distance-difficulty hypothesis, (b) allows attentiveness to vary at the screen-by-respondent level, (c) allows respondents with different trait and speed levels to differ in their attentiveness, and (d) deals at once with various response patterns arising from C/IER. The approach makes use of item-level RTs. An adapted version for aggregated RTs is presented that supports screening for C/IER behavior at the respondent level. Parameter recovery is investigated in a simulation study. The approach is illustrated in an empirical example, comparing different RT measures and contrasting the proposed model-based procedure against indicator-based multiple-hurdle approaches.
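The general structure of such a model-based approach can be sketched as a screen-level mixture (an editorial illustration; the article's full specification, including the distance-difficulty component, is richer):

\[
f(y_{ps}, t_{ps}) \;=\; \pi_{ps}\, f_{\mathrm{C/IER}}(y_{ps})\, g_{\mathrm{C/IER}}(t_{ps})
\;+\; (1-\pi_{ps})\, f_{\mathrm{att}}(y_{ps}\mid\theta_p)\, g_{\mathrm{att}}(t_{ps}\mid\tau_p),
\]

where s indexes screens, \pi_{ps} is the probability that respondent p is inattentive on screen s (and may depend on \theta_p and \tau_p), f_{C/IER} is, for instance, a uniform distribution over response categories, f_{att} is an IRT model for attentive responses, and g_{att} is an RT model in which expected times grow as the person-item distance shrinks, in line with the distance-difficulty hypothesis.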


Assuntos
Psicometria , Simulação por Computador , Psicometria/métodos , Tempo de Reação , Autorrelato , Inquéritos e Questionários
6.
Educ Psychol Meas ; 81(2): 363-387, 2021 Apr.
Article in English | MEDLINE | ID: mdl-37929265

ABSTRACT

This article presents a new approach to the analysis of how students answer tests and how they allocate resources in terms of time on task and revisiting previously answered questions. Previous research has shown that in high-stakes assessments most test takers do not end the testing session early, but rather spend all of the time they were assigned to take the test. Rather than being an indication of speededness, this was found to be caused by test takers' tendency to revisit previous items even when they had already provided answers to all questions. Accordingly, the proposed approach models revisit patterns simultaneously with responses and response times to gain a better understanding of the relationship between speed, ability, and revisit tendency. The empirical data analysis revealed that examinees' tendency to revisit items was strongly related to their speed and that subgroups of examinees displayed different test-taking behaviors.

7.
Br J Math Stat Psychol ; 74 Suppl 1: 157-175, 2021 07.
Article in English | MEDLINE | ID: mdl-33332585

ABSTRACT

When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and to thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs by semiparametric adaptation of IRFs. For the semiparametric adaptation, the approach of extended basis functions due to Ramsay and Silverman (2005) is used. The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term, and determines whether all parameters of the basis functions are fixed at zero or freely estimated. Thus, the method serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRF of misfitting items can then be determined by describing the fitting items by the parametric form of the IRF and the misfitting items by the semiparametric approach. In a simulation study, we demonstrated that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000).
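On one reading of this construction (written here with a logistic link; an illustrative sketch rather than the authors' exact notation), the semiparametrically adjusted IRF and its regularization take the form:

\[
P_i(\theta) \;=\; \sigma\!\Big(a_i\theta + b_i + \sum_{k=1}^{K}\gamma_{ik}\,B_k(\theta)\Big),
\qquad
\text{penalty: } \lambda \sum_{i} \lVert \boldsymbol{\gamma}_i \rVert_2 ,
\]

where \sigma is the logistic function and the B_k are basis functions (e.g., B-splines). The group lasso either shrinks the entire vector \boldsymbol{\gamma}_i to zero (the item is kept in its parametric form) or leaves it freely estimated (the item is adjusted semiparametrically); the misfit effect size is then a distance, such as an RMSD, between the parametric and the semiparametrically adjusted IRF.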


Subjects
Research Design, Computer Simulation, Sample Size
8.
Br J Math Stat Psychol ; 73 Suppl 1: 83-112, 2020 11.
Article in English | MEDLINE | ID: mdl-31709521

ABSTRACT

In low-stakes assessments, test performance has few or no consequences for examinees themselves, so that examinees may not be fully engaged when answering the items. Instead of engaging in solution behaviour, disengaged examinees might randomly guess or generate no response at all. When ignored, examinee disengagement poses a severe threat to the validity of results obtained from low-stakes assessments. Statistical modelling approaches in educational measurement have been proposed that account for non-response or for guessing, but do not consider both types of disengaged behaviour simultaneously. We bring together research on modelling examinee engagement and research on missing values and present a hierarchical latent response model for identifying and modelling the processes associated with examinee disengagement jointly with the processes associated with engaged responses. To that end, we employ a mixture model that identifies disengagement at the item-by-examinee level by assuming different data-generating processes underlying item responses and omissions, respectively, as well as different response time distributions associated with engaged and disengaged behaviour. By modelling examinee engagement within a latent response framework, the model allows assessing how examinee engagement relates to ability and speed, as well as identifying items that are likely to evoke disengaged test-taking behaviour. An illustration of the model by means of an application to real data is presented.


Subjects
Educational Measurement/statistics & numerical data, Psychological Models, Statistical Models, Test Taking Skills/psychology, Test Taking Skills/statistics & numerical data, Bayes Theorem, Choice Behavior, Computer Simulation, Statistical Data Interpretation, Decision Making, Humans, Markov Chains, Monte Carlo Method, Motivation, Reaction Time
9.
Educ Psychol Meas ; 80(3): 522-547, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32425218

ABSTRACT

So far, modeling approaches for not-reached items have considered one single underlying process. However, missing values at the end of a test can occur for a variety of reasons. On the one hand, examinees may not reach the end of a test due to time limits and lack of working speed. On the other hand, examinees may not attempt all items and may quit responding due to, for example, fatigue or lack of motivation. We use response times retrieved from computerized testing to distinguish missing data due to lack of speed from missingness due to quitting. On the basis of this information, we present a new model that disentangles and simultaneously models the different missing data mechanisms underlying not-reached items. The model (a) supports a more fine-grained understanding of the processes underlying not-reached items and (b) allows different sources describing test performance to be disentangled. In a simulation study, we evaluate estimation of the proposed model. In an empirical study, we show what insights can be gained regarding test-taking behavior using this model.

10.
Br J Math Stat Psychol ; 72(3): 538-559, 2019 11.
Article in English | MEDLINE | ID: mdl-31385610

ABSTRACT

Personality constructs, attitudes, and other non-cognitive variables are often measured using rating or Likert-type scales, which does not come without problems. Especially in low-stakes assessments, respondents may produce biased responses due to response styles (RS) that reduce the validity and comparability of the measurement. Detecting and correcting RS is not always straightforward, because not all respondents show RS and those who do may not do so to the same extent or in the same direction. The present study proposes the combination of a multidimensional IRTree model with a mixture distribution item response theory model and illustrates the application of the approach using data from the Programme for the International Assessment of Adult Competencies (PIAAC). This joint approach allows differentiating between latent classes of respondents who show different RS behaviours, as well as between respondents who show RS and respondents who give (largely) unbiased responses. We illustrate the application of the approach by examining extreme RS and show how the resulting latent classes can be further examined using external variables and process data from computer-based assessments to develop a better understanding of response behaviour and RS.


Subjects
Bias, Self Report, Attitude, Humans, Statistical Models, Personality Assessment, Psychometrics
11.
Front Psychol ; 10: 2461, 2019.
Article in English | MEDLINE | ID: mdl-31824363

ABSTRACT

The Programme for International Student Assessment (PISA) introduced the measurement of problem-solving skills in the 2012 cycle. The items in this new domain employ scenario-based environments in which students interact with computers. Process data collected from log files are a record of students' interactions with the testing platform. This study suggests a two-stage approach for generating features from process data and selecting the features that predict students' responses, using a released problem-solving item, the Climate Control Task. The primary objectives of the study are (1) to introduce an approach for generating features from the process data and using them to predict the response to this item, and (2) to find out which features have the most predictive value. To achieve these goals, a tree-based ensemble method, the random forest algorithm, is used to explore the association between response data and predictive features. In addition, features can be ranked by importance in terms of predictive performance. This study can thus be seen as providing an alternative way to analyze process data for pedagogical purposes.
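A minimal sketch of the second stage with scikit-learn is shown below; the feature names and values are hypothetical placeholders for features one might engineer from log files, not the features or data used in the study.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical process-data features (illustrative only).
    X = pd.DataFrame({
        "n_actions":          [12, 4, 25, 7, 15, 3],
        "time_on_task":       [185.0, 60.2, 240.8, 95.5, 150.3, 40.0],
        "vary_one_at_a_time": [1, 0, 1, 0, 1, 0],   # used a vary-one-variable strategy at least once
        "n_resets":           [0, 1, 2, 0, 1, 1],
    })
    y = [1, 0, 1, 0, 1, 0]  # 1 = correct response to the item, 0 = incorrect

    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(X, y)

    # Rank features by their contribution to predictive performance.
    for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1]):
        print(f"{name:>20}: {imp:.3f}")

In practice the model would be trained on many students and evaluated with cross-validation; the importance ranking is what supports the study's second objective of identifying the most predictive features.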

12.
Psychometrika ; 84(1): 147-163, 2019 03.
Article in English | MEDLINE | ID: mdl-30607661

ABSTRACT

This paper provides results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty. The presentation of items is adaptive in the sense that a session is discontinued once a test taker produces a certain number of incorrect responses in sequence, with subsequent (not observed) responses commonly scored as wrong. The Stanford-Binet Intelligence Scales (SB5; Riverside Publishing Company, 2003), the Kaufman Assessment Battery for Children (KABC-II; Kaufman and Kaufman, 2004), the Kaufman Adolescent and Adult Intelligence Test (Kaufman and Kaufman, 2014), and the Universal Nonverbal Intelligence Test (2nd ed.; Bracken and McCallum, 2015) are some of the many examples using this rule. He and Wolfe (Educ Psychol Meas 72(5):808-826, 2012. https://doi.org/10.1177/0013164412441937) compared different ability estimation methods in a simulation study for this discontinue-rule adaptation of test length. To our knowledge, however, there has been no study of the underlying distributional properties of what these authors call stochastic censoring of responses, based on analytic arguments drawing on probability theory. The results obtained by He and Wolfe (2012) agree with results presented by DeAyala et al. (J Educ Meas 38:213-234, 2001) as well as Rose et al. (Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11), Educational Testing Service, Princeton, 2010) and Rose et al. (Psychometrika 82:795-819, 2017. https://doi.org/10.1007/s11336-016-9544-7) in that ability estimates are biased most when scoring the not observed responses as wrong. This scoring is used operationally, so more research is needed in order to improve practice in this field. The paper extends existing research on adaptivity by discontinue rules in intelligence tests in multiple ways: First, an analytical study of the distributional properties of discontinue-rule-scored items is presented. Second, a simulation is presented that includes additional scoring rules and uses ability estimators that may be suitable to reduce bias for discontinue-rule-scored intelligence tests.
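A small simulation sketch of the discontinue rule (illustrative; a Rasch model with arbitrary parameter values, not the paper's design) makes the stochastic censoring concrete: unreached items are kept as missing so that alternative scoring rules can be compared afterwards.

    import numpy as np

    rng = np.random.default_rng(1)

    def administer(theta, difficulties, stop_after=3):
        """Simulate one session under a discontinue rule.

        Items are given in order of increasing difficulty; testing stops once
        `stop_after` consecutive incorrect responses occur.  Unreached items
        are returned as NaN so that different scoring rules (score-as-wrong,
        treat-as-missing, ...) can be applied afterwards.
        """
        x = np.full(len(difficulties), np.nan)
        run = 0
        for i, b in enumerate(difficulties):
            p = 1.0 / (1.0 + np.exp(-(theta - b)))   # Rasch response probability
            x[i] = rng.random() < p
            run = 0 if x[i] == 1 else run + 1
            if run == stop_after:
                break
        return x

    b = np.sort(rng.normal(0, 1, 30))                # items in increasing difficulty
    resp = administer(theta=0.2, difficulties=b)
    scored_as_wrong = np.nan_to_num(resp, nan=0.0)   # the operational scoring discussed above
    print(resp, scored_as_wrong, sep="\n")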


Assuntos
Psicometria/métodos , Simulação por Computador , Interpretação Estatística de Dados , Humanos , Inteligência , Testes de Inteligência
13.
Psychometrika ; 84(3): 892-920, 2019 09.
Article in English | MEDLINE | ID: mdl-31054065

ABSTRACT

Missing values at the end of a test typically are the result of test takers running out of time and can as such be understood by studying test takers' working speed. As testing moves to computer-based assessment, response times become available, making it possible to model speed and ability simultaneously. Integrating research on response time modeling with research on modeling missing responses, we propose using response times to model missing values due to time limits. We identify similarities between approaches used to account for not-reached items (Rose et al. in ETS Res Rep Ser 2010:i-53, 2010) and the speed-accuracy (SA) model for joint modeling of effective speed and effective ability proposed by van der Linden (Psychometrika 72(3):287-308, 2007). In a simulation, we show (a) that the SA model can recover parameters in the presence of missing values due to time limits and (b) that the response time model, using item-level timing information rather than a count of not-reached items, results in person parameter estimates that differ from those of missing data IRT models applied to not-reached items. We propose using the SA model to model the missing data process and using both ability and speed to describe the performance of test takers. We illustrate the application of the model in an empirical analysis.
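For reference, the hierarchical speed-accuracy framework cited above is commonly written with a normal-ogive response model and a lognormal response time model (one standard presentation; details such as the second-level covariance structure are omitted here):

\[
P(X_{pi}=1 \mid \theta_p) = \Phi\big(a_i(\theta_p - b_i)\big),
\qquad
\log T_{pi} \sim N\big(\beta_i - \tau_p,\ \alpha_i^{-2}\big),
\]

where \theta_p and \tau_p are the test taker's effective ability and effective speed, \beta_i is the item's time intensity, and \alpha_i its time discrimination; a second-level multivariate normal model links the person parameters (and, analogously, the item parameters), which is what allows item-level timing information to inform inferences about ability when responses are missing due to time limits.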


Assuntos
Simulação por Computador/estatística & dados numéricos , Psicometria/métodos , Tempo de Reação/fisiologia , Algoritmos , Teorema de Bayes , Simulação por Computador/tendências , Computadores/normas , Humanos , Modelos Teóricos , Análise e Desempenho de Tarefas , Fatores de Tempo
14.
Br J Math Stat Psychol ; 61(Pt 2): 287-307, 2008 Nov.
Article in English | MEDLINE | ID: mdl-17535481

ABSTRACT

Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well-known models, such as univariate and multivariate versions of the Rasch model and the two-parameter logistic item response theory model, the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL Internet-based testing.
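In one common notation (a sketch of the compensatory, partial-credit member of the GDM family; the article should be consulted for the exact parameterization), the category probabilities can be written as:

\[
P(X_{pi}=x \mid \mathbf{a}_p) \;=\;
\frac{\exp\!\Big(\beta_{ix} + x \sum_{k=1}^{K} q_{ik}\,\gamma_{ik}\, a_{pk}\Big)}
     {\sum_{y=0}^{m_i}\exp\!\Big(\beta_{iy} + y \sum_{k=1}^{K} q_{ik}\,\gamma_{ik}\, a_{pk}\Big)},
\]

where a_{pk} denotes the level of person p on skill k (dichotomous, ordinal, or continuous), q_{ik} is the Q-matrix entry linking item i to skill k, \gamma_{ik} are slopes, and \beta_{ix} are threshold parameters. With a single continuous skill this reduces to the generalized partial credit model, and with dichotomous items to the 2PL or, with fixed slopes, the Rasch model, which is the sense in which these models are special cases of the GDM.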


Subjects
Language Tests, Linguistics/statistics & numerical data, Psychological Models, Humans, Reaction Time
15.
Psychometrika ; 83(4): 847-857, 2018 12.
Article in English | MEDLINE | ID: mdl-29532403

ABSTRACT

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource, and testing agencies incur high costs in the process of continuously renewing item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language-free intelligence tests (e.g., Raven's progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google Brain and Amazon Alexa use for language processing and generation.
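The abstract does not name a specific toolkit; purely as an illustration of prompting a pretrained probabilistic language model to draft candidate item stems, a sketch using the Hugging Face transformers pipeline is shown below. The model choice and prompt are hypothetical placeholders, not the system described above, and any generated drafts would still require expert review and empirical calibration.

    from transformers import pipeline

    # Illustrative only: a small pretrained causal LM prompted to draft item stems.
    generator = pipeline("text-generation", model="gpt2")

    prompt = ("Write a multiple-choice vocabulary item: "
              "The word 'reluctant' most nearly means")
    drafts = generator(prompt, max_new_tokens=40, num_return_sequences=3, do_sample=True)
    for d in drafts:
        print(d["generated_text"], "\n---")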


Assuntos
Aprendizado de Máquina , Redes Neurais de Computação , Automação , Humanos , Idioma , Modelos Teóricos , Personalidade , Testes de Personalidade , Psicometria/métodos
16.
Appl Psychol Meas ; 42(4): 291-306, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29881126

ABSTRACT

The research presented in this article combines mathematical derivations and empirical results to investigate effects of the nonparametric anchoring vignette approach proposed by King, Murray, Salomon, and Tandon on the reliability and validity of rating data. The anchoring vignette approach aims to correct rating data for response styles to improve comparability across individuals and groups. Vignettes are used to adjust self-assessment responses on the respondent level but entail significant assumptions: They are supposed to be invariant across respondents, and the responses to vignette prompts are supposed to be without error and strictly ordered. This article shows that these assumptions are not always met and that the anchoring vignette approach leads to higher Cronbach's alpha values and increased correlations among adjusted variables regardless of whether the assumptions of the approach are met or violated. Results suggest that the underlying assumptions and effects of the anchoring vignette approach should be carefully examined as the increased correlations and reliability estimates can be observed even for response variables that are independent random draws and uncorrelated with any other variable.
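For context, the nonparametric approach recodes each self-assessment relative to the respondent's own vignette ratings. A minimal sketch is given below; it assumes strictly ordered vignette responses, which is exactly the assumption the article shows is often violated, and the practical handling of ties and order violations is more involved.

    def vignette_recode(self_rating, vignette_ratings):
        """Nonparametric anchoring-vignette recoding (King et al. style).

        Assumes the J vignette ratings are strictly increasing, i.e. the
        respondent ordered the vignettes without ties or reversals.  Returns
        a score on 1..(2J+1): odd values fall between/outside vignettes,
        even values indicate a tie with a vignette.
        """
        z = list(vignette_ratings)
        if any(b <= a for a, b in zip(z, z[1:])):
            raise ValueError("vignette responses not strictly ordered; "
                             "scalar recoding is not defined")
        c = 1
        for zj in z:
            if self_rating < zj:
                return c
            if self_rating == zj:
                return c + 1
            c += 2
        return c  # self-rating above all vignettes -> 2J+1

    print(vignette_recode(4, [2, 3, 5]))   # -> 5 (between the 2nd and 3rd vignette)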

18.
Science ; 372(6540): 338-340, 2021 04 23.
Article in English | MEDLINE | ID: mdl-33888624
19.
Psychometrika ; 2016 Nov 15.
Article in English | MEDLINE | ID: mdl-27848151

ABSTRACT

Item nonresponse is a common problem in educational and psychological assessments. The probability of unplanned missing responses due to omitted and not-reached items may stochastically depend on unobserved variables such as the missing responses themselves or latent variables. In such cases, missingness cannot be ignored and needs to be considered in the model. Specifically, multidimensional IRT models, latent regression models, and multiple-group IRT models have been suggested for handling nonignorable missing responses in latent trait models. However, the suitability of these particular models with respect to omitted and not-reached items has rarely been addressed. Missingness is formalized by response indicators that are modeled jointly with the researcher's target model. We demonstrate that response indicators have different statistical properties depending on whether the items were omitted or not reached. The implications of these differences are used to derive a joint model for nonignorable missing responses that appropriately accounts for both omitted and not-reached items. The performance of the model is demonstrated by means of a small simulation study.

20.
Educ Psychol Meas ; 75(5): 739-763, 2015 Oct.
Article in English | MEDLINE | ID: mdl-29795839

ABSTRACT

In large-scale educational surveys, a latent regression model is used to compensate for the shortage of cognitive information. Conventionally, the covariates in the latent regression model are principal components extracted from background data. This operational method has several important disadvantages, such as the handling of missing data and the high model complexity. The approach introduced here to identify multiple groups that can account for the variation among students is to conduct a latent class analysis (LCA). In the LCA, one or more latent nominal variables are identified that can be used to classify respondents with respect to their background characteristics. These classifications are then introduced as predictors in the latent regression. The primary goal of this study was to explore whether this approach yields similar estimates of group means and standard deviations compared with the operational procedure. The alternative approaches based on LCA differed regarding the number of classes, the items used for the LCA, and whether manifest class membership information or class membership probabilities were used as independent variables in the latent regression. Overall, recovery of the operational approach's group means and standard deviations was very satisfactory for all LCA approaches. Furthermore, the posterior means and standard deviations used to generate plausible values derived from the operational approach and the LCA approaches correlated highly. Thus, incorporating independent variables based on an LCA of background data into the latent regression model appears to be a viable alternative to the operational approach.
