Pesquisa | Portal Regional da BVS

1.

The interactive reading task: Transformer-based automatic item generation.

Attali, Yigal; Runge, Andrew; LaFlair, Geoffrey T; Yancey, Kevin; Goodwin, Sarah; Park, Yena; von Davier, Alina A.

Front Artif Intell ; 5: 903077, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35937141

RESUMO

Automatic item generation (AIG) has the potential to greatly expand the number of items for educational assessments, while simultaneously allowing for a more construct-driven approach to item development. However, the traditional item modeling approach in AIG is limited in scope to content areas that are relatively easy to model (such as math problems), and depends on highly skilled content experts to create each model. In this paper we describe the interactive reading task, a transformer-based deep language modeling approach for creating reading comprehension assessments. This approach allows a fully automated process for the creation of source passages together with a wide range of comprehension questions about the passages. The format of the questions allows automatic scoring of responses with high fidelity (e.g., selected response questions). We present the results of a large-scale pilot of the interactive reading task, with hundreds of passages and thousands of questions. These passages were administered as part of the practice test of the Duolingo English Test. Human review of the materials and psychometric analyses of test taker results demonstrate the feasibility of this approach for automatic creation of complex educational assessments.

2.

The Expanded Evidence-Centered Design (e-ECD) for Learning and Assessment Systems: A Framework for Incorporating Learning Goals and Processes Within Assessment Design.

Arieli-Attali, Meirav; Ward, Sue; Thomas, Jay; Deonovic, Benjamin; von Davier, Alina A.

Front Psychol ; 10: 853, 2019.

Artigo em Inglês | MEDLINE | ID: mdl-31105616

RESUMO

Evidence-centered design (ECD) is a framework for the design and development of assessments that ensures consideration and collection of validity evidence from the onset of the test design. Blending learning and assessment requires integrating aspects of learning at the same level of rigor as aspects of testing. In this paper, we describe an expansion to the ECD framework (termed e-ECD) such that it includes the specifications of the relevant aspects of learning at each of the three core models in the ECD, as well as making room for specifying the relationship between learning and assessment within the system. The framework proposed here does not assume a specific learning theory or particular learning goals, rather it allows for their inclusion within an assessment framework, such that they can be articulated by researchers or assessment developers that wish to focus on learning.

3.

Estimating the DINA model parameters using the No-U-Turn Sampler.

da Silva, Marcelo A; de Oliveira, Eduardo S B; von Davier, Alina A; Bazán, Jorge L.

Biom J ; 60(2): 352-368, 2018 03.

Artigo em Inglês | MEDLINE | ID: mdl-29194715

RESUMO

The deterministic inputs, noisy, "and" gate (DINA) model is a popular cognitive diagnosis model (CDM) in psychology and psychometrics used to identify test takers' profiles with respect to a set of latent attributes or skills. In this work, we propose an estimation method for the DINA model with the No-U-Turn Sampler (NUTS) algorithm, an extension to Hamiltonian Monte Carlo (HMC) method. We conduct a simulation study in order to evaluate the parameter recovery and efficiency of this new Markov chain Monte Carlo method and to compare it with two other Bayesian methods, the Metropolis Hastings and Gibbs sampling algorithms, and with a frequentist method, using the Expectation-Maximization (EM) algorithm. The results indicated that NUTS algorithm employed in the DINA model properly recovers all parameters and is accurate for all simulated scenarios. We apply this methodology in the mental health area in order to develop a new method of classification for respondents to the Beck Depression Inventory. The implementation of this method for the DINA model applied to other psychological tests has the potential to improve the medical diagnostic process.

Assuntos

Biometria/métodos , Cognição , Modelos Estatísticos , Psicometria , Algoritmos , Depressão/fisiopatologia , Depressão/psicologia , Humanos , Método de Monte Carlo

4.

Computational Psychometrics for the Measurement of Collaborative Problem Solving Skills.

Polyak, Stephen T; von Davier, Alina A; Peterschmidt, Kurt.

Front Psychol ; 8: 2029, 2017.

Artigo em Inglês | MEDLINE | ID: mdl-29238314

RESUMO

This paper describes a psychometrically-based approach to the measurement of collaborative problem solving skills, by mining and classifying behavioral data both in real-time and in post-game analyses. The data were collected from a sample of middle school children who interacted with a game-like, online simulation of collaborative problem solving tasks. In this simulation, a user is required to collaborate with a virtual agent to solve a series of tasks within a first-person maze environment. The tasks were developed following the psychometric principles of Evidence Centered Design (ECD) and are aligned with the Holistic Framework developed by ACT. The analyses presented in this paper are an application of an emerging discipline called computational psychometrics which is growing out of traditional psychometrics and incorporates techniques from educational data mining, machine learning and other computer/cognitive science fields. In the real-time analysis, our aim was to start with limited knowledge of skill mastery, and then demonstrate a form of continuous Bayesian evidence tracing that updates sub-skill level probabilities as new conversation flow event evidence is presented. This is performed using Bayes' rule and conversation item conditional probability tables. The items are polytomous and each response option has been tagged with a skill at a performance level. In our post-game analysis, our goal was to discover unique gameplay profiles by performing a cluster analysis of user's sub-skill performance scores based on their patterns of selected dialog responses.

5.

A Note on the Poisson's Binomial Distribution in Item Response Theory.

González, Jorge; Wiberg, Marie; von Davier, Alina A.

Appl Psychol Meas ; 40(4): 302-310, 2016 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-29881055

RESUMO

The Poisson's binomial (PB) is the probability distribution of the number of successes in independent but not necessarily identically distributed binary trials. The independent non-identically distributed case emerges naturally in the field of item response theory, where answers to a set of binary items are conditionally independent given the level of ability, but with different probabilities of success. In many applications, the number of successes represents the score obtained by individuals, and the compound binomial (CB) distribution has been used to obtain score probabilities. It is shown here that the PB and the CB distributions lead to equivalent probabilities. Furthermore, one of the proposed algorithms to calculate the PB probabilities coincides exactly with the well-known Lord and Wingersky (LW) algorithm for CBs. Surprisingly, we could not find any reference in the psychometric literature pointing to this equivalence. In a simulation study, different methods to calculate the PB distribution are compared with the LW algorithm. Providing an exact alternative to the traditional LW approximation for obtaining score distributions is a contribution to the field.

6.

Examining Potential Boundary Bias Effects in Kernel Smoothing on Equating: An Introduction for the Adaptive and Epanechnikov Kernels.

Cid, Jaime A; von Davier, Alina A.

Appl Psychol Meas ; 39(3): 208-222, 2015 May.

Artigo em Inglês | MEDLINE | ID: mdl-29881003

RESUMO

Test equating is a method of making the test scores from different test forms of the same assessment comparable. In the equating process, an important step involves continuizing the discrete score distributions. In traditional observed-score equating, this step is achieved using linear interpolation (or an unscaled uniform kernel). In the kernel equating (KE) process, this continuization process involves Gaussian kernel smoothing. It has been suggested that the choice of bandwidth in kernel smoothing controls the trade-off between variance and bias. In the literature on estimating density functions using kernels, it has also been suggested that the weight of the kernel depends on the sample size, and therefore, the resulting continuous distribution exhibits bias at the endpoints, where the samples are usually smaller. The purpose of this article is (a) to explore the potential effects of atypical scores (spikes) at the extreme ends (high and low) on the KE method in distributions with different degrees of asymmetry using the randomly equivalent groups equating design (Study I), and (b) to introduce the Epanechnikov and adaptive kernels as potential alternative approaches to reducing boundary bias in smoothing (Study II). The beta-binomial model is used to simulate observed scores reflecting a range of different skewed shapes.

7.

Observed-score equating: an overview.

von Davier, Alina A.

Psychometrika ; 78(4): 605-23, 2013 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-24092480

RESUMO

In this paper, an overview of the observed-score equating (OSE) process is provided from the perspective of a unifying equating framework (von Davier in von Davier (Ed.), Statistical models for test equating, scaling, and linking, Springer, New York, pp. 1-17, 2011b). The framework includes all OSE approaches. Issues related to the test, common items, and sampling designs and their relationship to measurement and equating are discussed. Challenges to the equating process, model assumptions, and approaches to equating evaluation are also presented. The equating process is illustrated step-by-step with a real data example from a licensure test.

Assuntos

Modelos Estatísticos , Psicometria/métodos , Humanos

8.

Monitoring scale scores over time via quality control charts, model-based approaches, and time series techniques.

Lee, Yi-Hsuan; von Davier, Alina A.

Psychometrika ; 78(3): 557-75, 2013 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-25106404

RESUMO

Maintaining a stable score scale over time is critical for all standardized educational assessments. Traditional quality control tools and approaches for assessing scale drift either require special equating designs, or may be too time-consuming to be considered on a regular basis with an operational test that has a short time window between an administration and its score reporting. Thus, the traditional methods are not sufficient to catch unusual testing outcomes in a timely manner. This paper presents a new approach for score monitoring and assessment of scale drift. It involves quality control charts, model-based approaches, and time series techniques to accommodate the following needs of monitoring scale scores: continuous monitoring, adjustment of customary variations, identification of abrupt shifts, and assessment of autocorrelation. Performance of the methodologies is evaluated using manipulated data based on real responses from 71 administrations of a large-scale high-stakes language assessment.

Assuntos

Psicometria/métodos , Psicometria/normas , Projetos de Pesquisa/normas , Avaliação Educacional/normas , Humanos , Manutenção/métodos , Modelos Estatísticos , Controle de Qualidade , Análise de Regressão , Estações do Ano

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA