Results 1-20 of 30
1.
Br J Math Stat Psychol ; 77(1): 151-168, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37667833

ABSTRACT

The use of joint models for item scores and response times is becoming increasingly popular in educational and psychological testing. In this paper, we propose two new person-fit statistics for such models in order to detect aberrant behaviour. The first statistic is computed by combining two existing person-fit statistics: one for the item scores, and one for the item response times. The second statistic is computed directly using the likelihood function of the joint model. Using detailed simulations, we show that the empirical null distributions of the new statistics are very close to the theoretical null distributions, and that the new statistics tend to be more powerful than several existing statistics for item scores and/or response times. A real data example is also provided using data from a licensure examination.
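As an illustration of the general idea only (not the specific statistics proposed in the paper), here is a minimal sketch of combining an item-score person-fit z-statistic with a response-time person-fit z-statistic, under the assumption that both are asymptotically standard normal and independent under the null.

```python
# Minimal sketch, not the paper's statistic: combine an item-score person-fit
# z-statistic and a response-time person-fit z-statistic, assuming both are
# asymptotically N(0, 1) and independent under the null of no aberrant behaviour.
import numpy as np
from scipy import stats

def combined_person_fit(z_scores, z_times):
    z_combined = (z_scores + z_times) / np.sqrt(2.0)  # still N(0, 1) under the null
    p_value = 2 * stats.norm.sf(abs(z_combined))      # two-sided p-value
    return z_combined, p_value

# Example: mildly aberrant item scores together with unusually fast responses.
print(combined_person_fit(z_scores=-1.8, z_times=-2.1))
```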


Subject(s)
Statistical Models, Psychological Tests, Humans, Reaction Time, Likelihood Functions
2.
Psychometrika ; 89(2): 569-591, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38558053

ABSTRACT

Many popular person-fit statistics belong to the class of standardized person-fit statistics, T, and are assumed to have a standard normal null distribution. However, in practice, this assumption is incorrect since T is computed using (a) an estimated ability parameter and (b) a finite number of items. Snijders (Psychometrika 66(3):331-342, 2001) developed mean and variance corrections for T to account for the use of an estimated ability parameter. Bedrick (Psychometrika 62(2):191-199, 1997) and Molenaar and Hoijtink (Psychometrika 55(1):75-106, 1990) developed skewness corrections for T to account for the use of a finite number of items. In this paper, we combine these two lines of research and propose three new corrections for T that simultaneously account for the use of an estimated ability parameter and the use of a finite number of items. The new corrections are efficient in that they only require the analysis of the original data set and do not require the simulation or analysis of any additional data sets. We conducted a detailed simulation study and found that the new corrections are able to control the Type I error rate while also maintaining reasonable levels of power. A real data example is also included.
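For context, a sketch of the kind of uncorrected statistic the corrections start from: the familiar l_z-type standardized person-fit statistic under a 2PL model with known item parameters and a plugged-in ability estimate. The corrections proposed in the paper are not reproduced here.

```python
# Sketch of an uncorrected standardized person-fit statistic of the l_z type
# under a 2PL model; the mean/variance/skewness corrections discussed in the
# paper refine the reference distribution of this kind of statistic.
import numpy as np
from scipy import stats

def lz_statistic(u, theta_hat, a, b):
    """u: 0/1 item scores; a, b: 2PL discriminations and difficulties."""
    p = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))   # P(correct | theta_hat)
    q = 1.0 - p
    loglik = np.sum(u * np.log(p) + (1 - u) * np.log(q))
    mean = np.sum(p * np.log(p) + q * np.log(q))
    var = np.sum(p * q * np.log(p / q) ** 2)
    return (loglik - mean) / np.sqrt(var)

rng = np.random.default_rng(1)
a, b = rng.uniform(0.8, 2.0, 40), rng.normal(0.0, 1.0, 40)
u = (rng.random(40) < 1.0 / (1.0 + np.exp(-a * (0.3 - b)))).astype(int)
t = lz_statistic(u, theta_hat=0.3, a=a, b=b)
print(t, 2 * stats.norm.sf(abs(t)))  # naive N(0, 1) p-value that the corrections refine
```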


Subject(s)
Psychometrics, Humans, Psychometrics/methods, Statistical Models, Computer Simulation, Statistical Data Interpretation
3.
Article in English | MEDLINE | ID: mdl-38379504

ABSTRACT

Several new models based on item response theory have recently been suggested to analyse intensive longitudinal data. One of these new models is the time-varying dynamic partial credit model (TV-DPCM; Castro-Alvarez et al., Multivariate Behavioral Research, 2023, 1), which is a combination of the partial credit model and the time-varying autoregressive model. The model allows the study of the psychometric properties of the items and the modelling of nonlinear trends at the latent state level. However, there is a severe lack of tools to assess the fit of the TV-DPCM. In this paper, we propose and develop several test statistics and discrepancy measures based on the posterior predictive model checking (PPMC) method (Rubin, The Annals of Statistics, 1984, 12, 1151) to assess the fit of the TV-DPCM. Simulated and empirical data are used to study the performance of the proposed measures and to illustrate the effectiveness of the PPMC method.
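A generic PPMC loop is shown below only to make the method concrete; `simulate_data` and `discrepancy` are hypothetical placeholders for TV-DPCM-specific functions.

```python
# Generic posterior predictive model checking (PPMC) sketch, not tied to the
# TV-DPCM: for each posterior draw, simulate a replicated data set, compute a
# discrepancy for the replicated and the observed data, and report the
# posterior predictive p-value.  `simulate_data` and `discrepancy` are
# hypothetical, model-specific callables.
import numpy as np

def ppmc_p_value(posterior_draws, observed_data, simulate_data, discrepancy, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    extreme = 0
    for params in posterior_draws:                  # one parameter draw per iteration
        replicated = simulate_data(params, rng)     # y_rep ~ p(y | params)
        if discrepancy(replicated, params) >= discrepancy(observed_data, params):
            extreme += 1
    return extreme / len(posterior_draws)           # values near 0 or 1 signal misfit
```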

4.
J Appl Meas ; 14(2): 149-58, 2013.
Article in English | MEDLINE | ID: mdl-23816593

ABSTRACT

Existing test statistics for detecting differential item functioning (DIF) require large samples, but test administrators often face the challenge of detecting DIF with small samples. One advantage of a Bayesian approach over a frequentist approach is that the former can incorporate, in the form of a prior distribution, existing information on the inference problem at hand. Sinharay, Dorans, Grant, and Blew (2009) suggested the use of information from past data sets as a prior distribution in a Bayesian DIF analysis. This paper suggests an extension of the method of Sinharay et al. (2009). The suggested extension is compared to the existing DIF detection methods in a realistic simulation study.
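To make the idea of using past data as a prior concrete, here is a hedged conjugate-normal sketch; the actual statistic and prior construction used in the paper may differ.

```python
# Hedged sketch of using past data as a prior: treat a current-administration
# DIF effect estimate (e.g., a Mantel-Haenszel delta-DIF value) as normal with
# known sampling variance, place a normal prior on the effect built from past
# administrations, and combine by conjugate updating.  Illustrative only.
def normal_posterior(prior_mean, prior_var, estimate, estimate_var):
    post_var = 1.0 / (1.0 / prior_var + 1.0 / estimate_var)
    post_mean = post_var * (prior_mean / prior_var + estimate / estimate_var)
    return post_mean, post_var

# A noisy small-sample DIF estimate is shrunk toward the evidence from past data.
print(normal_posterior(prior_mean=-0.2, prior_var=0.05, estimate=-1.1, estimate_var=0.8))
```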


Subject(s)
Algorithms, Bayes Theorem, Statistical Data Interpretation, Statistical Models, Psychometrics/methods, Surveys and Questionnaires, Computer Simulation, Sample Size
5.
Appl Psychol Meas ; 47(1): 76-82, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36425287

ABSTRACT

In response to the closures of test centers worldwide during the COVID-19 pandemic, several testing programs offered large-scale standardized assessments to examinees remotely. However, because of the varying quality of personal devices and internet connections, at-home examinees were more likely to suffer "disruptions," or interruptions in the connectivity of their testing session, than examinees in typical test-center administrations. Disruptions have the potential to adversely affect examinees and lead to fairness or validity issues. The goal of this study was to investigate the extent to which disruptions impacted the performance of at-home examinees using data from a large-scale admissions test. Specifically, the study involved comparing the average test scores of the disrupted examinees with those of the non-disrupted examinees after weighting the non-disrupted examinees to resemble the disrupted examinees along baseline characteristics. The results show that disruptions had a small negative impact on test scores on average. However, there was little difference in performance between the disrupted and non-disrupted examinees after removing the records of the disrupted examinees who were unable to complete the test.
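One standard way to implement such weighting is propensity-score weighting by the odds; the sketch below is an assumption about the general approach, not the paper's exact procedure, and the variable names are illustrative.

```python
# Sketch of weighting the non-disrupted (comparison) examinees to resemble the
# disrupted examinees on baseline covariates: fit a propensity model for being
# disrupted and weight each comparison examinee by the estimated odds of
# disruption.  The logistic model and variable names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_score_gap(X, disrupted, scores):
    ps = LogisticRegression(max_iter=1000).fit(X, disrupted).predict_proba(X)[:, 1]
    weights = ps / (1.0 - ps)                        # odds weights for the comparison group
    mean_disrupted = scores[disrupted == 1].mean()
    mean_comparison = np.average(scores[disrupted == 0], weights=weights[disrupted == 0])
    return mean_disrupted - mean_comparison          # negative values suggest an adverse impact
```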

6.
Appl Psychol Meas ; 47(1): 3-18, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36425289

ABSTRACT

The S-X² statistic (Orlando & Thissen, 2000) is popular among researchers and practitioners who are interested in the assessment of item fit. However, the statistic suffers from the Chernoff-Lehmann problem (Chernoff & Lehmann, 1954) and hence does not have a known asymptotic null distribution. This paper suggests a modified version of the S-X² statistic that is based on the modified Rao-Robson χ² statistic (Rao & Robson, 1974). A simulation study and a real data analysis demonstrate that the use of the modified statistic instead of the S-X² statistic would lead to fewer items being flagged for misfit.
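For orientation, a simplified sketch of the original S-X²-type computation (observed versus model-expected proportions correct within summed-score groups); the Rao-Robson modification proposed in the paper is not shown, and the degrees of freedom below are the naive ones.

```python
# Simplified S-X^2-type item-fit sketch: a Pearson chi-square comparing observed
# and model-expected proportions correct within summed-score groups.  Expected
# proportions would normally come from the Lord-Wingersky recursion; here they
# are taken as given.  The chi-square reference is only approximate because of
# the Chernoff-Lehmann problem discussed in the abstract.
import numpy as np
from scipy import stats

def s_x2(group_sizes, observed_prop, expected_prop, n_item_params):
    chi2 = np.sum(group_sizes * (observed_prop - expected_prop) ** 2
                  / (expected_prop * (1.0 - expected_prop)))
    df = len(group_sizes) - n_item_params    # naive df after collapsing sparse groups
    return chi2, stats.chi2.sf(chi2, df)
```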

7.
Appl Psychol Meas ; 47(2): 155-163, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36875293

ABSTRACT

Targeted double scoring, that is, double scoring of only some (but not all) responses, is used to reduce the burden of scoring performance tasks for several mastery tests (Finkelman, Darby, & Nering, 2008). An approach based on statistical decision theory (e.g., Berger, 1989; Ferguson, 1967; Rudner, 2009) is suggested to evaluate and potentially improve upon the existing strategies in targeted double scoring for mastery tests. An application of the approach to data from an operational mastery test shows that a refinement of the currently used strategy would lead to substantial cost savings.
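A toy decision-theoretic rule of the kind such an approach formalizes is sketched below; the loss values, costs, and probability model are illustrative assumptions, not those of the paper.

```python
# Toy decision-theoretic sketch: double-score a response only when the expected
# reduction in misclassification loss exceeds the cost of a second scoring.
# All numbers are illustrative.
def should_double_score(p_decision_changes, misclassification_loss=100.0, scoring_cost=1.0):
    return p_decision_changes * misclassification_loss > scoring_cost

print(should_double_score(0.002))   # examinee far from the cut score: skip double scoring
print(should_double_score(0.20))    # examinee near the cut score: double-score
```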

8.
Educ Psychol Meas ; 82(3): 580-609, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35444341

ABSTRACT

Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probabilities of passing of the examinees with incomplete data on mastery tests. However, there is a lack of research on this estimation problem. The goal of this article is to suggest two new approaches-one each based on classical test theory and item response theory-for estimating the probabilities of passing of the examinees with incomplete data on mastery tests. The two approaches are demonstrated to have high accuracy and negligible misclassification rates.
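A hedged sketch of one plausible IRT-based implementation is given below (2PL, standard normal prior, quadrature over a grid); the approaches in the paper may differ in the details.

```python
# Sketch of an IRT-based estimate of the probability of passing for an examinee
# with incomplete data: compute the posterior of ability from the observed
# items only and report the posterior probability that ability exceeds the cut
# score on the ability scale.  One plausible implementation, not necessarily
# the paper's.
import numpy as np

def prob_passing(u_obs, a_obs, b_obs, theta_cut, grid=np.linspace(-4, 4, 81)):
    p = 1.0 / (1.0 + np.exp(-a_obs[:, None] * (grid - b_obs[:, None])))   # items x grid
    loglik = np.sum(u_obs[:, None] * np.log(p) + (1 - u_obs[:, None]) * np.log(1.0 - p), axis=0)
    post = np.exp(loglik - loglik.max()) * np.exp(-grid ** 2 / 2)         # likelihood x N(0, 1) prior
    post /= post.sum()
    return post[grid >= theta_cut].sum()
```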

9.
Appl Psychol Meas ; 46(1): 19-39, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34898745

ABSTRACT

Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman-Pearson lemma (NPL; e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices (OAIs) of Levine and Drasgow (1988) and is the most powerful statistic for detecting item preknowledge when the assumptions underlying the statistic hold for the data (e.g., Belov, 2016; Drasgow et al., 1996). This paper demonstrates, using a real data analysis, that one assumption underlying the statistic of Drasgow et al. (1996) is often likely to be violated in practice. This paper also demonstrates, using simulated data, that the statistic is not robust to realistic violations of its underlying assumptions. Together, the results from the real data and the simulations demonstrate that the statistic of Drasgow et al. (1996) may not always be the optimal statistic in practice and occasionally has smaller power than another statistic for detecting preknowledge on a known set of items, especially when the assumptions underlying the former statistic do not hold. The findings of this paper demonstrate the importance of keeping in mind the assumptions underlying, and the limitations of, any statistic or method.
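To fix ideas, here is a hedged sketch of a Neyman-Pearson-type log-likelihood-ratio statistic for a known set of compromised items; the constant c and the 2PL form are assumptions for illustration, and the statistic studied in the cited work is defined more carefully.

```python
# Hedged sketch of a Neyman-Pearson-type statistic for preknowledge on a known
# set of compromised items: the log-likelihood ratio of the compromised-item
# scores under a "preknowledge" model (correct with high probability c) versus
# the fitted 2PL model.  c = 0.95 and the 2PL form are illustrative assumptions.
import numpy as np

def np_lemma_statistic(u_comp, a, b, theta_hat, c=0.95):
    p_irt = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
    ll_irt = np.sum(u_comp * np.log(p_irt) + (1 - u_comp) * np.log(1.0 - p_irt))
    ll_pre = np.sum(u_comp * np.log(c) + (1 - u_comp) * np.log(1.0 - c))
    return ll_pre - ll_irt          # large values point toward preknowledge
```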

10.
Educ Psychol Meas ; 82(1): 177-200, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34992311

ABSTRACT

Wollack et al. (2015) suggested the erasure detection index (EDI) for detecting fraudulent erasures for individual examinees. Wollack and Eckerly (2017) and Sinharay (2018) extended the index of Wollack et al. (2015) to suggest three EDIs for detecting fraudulent erasures at the aggregate or group level. This article follows up on the research of Wollack and Eckerly (2017) and Sinharay (2018) and suggests a new aggregate-level EDI by incorporating the empirical best linear unbiased predictor from the literature of linear mixed-effects models (e.g., McCulloch et al., 2008). A simulation study shows that the new EDI has larger power than the indices of Wollack and Eckerly (2017) and Sinharay (2018). In addition, the new index has satisfactory Type I error rates. A real data example is also included.
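For readers unfamiliar with the empirical best linear unbiased predictor, a minimal sketch under a simple random-intercept model follows; how the predictor enters the proposed aggregate-level EDI is not reproduced here.

```python
# Minimal EBLUP sketch under a random-intercept model y_ij = mu + u_j + e_ij:
# the predictor of the group effect u_j shrinks the observed group-mean
# deviation toward zero.  Variance components would be estimated by ML/REML in
# practice; the values below are illustrative.
def eblup(group_mean, overall_mean, n_j, sigma2_u, sigma2_e):
    shrinkage = sigma2_u / (sigma2_u + sigma2_e / n_j)
    return shrinkage * (group_mean - overall_mean)

print(eblup(group_mean=0.9, overall_mean=0.2, n_j=25, sigma2_u=0.3, sigma2_e=1.0))
```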

11.
Appl Psychol Meas ; 44(5): 376-392, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32879537

ABSTRACT

Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. This article suggests a new statistic that can be used for detecting, from their response times, the examinees who may have benefited from item preknowledge. The statistic quantifies the difference in speed between the compromised items and the non-compromised items of the examinees. The distribution of the statistic under the null hypothesis of no preknowledge is proved to be the standard normal distribution. A simulation study is used to evaluate the Type I error rate and power of the suggested statistic. A real data example demonstrates the usefulness of the new statistic, which is found to provide information that is not provided by statistics based only on item scores.
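The general idea can be sketched as a standardized contrast of log-response-time residuals; this mirrors the description in the abstract but is not the paper's exact statistic.

```python
# Sketch of a speed-contrast statistic: under a lognormal response-time model,
# standardized log-time residuals are roughly N(0, 1), so the difference in
# mean residual between non-compromised and compromised items, divided by its
# standard error, is roughly N(0, 1) under the null of no preknowledge.
import numpy as np

def speed_contrast(resid_comp, resid_noncomp):
    diff = np.mean(resid_noncomp) - np.mean(resid_comp)  # positive if compromised items were answered faster
    se = np.sqrt(1.0 / len(resid_comp) + 1.0 / len(resid_noncomp))  # unit residual variance assumed
    return diff / se
```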

12.
Br J Math Stat Psychol ; 73(3): 397-419, 2020 11.
Article in English | MEDLINE | ID: mdl-31418458

ABSTRACT

According to Wollack and Schoenig (2018, The Sage encyclopedia of educational research, measurement, and evaluation. Thousand Oaks, CA: Sage, 260), benefiting from item preknowledge is one of the three broad types of test fraud that occur in educational assessments. We use tools from constrained statistical inference to suggest a new statistic that is based on item scores and response times and can be used to detect examinees who may have benefited from item preknowledge for the case when the set of compromised items is known. The asymptotic distribution of the new statistic under no preknowledge is proved to be a simple mixture of two χ² distributions. We perform a detailed simulation study to show that the Type I error rate of the new statistic is very close to the nominal level and that the power of the new statistic is satisfactory in comparison to that of the existing statistics for detecting item preknowledge based on both item scores and response times. We also include a real data example to demonstrate the usefulness of the suggested statistic.
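Once the mixture null distribution is known, the p-value computation is straightforward; the weights and degrees of freedom below are placeholders, since the paper derives the specific mixture for its statistic.

```python
# p-value under a two-component chi-square mixture null distribution, as arises
# in constrained inference.  The weights and degrees of freedom shown are
# placeholders; the paper derives the specific mixture for its statistic.
from scipy import stats

def mixture_chi2_p_value(statistic, df1=1, df2=2, weight=0.5):
    return weight * stats.chi2.sf(statistic, df1) + (1.0 - weight) * stats.chi2.sf(statistic, df2)

print(mixture_chi2_p_value(6.3))
```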


Subject(s)
Educational Measurement/statistics & numerical data, Statistical Models, Algorithms, Computer Simulation, Fraud/statistics & numerical data, Humans, Likelihood Functions, Reaction Time
13.
Br J Math Stat Psychol ; 62(Pt 1): 79-95, 2009 Feb.
Article in English | MEDLINE | ID: mdl-17937841

ABSTRACT

Recently, there has been an increasing level of interest in reporting subscores for components of larger assessments. This paper examines the issue of reporting subscores at an aggregate level, especially at the level of institutions to which the examinees belong. A new statistical approach based on classical test theory is proposed to assess when subscores at the institutional level have any added value over the total scores. The methods are applied to two operational data sets. For the data under study, the observed results provide little support in favour of reporting subscores for either examinees or institutions.
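At the examinee level, the classical-test-theory check of added value is often formulated as a PRMSE comparison (in the style of Haberman); the sketch below uses standard CTT identities with uncorrelated errors, while the institutional-level version studied in the paper involves further aggregation.

```python
# Sketch of a PRMSE comparison for subscore added value under classical test
# theory: the subscore has added value if it predicts the true subscore better
# than the total score does.  Standard CTT identities (uncorrelated errors)
# are assumed; the institution-level version in the paper aggregates further.
def subscore_added_value(rel_sub, var_sub, var_total, cov_sub_total):
    var_true_sub = rel_sub * var_sub
    cov_total_true_sub = cov_sub_total - (1.0 - rel_sub) * var_sub
    prmse_sub = rel_sub                                                 # PRMSE of the subscore
    prmse_total = cov_total_true_sub ** 2 / (var_total * var_true_sub)  # PRMSE of the total score
    return prmse_sub > prmse_total, prmse_sub, prmse_total

print(subscore_added_value(rel_sub=0.70, var_sub=16.0, var_total=100.0, cov_sub_total=30.0))
```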


Subject(s)
Educational Measurement/statistics & numerical data, Psychological Tests/statistics & numerical data, Psychometrics/statistics & numerical data, Research Design/statistics & numerical data, Certification, Faculty, Humans, Psychological Theory, Reproducibility of Results, Statistics as Topic
15.
Psychometrika ; 84(2): 484-510, 2019 06.
Article in English | MEDLINE | ID: mdl-29951971

ABSTRACT

In educational and psychological measurement, researchers and/or practitioners are often interested in examining whether the ability of an examinee is the same over two sets of items. Such problems can arise in measurement of change, detection of cheating on unproctored tests, erasure analysis, detection of item preknowledge, etc. Traditional frequentist approaches that are used in such problems include the Wald test, the likelihood ratio test, and the score test (e.g., Fischer, Appl Psychol Meas 27:3-26, 2003; Finkelman, Weiss, & Kim-Kang, Appl Psychol Meas 34:238-254, 2010; Glas & Dagohoy, Psychometrika 72:159-180, 2007; Guo & Drasgow, Int J Sel Assess 18:351-364, 2010; Klauer & Rettig, Br J Math Stat Psychol 43:193-206, 1990; Sinharay, J Educ Behav Stat 42:46-68, 2017). This paper shows that approaches based on higher-order asymptotics (e.g., Barndorff-Nielsen & Cox, Inference and asymptotics. Springer, London, 1994; Ghosh, Higher order asymptotics. Institute of Mathematical Statistics, Hayward, 1994) can also be used to test for the equality of the examinee ability over two sets of items. The modified signed likelihood ratio test (e.g., Barndorff-Nielsen, Biometrika 73:307-322, 1986) and the Lugannani-Rice approximation (Lugannani & Rice, Adv Appl Prob 12:475-490, 1980), both of which are based on higher-order asymptotics, are shown to provide some improvement over the traditional frequentist approaches in three simulations. Two real data examples are also provided.
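For reference, the higher-order quantities involved can be written as follows; this is a sketch under the usual one-parameter testing setup, and the exact form of the Wald-type quantity u is model-specific and not reproduced here.

$$
r = \operatorname{sign}(\hat\theta - \theta_0)\sqrt{2\{\ell(\hat\theta) - \ell(\theta_0)\}},
\qquad
r^{*} = r + \frac{1}{r}\log\!\left(\frac{u}{r}\right),
$$
$$
\Pr(r \ge r_{\mathrm{obs}}) \approx 1 - \Phi(r_{\mathrm{obs}}) + \phi(r_{\mathrm{obs}})\left(\frac{1}{u} - \frac{1}{r_{\mathrm{obs}}}\right)
\quad \text{(Lugannani--Rice)},
$$

where \(\ell\) is the log-likelihood, \(\Phi\) and \(\phi\) are the standard normal distribution function and density, and the modified signed likelihood ratio statistic \(r^{*}\) is referred to a standard normal distribution.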


Subject(s)
Deception, Educational Measurement, Statistical Models, Humans, Psychometrics
16.
Br J Math Stat Psychol ; 71(2): 363-386, 2018 05.
Article in English | MEDLINE | ID: mdl-29315495

ABSTRACT

Tatsuoka suggested several extended caution indices and their standardized versions, and these have been used as person-fit statistics by various researchers. However, these indices are only defined for tests with dichotomous items. This paper extends two of the popular standardized extended caution indices for use with polytomous items and mixed-format tests. Two additional new person-fit statistics are obtained by applying the asymptotic standardization of person-fit statistics for mixed-format tests. Detailed simulations are then performed to compute the Type I error rate and power of the four new person-fit statistics. Two real data illustrations follow. The new person-fit statistics appear to be satisfactory tools for assessing person fit for polytomous items and mixed-format tests.


Subject(s)
Psychometrics/methods, Computer Simulation, Statistical Data Interpretation, Humans, Statistical Models, Probability, Psychometrics/statistics & numerical data, Reproducibility of Results, Statistics as Topic
17.
Appl Psychol Meas ; 41(2): 145-149, 2017 Mar.
Article in English | MEDLINE | ID: mdl-29881083

ABSTRACT

Levine and Drasgow (1988) suggested an approach based on the Neyman-Pearson lemma to detect examinees whose response patterns are "aberrant" due to cheating, language issues, and so on. Belov (2016) used the approach of Levine and Drasgow (1988) to suggest a statistic based on the Neyman-Pearson Lemma (SBNPL) to detect item preknowledge when the investigator knows which items are compromised. This brief report proves that the SBNPL of Belov (2016) is equivalent to a statistic suggested for the same purpose by Drasgow, Levine, and Zickar 20 years ago.

18.
Appl Psychol Meas ; 41(6): 403-421, 2017 Sep.
Article in English | MEDLINE | ID: mdl-29881099

ABSTRACT

Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for the detection of item preknowledge and showed its performance, for a known set of compromised items, to be better on average than that of seven other detection statistics. Sinharay suggested a statistic based on the likelihood ratio test for the detection of item preknowledge; the advantage of that statistic is that its null distribution is known. Results from simulated and real data and from adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising for detecting item preknowledge when the set of compromised items is known.

19.
Psychometrika ; 82(4): 1149-1161, 2017 12.
Article in English | MEDLINE | ID: mdl-27770307

ABSTRACT

Tests for a change point (e.g., Chen and Gupta, Parametric statistical change point analysis (2nd ed.). Birkhäuser, Boston, 2012; Hawkins et al., J Qual Technol 35:355-366, 2003) have recently been brought into the spotlight for their potential uses in psychometrics. They have been successfully applied to detect an unusual change in the mean score of a sequence of administrations of an international language assessment (Lee and von Davier, Psychometrika 78:557-575, 2013) and to detect speededness of examinees (Shao et al., Psychometrika, 2015). The differences in the type of data used, the test statistics, and the manner in which the critical values were obtained in these papers lead to questions such as "what type of psychometric problems can be solved by tests for a change point?" and "what test statistics should be used with tests for a change point in psychometric problems?" This note attempts to answer some of these questions by providing a general overview of tests for a change point with a focus on application to psychometric problems. A discussion is provided on the choice of an appropriate test statistic and on the computation of a corresponding critical value for tests for a change point. Then, three real data examples are provided to demonstrate how tests for a change point can be used to make important inferences in psychometric problems. The examples include some clarifications and remarks on the critical values used in Lee and von Davier (Psychometrika 78:557-575, 2013) and Shao et al. (Psychometrika, 2015). The overview and the examples provide insight on tests for a change point above and beyond Lee and von Davier (Psychometrika 78:557-575, 2013) and Shao et al. (Psychometrika, 2015). Thus, this note extends the research of Lee and von Davier (Psychometrika 78:557-575, 2013) and Shao et al. (Psychometrika, 2015) on tests for a change point.
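A generic scan-type change point statistic is sketched below only to make the setup concrete; the statistics and critical values discussed in the note are more refined.

```python
# Generic sketch of a test for a single change point in a sequence of roughly
# independent statistics (e.g., administration-level mean scores): scan the
# candidate split points, compute a two-sample z-statistic at each, and take
# the maximum absolute value.  The critical value must account for the scan,
# e.g., by simulation under the null; the overall variance is used here for
# simplicity.
import numpy as np

def max_change_point_statistic(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    best_stat, best_split = 0.0, None
    for k in range(2, n - 1):                      # candidate change points
        se = np.sqrt(x.var(ddof=1) * (1.0 / k + 1.0 / (n - k)))
        z = abs(x[:k].mean() - x[k:].mean()) / se
        if z > best_stat:
            best_stat, best_split = z, k
    return best_stat, best_split
```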


Subject(s)
Statistical Data Interpretation, Psychometrics/methods, Educational Measurement, Humans
20.
Psychometrika ; 82(4): 979-1006, 2017 12.
Article in English | MEDLINE | ID: mdl-28852944

ABSTRACT

Two marginal one-parameter item response theory models are introduced by integrating out the latent variable or the random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., the assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of the common covariance components is obtained in closed form by transforming the latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted inverse-gamma distribution, thereby introducing a default prior and a balanced prior distribution. Based on these results, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
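A small numerical illustration of the compound-symmetry structure and the role of the Helmert transformation follows (arbitrary values; this is not the paper's estimation algorithm).

```python
# Compound symmetry: Sigma = sigma^2 * I + tau * J, with J a matrix of ones.
# An orthonormal Helmert matrix diagonalizes Sigma, which is what allows the
# closed-form posterior for the covariance components.  Values are arbitrary.
import numpy as np
from scipy.linalg import helmert

p, sigma2, tau = 5, 1.0, 0.4
Sigma = sigma2 * np.eye(p) + tau * np.ones((p, p))
H = helmert(p, full=True)                 # orthonormal rows; first row is 1/sqrt(p)
print(np.round(H @ Sigma @ H.T, 6))       # diagonal: sigma2 + p*tau, then sigma2 repeated
```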


Subject(s)
Bayes Theorem, Statistical Models, Multivariate Analysis, Algorithms, Computer Simulation, Educational Measurement/methods, Humans