1.
Behav Res Methods ; 55(8): 3965-3983, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36333627

ABSTRACT

Hierarchical Bayesian modeling is beneficial when complex models with many parameters of the same type, such as item response theory (IRT) models, are to be estimated with sparse data. Recently, Koenig et al. (Applied Psychological Measurement, 44, 311-326, 2020) illustrated in an optimized hierarchical Bayesian two-parameter logistic model (OH2PL) how to avoid bias due to unintended shrinkage or degeneracies of the posterior, and how to benefit from this approach in small samples. The generalizability of their findings, however, is limited because they investigated only a single specification of the hyperprior structure. Consequently, in a comprehensive simulation study, we investigated the robustness of the performance of the novel OH2PL in several specifications of its hyperpriors under a broad range of data conditions. We show that the novel OH2PL in the half-Cauchy or Exponential configuration yields unbiased model parameter estimates in small samples of N = 50. Moreover, it outperforms (especially in terms of the RMSE of the item discrimination parameters) marginal maximum likelihood (MML) estimation and its nonhierarchical counterpart. This further corroborates the possibility that hierarchical Bayesian IRT models behave differently than general hierarchical Bayesian models. We discuss these results regarding the applicability of complex IRT models in small-scale situations typical in psychological research, and illustrate the extended applicability of the 2PL IRT model with an empirical example.


Subjects
Statistical Models , Humans , Bayes Theorem , Calibration , Psychometrics/methods , Computer Simulation
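
The abstract above judges estimators by bias and RMSE across simulated replications. As a minimal sketch of how these two criteria are usually computed (the function names and the example estimates are hypothetical and not taken from the paper):

```python
import math

def bias(estimates, true_value):
    """Mean signed error of the estimates around the true value."""
    return sum(e - true_value for e in estimates) / len(estimates)

def rmse(estimates, true_value):
    """Root mean squared error of the estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

# Hypothetical discrimination estimates from four replications (true value 1.0).
a_hat = [0.8, 1.2, 0.9, 1.1]
print("bias:", bias(a_hat, 1.0))
print("RMSE:", rmse(a_hat, 1.0))
```

In this toy example the errors cancel, so the bias is essentially zero while the RMSE is not. This is why an estimator can be unbiased and still be outperformed on RMSE.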
2.
Educ Inf Technol (Dordr) ; 28(6): 6485-6513, 2023.
Article in English | MEDLINE | ID: mdl-36415780

ABSTRACT

The replacement of existing technology or the introduction of novel technology into the day-to-day routines of higher education institutions is not a trivial task. Currently, many higher education institutions are faced with the challenge of replacing existing procedures for administering written exams with e-exams. To guide this process, this paper proposes the novel technology-based exams acceptance model (TEAM) and empirically evaluates its model structure and usefulness from the perspective of higher education teachers. The model can be used to guide the transition from paper-based exams to e-exams and the implementation of innovative (e.g., adaptive) e-exam formats. The model includes perceived usefulness, computer self-efficacy, computer anxiety, prior experience, facilitating conditions, and subjective norm as predictors of the behavioral intention to use e-exams. To test the model empirically, the responses of 992 teachers at 63 German universities to a standardized online questionnaire were analyzed using structural equation modeling. The model fit was acceptable. With 77% (conventional e-exams) and 82% (adaptive e-exams), a large proportion of the variance of the intention to use these types of exams was explained. With TEAM, a highly predictive model for explaining the behavioral intention to use e-exams is now available. It offers a theoretical basis that can be used for the successful implementation of e-exams in higher education.

3.
J Appl Meas ; 15(3): 276-91, 2014.
Article in English | MEDLINE | ID: mdl-24992251

ABSTRACT

Testing hypotheses on a respondent's individual fit under the Rasch model requires knowledge of the distributional properties of a person fit statistic. We argue that the Rasch Sampler (Verhelst, 2008), a Markov chain Monte Carlo algorithm for sampling binary data matrices from a uniform distribution, can be applied for simulating the distribution of person fit statistics with the Rasch model in the same way as it is used to test for other forms of misfit. Results from two simulation studies are presented which compare the approach to the original person fit statistics based on normalization formulas. Simulation 1 shows the new approach to hold the expected Type I error rates while the normalized statistics deviate from the nominal alpha-level. In Simulation 2 the power of the new approach was found to be approximately the same or higher than for the normalized statistics under most conditions.


Subjects
Bias , Statistical Models , Psychological Tests/statistics & numerical data , Psychometrics/statistics & numerical data , Algorithms , Deception , Humans , Markov Chains , Mathematical Computing , Monte Carlo Method , Probability , Statistical Distributions
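
The abstract above does not reproduce the Rasch Sampler itself. As a rough illustration of the general idea only, the sketch below uses a simpler and much slower scheme, random 2x2 "checkerboard" swaps, which also keeps every row and column total of a binary matrix fixed. The count of Guttman errors serves as a stand-in person fit statistic. Neither choice is taken from the paper, and the data, function names, burn-in, and thinning values are all assumptions.

```python
import random

def swap_step(matrix, rng):
    """Try one 2x2 checkerboard swap; row and column sums never change."""
    r1, r2 = rng.sample(range(len(matrix)), 2)
    c1, c2 = rng.sample(range(len(matrix[0])), 2)
    a, b = matrix[r1][c1], matrix[r1][c2]
    c, d = matrix[r2][c1], matrix[r2][c2]
    # Swappable only if the submatrix is [[1, 0], [0, 1]] or [[0, 1], [1, 0]].
    if a == d and b == c and a != b:
        matrix[r1][c1], matrix[r1][c2] = b, a
        matrix[r2][c1], matrix[r2][c2] = d, c

def guttman_errors(responses, item_totals):
    """Count item pairs where an easier item is failed but a harder one is solved."""
    order = sorted(range(len(responses)), key=lambda j: -item_totals[j])  # easiest first
    errors, zeros_seen = 0, 0
    for j in order:
        if responses[j] == 0:
            zeros_seen += 1
        else:
            errors += zeros_seen
    return errors

def simulated_p_value(data, person, n_samples=200, thin=50, burn_in=1000, seed=1):
    """Share of sampled matrices in which the person's statistic is at least as large as observed."""
    rng = random.Random(seed)
    matrix = [row[:] for row in data]
    item_totals = [sum(col) for col in zip(*data)]  # fixed across all samples
    observed = guttman_errors(data[person], item_totals)
    for _ in range(burn_in):
        swap_step(matrix, rng)
    hits = 0
    for _ in range(n_samples):
        for _ in range(thin):
            swap_step(matrix, rng)
        if guttman_errors(matrix[person], item_totals) >= observed:
            hits += 1
    return hits / n_samples

# Hypothetical 5 persons x 4 items; person 2 solves only the harder items.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
print(simulated_p_value(data, person=2))
```

Because the item totals are fixed in every sampled matrix, the ordering of items by difficulty is the same for the observed and the simulated data, so the statistic is comparable across samples.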
4.
Appl Psychol Meas ; 44(4): 311-326, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32536732

ABSTRACT

Accurate item calibration in models of item response theory (IRT) requires rather large samples. For instance, N > 500 respondents are typically recommended for the two-parameter logistic (2PL) model. Hence, this model is considered a large-scale application, and its use in small-sample contexts is limited. Hierarchical Bayesian approaches are frequently proposed to reduce the sample size requirements of the 2PL. This study compared the small-sample performance of an optimized Bayesian hierarchical 2PL (H2PL) model to its standard inverse Wishart specification, its nonhierarchical counterpart, and both unweighted and weighted least squares estimators (ULSMV and WLSMV) in terms of sampling efficiency and accuracy of estimation of the item parameters and their variance components. To alleviate shortcomings of hierarchical models, the optimized H2PL (a) was reparametrized to simplify the sampling process, (b) used a strategy to separate item parameter covariances and their variance components, and (c) assigned Cauchy and exponential hyperprior distributions to the variance components. Results show that when combining these elements in the optimized H2PL, accurate item parameter estimates and trait scores are obtained even in sample sizes as small as N = 100. This indicates that the 2PL can also be applied to smaller sample sizes encountered in practice. The results of this study are discussed in the context of a recently proposed multiple imputation method to account for item calibration error in trait estimation.
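
For readers unfamiliar with the 2PL model named in this abstract, the following minimal sketch shows its response function in the common discrimination/difficulty parameterization and simulates a small data set. The item parameters, sample size, and function names are hypothetical; the hierarchical priors studied in the paper are not implemented here.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response: logistic in a * (theta - b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def simulate_responses(thetas, items, seed=0):
    """Draw a persons x items matrix of 0/1 responses under the 2PL model."""
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_correct(theta, a, b) else 0 for (a, b) in items]
        for theta in thetas
    ]

# Hypothetical items as (discrimination a, difficulty b) pairs.
items = [(0.8, -1.0), (1.0, 0.0), (1.5, 0.5), (2.0, 1.0)]

# Hypothetical small sample: 100 trait values drawn from a standard normal.
trait_rng = random.Random(42)
thetas = [trait_rng.gauss(0.0, 1.0) for _ in range(100)]

responses = simulate_responses(thetas, items)
print(len(responses), "persons x", len(responses[0]), "items")
```

Each item contributes two parameters, which is one reason why calibration is usually said to need large samples.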

5.
Front Psychol ; 11: 1116, 2020.
Article in English | MEDLINE | ID: mdl-32547462

ABSTRACT

Receptive skills in English as a second language are important for students on the verge of entering higher education as this student group (aged 17-19) is expected to apply English regularly in their later life. Previous research in Germany already implied an increasing overlap between both skills in this age group, although the robustness of this effect across student groups with different learning experiences was not tested. We used language assessment data collected from upper secondary schools (i.e., from 17 to 19-year-old students) in Germany to compare correlations at the beginning and the end of upper secondary education in groups of students (1) from language-related versus non-language-related study profiles and (2) with frequent versus less frequent self-reported English-language out-of-school learning activities. In all of these groups, correlations were increasing, indicating converging skills in upper secondary education. The results are discussed in terms of implications for current theories of language research.

6.
Front Psychol ; 10: 1277, 2019.
Article in English | MEDLINE | ID: mdl-31244717

ABSTRACT

The increasing digitalization in the field of psychological and educational testing opens up new opportunities to innovate assessments in many respects (e.g., new item formats, flexible test assembly, efficient data handling). In particular, computerized adaptive testing provides the opportunity to make tests more individualized and more efficient. The newly developed continuous calibration strategy (CCS) from Fink et al. (2018) makes it possible to construct computerized adaptive tests in application areas where separate calibration studies are not feasible. Due to the goal of reporting on a common metric across test cycles, the equating is crucial for the CCS. The quality of the equating depends on the common items selected and the scale transformation method applied. Given the novelty of the CCS, the aim of the study was to evaluate different equating setups in the CCS and to derive practical recommendations. The impact of different equating setups on the precision of item parameter estimates and on the quality of the equating was examined in a Monte Carlo simulation, based on a fully crossed design with the factors common item difficulty distribution (bimodal, normal, uniform), scale transformation method (mean/mean, mean/sigma, Haebara, Stocking-Lord), and sample size per test cycle (50, 100, 300). The quality of the equating was operationalized by three criteria (proportion of feasible equatings, proportion of drifted items, and error of transformation constants). The precision of the item parameter estimates increased with increasing sample size per test cycle, but no substantial difference was found with respect to the common item difficulty distribution and the scale transformation method. With regard to the feasibility of the equatings, no differences were found for the different scale transformation methods. However, when using the moment methods (mean/mean, mean/sigma), quite extreme levels of error for the transformation constants A and B occurred. Among the characteristic curve methods, the Stocking-Lord method performed slightly better than the Haebara method. Thus, while no clear recommendation can be made with regard to the common item difficulty distribution, the characteristic curve methods turned out to be the most favorable scale transformation methods within the CCS.
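
The two moment methods named in this abstract can be sketched in a few lines. The code below follows the usual textbook definitions of the transformation constants A and B for common items under a 2PL-type model; it is not the study's implementation, and the item parameters and function names are hypothetical. The characteristic curve methods (Haebara, Stocking-Lord) require numerical optimization and are not shown.

```python
import statistics

def mean_sigma(b_source, b_target):
    """A from the ratio of difficulty SDs, B from the difficulty means."""
    A = statistics.pstdev(b_target) / statistics.pstdev(b_source)
    B = statistics.mean(b_target) - A * statistics.mean(b_source)
    return A, B

def mean_mean(a_source, a_target, b_source, b_target):
    """A from the ratio of mean discriminations, B from the difficulty means."""
    A = statistics.mean(a_source) / statistics.mean(a_target)
    B = statistics.mean(b_target) - A * statistics.mean(b_source)
    return A, B

def transform(a, b, A, B):
    """Put one item's source-scale parameters onto the target scale."""
    return a / A, A * b + B

# Hypothetical common items on the source scale.
a_src = [0.9, 1.2, 1.5, 0.7]
b_src = [-1.0, -0.2, 0.4, 1.1]

# Error-free target-scale parameters built with known constants A = 1.2, B = 0.5.
a_tgt = [a / 1.2 for a in a_src]
b_tgt = [1.2 * b + 0.5 for b in b_src]

print(mean_sigma(b_src, b_tgt))
print(mean_mean(a_src, a_tgt, b_src, b_tgt))
```

With error-free parameters both methods recover the same constants. With estimated parameters they rely only on means and standard deviations of a few common items, which may help explain why they can yield extreme errors in small samples.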
