1.
Educ Psychol Meas ; 84(3): 481-509, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38756464

ABSTRACT

A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's information criterion (DIC), sample size adjusted BIC (SABIC), relative entropy, the integrated classification likelihood criterion (ICL-BIC), the adjusted Lo-Mendell-Rubin (LMR), and Vuong-Lo-Mendell-Rubin (VLMR). The accuracy of the fit indices was assessed for correct detection of the number of latent classes for different simulation conditions including sample size (2,500 and 5,000), test length (15, 30, and 45), mixture proportions (equal and unequal), number of latent classes (2, 3, and 4), and latent class separation (no-separation and small separation). Simulation study results indicated that as the number of examinees or number of items increased, correct identification rates also increased for most of the indices. Correct identification rates by the different fit indices, however, decreased as the number of estimated latent classes or parameters (i.e., model complexity) increased. Results were good for BIC, CAIC, DIC, SABIC, ICL-BIC, LMR, and VLMR, and the relative entropy index tended to select correct models most of the time. Consistent with previous studies, AIC and AICc showed poor performance. Most of these indices had limited utility for three-class and four-class mixture 3PL model conditions.
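
As a rough illustration of how several of the likelihood-based indices named above are computed and compared across candidate models, the sketch below uses the textbook formulas for AIC, AICc, BIC, CAIC, and SABIC. The log-likelihoods and parameter counts are hypothetical values chosen for illustration only; they are not results from the study, and the function name `information_criteria` is likewise an assumption.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return common information criteria for one fitted model (smaller is better)."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2 * n_params
    return {
        "AIC": aic,
        # Small-sample correction added to AIC.
        "AICc": aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1),
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
        # Sample-size adjusted BIC replaces n with (n + 2) / 24.
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


# Hypothetical (log-likelihood, number of parameters) for 1-, 2-, and 3-class models.
fits = {1: (-41250.0, 30), 2: (-40890.0, 61), 3: (-40870.0, 92)}
n_examinees = 2500

table = {k: information_criteria(ll, p, n_examinees) for k, (ll, p) in fits.items()}
# For each index, pick the number of classes with the smallest value.
best = {name: min(table, key=lambda k: table[k][name]) for name in table[1]}
print(best)
```

In a simulation like the one described, the selected number of classes would be compared with the generating number of classes in each replication to obtain a correct identification rate per index.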

2.
Br J Math Stat Psychol ; 77(1): 130-150, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37702452

ABSTRACT

Textual data are increasingly common in test data as many assessments include constructed response (CR) items as indicators of participants' understanding. The development of techniques based on natural language processing has made it possible for researchers to rapidly analyse large sets of textual data. One family of statistical techniques for this purpose is probabilistic topic models. Topic modelling is a technique for detecting the latent topic structure in a collection of documents and has been widely used to analyse texts in a variety of areas. The detected topics can reveal primary themes in the documents, and the relative use of topics can be useful in investigating the variability of the documents. Supervised latent Dirichlet allocation (SLDA) is a popular topic model in that family that jointly models textual data and paired responses, such as could occur with participants' textual answers to CR items and their rubric-based scores. SLDA assumes a homogeneous relationship between textual data and paired responses across all documents. This assumption, while useful for some purposes, may not be satisfied in situations in which a population has subgroups that have different relationships. In this study, we introduce a new supervised topic model that incorporates finite-mixture modelling into the SLDA. This new model can detect latent groups of participants that have different relationships between their textual responses and associated scores. The model is illustrated with an example from an analysis of a set of textual responses and paired scores from a middle grades assessment of science inquiry knowledge. A simulation study is presented to investigate the performance of the proposed model under practical testing conditions.


Subjects
Statistical Models , Natural Language Processing , Humans , Computer Simulation
3.
Appl Psychol Meas ; 47(5-6): 402-419, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37810543

ABSTRACT

Large-scale tests often contain mixed-format items, such as when multiple-choice (MC) items and constructed-response (CR) items are both contained in the same test. Although previous research has analyzed both types of items simultaneously, this may not always provide the best estimate of ability. In this paper, a two-step sequential Bayesian (SB) analytic method under the concept of empirical Bayes is explored for mixed item response models. This method integrates ability estimates from different item formats. Unlike the empirical Bayes method, the SB method estimates examinees' posterior ability parameters with individual-level sample-dependent prior distributions estimated from the MC items. Simulations were used to evaluate the accuracy of recovery of ability and item parameters over four factors: the type of the ability distribution, sample size, test length (number of items for each item type), and person/item parameter estimation method. The SB method was compared with a traditional concurrent Bayesian (CB) calibration method, EAPsum, that uses scaled scores for summed scores to estimate parameters from the MC and CR items simultaneously in one estimation step. In the simulation results, the SB method showed more accurate and reliable ability estimation than the CB method, especially when the sample size was small (150 and 500). Both methods presented similar recovery results for MC item parameters, but the CB method yielded slightly better recovery of the CR item parameters. The empirical example suggested that posterior ability estimated by the proposed SB method had higher reliability than that estimated by the CB method.

4.
Educ Psychol Meas ; 83(3): 520-555, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37187690

ABSTRACT

The purpose of this study was to examine the effects of different data conditions on item parameter recovery and classification accuracy of three dichotomous mixture item response theory (IRT) models: the Mix1PL, Mix2PL, and Mix3PL. Manipulated factors in the simulation included the sample size (11 different sample sizes from 100 to 5,000), test length (10, 30, and 50), number of classes (2 and 3), the degree of latent class separation (normal/no separation, small, medium, and large), and class sizes (equal vs. nonequal). Effects were assessed using root mean square error (RMSE) and classification accuracy percentage computed between true parameters and estimated parameters. The results of this simulation study showed that more precise estimates of item parameters were obtained with larger sample sizes and longer test lengths. Recovery of item parameters worsened as the number of classes increased and the sample size decreased. Classification accuracy for the conditions with two-class solutions was also better than that for three-class solutions. Results of both item parameter estimates and classification accuracy differed by model type. More complex models and models with larger class separations produced less accurate results. The mixture proportions also differentially affected RMSE and classification accuracy results. Groups of equal size produced more precise item parameter estimates, but the reverse was the case for classification accuracy results. Results suggested that dichotomous mixture IRT models required more than 2,000 examinees to obtain stable results, as even shorter tests required such large sample sizes for more precise estimates. This number increased as the number of latent classes, the degree of separation, and model complexity increased.
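
The two evaluation measures mentioned above, RMSE between true and estimated parameters and percentage classification accuracy, can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the small arrays are hypothetical, and it assumes that estimated class labels have already been aligned with the generating labels (label switching is a known issue in mixture models and is not handled here).

```python
import numpy as np


def rmse(true_values, estimates):
    """Root mean square error between generating and estimated parameters."""
    true_values = np.asarray(true_values, dtype=float)
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_values) ** 2)))


def classification_accuracy(true_classes, assigned_classes):
    """Percentage of examinees assigned to their generating latent class."""
    true_classes = np.asarray(true_classes)
    assigned_classes = np.asarray(assigned_classes)
    return float(np.mean(true_classes == assigned_classes) * 100)


# Hypothetical item difficulties and class memberships for illustration.
true_b = [-1.0, 0.0, 1.0]
est_b = [-0.8, 0.1, 1.3]
true_class = [1, 1, 2, 2, 2]
assigned_class = [1, 2, 2, 2, 2]

print(rmse(true_b, est_b))
print(classification_accuracy(true_class, assigned_class))
```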

6.
J Sci Educ Technol ; 30(3): 331-346, 2021.
Article in English | MEDLINE | ID: mdl-33424211

ABSTRACT

The use of serious educational games has the potential to increase student learning outcomes in science education by providing students with opportunities to explore phenomena in ways that vary from traditional instruction; yet, empirical research to support this assertion is limited. This study aimed to explore deeply what learning gains were associated with the use of three serious educational games (SEGs) created for use in secondary biology classrooms that partner teachers implemented during a 2-week curriculum unit. This longitudinal, mixed-method study includes a control year, in which we examined how six highly qualified teachers taught students (n = 407) a 2-week curriculum unit addressing cellular biology without the SEGs, followed by 2 years in which the teachers integrated the SEGs into the curriculum unit with students (n = 871). Data were collected from multiple sources, including a validated content pre- and post-test measure, embedded gameplay data, participant observation, teacher interviews, and focus groups. Quantitative findings showed significant learning gains associated with students who experienced the game condition during year 2, when compared with the control condition. During the replication year (year 3), learning gains increased again, compared with year 2. Although the SEGs did not change between years 2 and 3, teachers were provided real-time access to students' performance during gameplay. Thematic analysis of observation notes, teacher interviews, and student performance in-game revealed four affordances that teachers identified in relation to the use of serious educational games in their classrooms and the extended partnership model employed. Implications for researchers and game designers are discussed.

7.
Educ Psychol Meas ; 80(5): 975-994, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32855567

ABSTRACT

A nonconverged Markov chain can potentially lead to invalid inferences about model parameters. The purpose of this study was to assess the effect of a nonconverged Markov chain on the estimation of parameters for mixture item response theory models using a Markov chain Monte Carlo algorithm. A simulation study was conducted to investigate the accuracy of model parameters estimated with different degrees of convergence. Results indicated the accuracy of the estimated model parameters for the mixture item response theory models decreased as the number of iterations of the Markov chain decreased. In particular, increasing the number of burn-in iterations resulted in more accurate estimation of mixture IRT model parameters. In addition, the different methods for monitoring convergence of a Markov chain resulted in different degrees of convergence despite almost identical accuracy of estimation.
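
The abstract does not say which convergence diagnostics were compared. As one widely used example of such a diagnostic, the sketch below computes the Gelman-Rubin potential scale reduction factor for several parallel chains of a single parameter; choosing this diagnostic, and the simulated chains themselves, are assumptions made purely for illustration.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor for an (n_chains, n_draws) array."""
    chains = np.asarray(chains, dtype=float)
    _, n_draws = chains.shape
    # Average within-chain variance.
    within = chains.var(axis=1, ddof=1).mean()
    # Between-chain variance, scaled by the number of draws per chain.
    between = n_draws * chains.mean(axis=1).var(ddof=1)
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(0)
# Three chains sampling the same distribution (well mixed).
mixed = rng.normal(0.0, 1.0, size=(3, 2000))
# The same draws shifted so each chain sits at a different location (not converged).
stuck = mixed + np.array([[0.0], [2.0], [4.0]])

print(gelman_rubin(mixed))
print(gelman_rubin(stuck))
```

Values close to 1 suggest that the chains agree; values well above 1, as for the shifted chains, indicate that more iterations or a longer burn-in may be needed.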

8.
Front Psychol ; 11: 197, 2020.
Article in English | MEDLINE | ID: mdl-32116973

ABSTRACT

The standard item response theory (IRT) model assumption of a single homogeneous population may be violated in real data. Mixture extensions of IRT models have been proposed to account for latent heterogeneous populations, but these models are not designed to handle multilevel data structures. Ignoring the multilevel structure is problematic as it results in lower-level units aggregated with higher-level units and yields less accurate results, because of dependencies in the data. Multilevel data structures cause such dependencies between levels but can be modeled in a straightforward way in multilevel mixture IRT models. An important step in the use of multilevel mixture IRT models is the fit of the model to the data. This fit is often determined based on relative fit indices. Previous research on mixture IRT models has shown that performances of these indices and classification accuracy of these models can be affected by several factors including percentage of class-variant items, number of items, magnitude and size of clusters, and mixing proportions of latent classes. As yet, no studies appear to have been reported examining these issues for multilevel extensions of mixture IRT models. The current study aims to investigate the effects of several features of the data on the accuracy of model selection and parameter recovery. Results are reported on a simulation study designed to examine the following features of the data: percentages of class-variant items (30, 60, and 90%), numbers of latent classes in the data (from 1 to 3 latent classes at level 1 and 1 or 2 latent classes at level 2), numbers of items (10, 30, and 50), numbers of clusters (50 and 100), cluster size (10 and 50), and mixing proportions [equal (0.5 and 0.5) vs. non-equal (0.25 and 0.75)]. Simulation results indicated that multilevel mixture IRT models resulted in less accurate estimates when the number of clusters and the cluster size were small. In addition, mean root mean square error (RMSE) values increased as the percentage of class-variant items increased, and parameters were recovered more accurately under the 30% class-variant item conditions. Mixing proportion type (i.e., equal vs. unequal latent class sizes) and numbers of items (10, 30, and 50), however, did not show any clear pattern. The sample-size-dependent fit indices BIC, CAIC, and SABIC performed poorly for the smaller level-1 sample size. For the remaining conditions, the SABIC index performed better than other fit indices.

9.
Appl Psychol Meas ; 44(2): 137-149, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32076357

ABSTRACT

This study describes a structural equation modeling (SEM) approach to reliability for tests with items having different numbers of ordered categories. A simulation study is provided to compare the performance of this reliability coefficient, coefficient alpha, and population reliability for tests having items with different numbers of ordered categories, one-factor and bifactor structures, and different skewness distributions of test scores. Results indicated that the proposed reliability coefficient was close to the population reliability in most conditions. An empirical example was used to illustrate the performance of the different coefficients for a test of items with two or three ordered categories.
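
For reference, coefficient alpha, one of the coefficients compared above, can be computed from an examinee-by-item score matrix as sketched below. This shows only the standard alpha formula, not the SEM-based coefficient proposed in the article; the function name and the small data set (two dichotomous items and one three-category item) are hypothetical.

```python
import numpy as np


def coefficient_alpha(scores):
    """Coefficient alpha for an (n_examinees, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return float(n_items / (n_items - 1) * (1 - item_variances.sum() / total_variance))


# Hypothetical responses: items 1 and 2 scored 0/1, item 3 scored 0/1/2.
responses = [
    [1, 0, 2],
    [1, 1, 2],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 2],
]
alpha = coefficient_alpha(responses)
print(round(alpha, 3))
```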

10.
Front Psychol ; 11: 621251, 2020.
Article in English | MEDLINE | ID: mdl-33569029

ABSTRACT

Results of a comprehensive simulation study are reported investigating the effects of sample size, test length, number of attributes, and base rate of mastery on item parameter recovery and classification accuracy of four DCMs (i.e., C-RUM, DINA, DINO, and LCDMREDUCED). Effects were evaluated using bias and RMSE computed between true (i.e., generating) parameters and estimated parameters. Effects of simulated factors on attribute assignment were also evaluated using the percentage of classification accuracy. More precise estimates of item parameters were obtained with larger sample size and longer test length. Recovery of item parameters decreased as the number of attributes increased from three to five, but base rate of mastery had a varying effect on the item recovery. Item parameter recovery and classification accuracy were higher for DINA and DINO models.
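
To make two of the models named above concrete, the sketch below evaluates the item response probability under the standard DINA and DINO formulations, using slip and guessing parameters and one row of a Q-matrix. The attribute profiles, Q-matrix row, and parameter values are hypothetical, and the sketch covers response probabilities only, not the estimation carried out in the study.

```python
import numpy as np


def dina_probability(alpha, q_row, slip, guess):
    """P(correct) under DINA: all required attributes must be mastered."""
    alpha, q_row = np.asarray(alpha), np.asarray(q_row)
    has_all = bool(np.all(alpha[q_row == 1] == 1))
    return 1 - slip if has_all else guess


def dino_probability(alpha, q_row, slip, guess):
    """P(correct) under DINO: at least one required attribute must be mastered."""
    alpha, q_row = np.asarray(alpha), np.asarray(q_row)
    has_any = bool(np.any(alpha[q_row == 1] == 1))
    return 1 - slip if has_any else guess


# Hypothetical item requiring attributes 1 and 3 out of three attributes.
q_row = [1, 0, 1]
slip, guess = 0.1, 0.2

print(dina_probability([1, 0, 1], q_row, slip, guess))  # masters both required attributes
print(dina_probability([1, 0, 0], q_row, slip, guess))  # misses attribute 3
print(dino_probability([1, 0, 0], q_row, slip, guess))  # one required attribute suffices
```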

11.
Front Psychol ; 11: 579199, 2020.
Article in English | MEDLINE | ID: mdl-33633622

ABSTRACT

Selected response items and constructed response (CR) items are often found in the same test. Conventional psychometric models for these two types of items typically focus on using the scores for correctness of the responses. Recent research suggests, however, that more information may be available from the CR items than just scores for correctness. In this study, we describe an approach in which a statistical topic model along with a diagnostic classification model (DCM) was applied to a mixed item format formative test of English and Language Arts. The DCM was used to estimate students' mastery status of reading skills. These mastery statuses were then included in a topic model as covariates to predict students' use of each of the latent topics in their written answers to a CR item. This approach enabled investigation of the effects of mastery status of reading skills on writing patterns. Results indicated that one of the skills, Integration of Knowledge and Ideas, helped detect and explain students' writing patterns with respect to students' use of individual topics.

12.
J Appl Meas ; 20(4): 384-398, 2019.
Article in English | MEDLINE | ID: mdl-31730545

ABSTRACT

A review of various priors used in Bayesian estimation under the Rasch model is presented together with clear mathematical definitions of the hierarchical prior distributions. A Bayesian estimation method, Gibbs sampling, was compared with conditional, marginal, and joint maximum likelihood estimation methods using the Knox Cube Test data under the Rasch model. The shrinkage effect of the priors on item and ability parameter estimates was also investigated using the Knox Cube Test data. In addition, item response data for a mathematics test with 14 items by 765 examinees were analyzed with the joint maximum likelihood estimation method and Gibbs sampling under the Rasch model. Both methods yielded nearly identical item parameter estimates. The shrinkage effect was observed in the ability estimates from Gibbs sampling. The computer program OpenBUGS that implemented the rejection sampling method of Gibbs sampling was the main program employed in the study.


Subjects
Psychometrics , Software , Bayes Theorem
14.
Appl Psychol Meas ; 43(4): 272-289, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31156280

ABSTRACT

Mixture item response theory (MixIRT) models can sometimes be used to model the heterogeneity among the individuals from different subpopulations, but these models do not account for the multilevel structure that is common in educational and psychological data. Multilevel extensions of the MixIRT models have been proposed to address this shortcoming. Successful applications of multilevel MixIRT models depend in part on detection of the best fitting model. In this study, performance of information indices, Akaike information criterion (AIC), Bayesian information criterion (BIC), consistent Akaike information criterion (CAIC), and sample-size adjusted Bayesian information criterion (SABIC), were compared for use in model selection with a two-level mixture Rasch model in the context of a real data example and a simulation study. Level 1 consisted of students and Level 2 consisted of schools. The performances of the model selection criteria under different sample sizes were investigated in a simulation study. Total sample size (number of students) and Level 2 sample size (number of schools) were studied for calculation of information criterion indices to examine the performance of these fit indices. Simulation study results indicated that CAIC and BIC performed better than the other indices at detection of the true (i.e., generating) model. Furthermore, information indices based on total sample size yielded more accurate detections than indices at Level 2.

15.
Appl Psychol Meas ; 43(2): 95-112, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30792558

ABSTRACT

A brief review of various information criteria is presented for the detection of differential item functioning (DIF) under item response theory (IRT). An illustration of using information criteria for model selection as well as results with simulated data are presented and contrasted with the IRT likelihood ratio (LR) DIF detection method. Use of information criteria for general IRT model selection is discussed.

17.
Contemp Clin Trials ; 50: 253-64, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27521809

ABSTRACT

INTRODUCTION: For opioid-dependent patients in the US and elsewhere, detoxification and counseling-only aftercare are treatment mainstays. Long-term abstinence is rarely achieved; many patients relapse and overdose after detoxification. Methadone, buprenorphine-naloxone (BUP-NX) and extended-release naltrexone (XR-NTX) can prevent opioid relapse but are underutilized. This study is intended to develop an evidence base to help patients and providers make informed choices and to foster wider adoption of relapse-prevention pharmacotherapies. METHODS: The National Institute on Drug Abuse's Clinical Trials Network (CTN) study CTN-0051, X:BOT, is a comparative effectiveness study of treatment for 24 weeks with XR-NTX, an opioid antagonist, versus BUP-NX, a high-affinity partial opioid agonist, for opioid-dependent patients initiating treatment at 8 short-term residential (detoxification) units and continuing care as outpatients. Up to 600 participants are randomized (1:1) to XR-NTX or BUP-NX. RESULTS: The primary outcome is time to opioid relapse (i.e., loss of persistent abstinence) across the 24-week treatment phase. Differences between arms in the distribution of time-to-relapse will be compared (construction of the asymptotic 95% CI for the hazard ratio of the difference between arms). Secondary outcomes include proportions retained in treatment, rates of opioid abstinence, adverse events, cigarette, alcohol, and other drug use, and HIV risk behaviors; opioid cravings, quality of life, cognitive function, genetic moderators, and cost effectiveness. CONCLUSIONS: XR-NTX and BUP-NX differ considerably in their characteristics and clinical management; no studies to date have compared XR-NTX with buprenorphine maintenance. Study design choices and compromises inherent to a comparative effectiveness trial of distinct treatment regimens are reviewed. CLINICAL TRIAL REGISTRATION: NCT02032433.


Subjects
Buprenorphine and Naloxone Combination/therapeutic use , Comparative Effectiveness Research/methods , Naltrexone/therapeutic use , Narcotic Antagonists/therapeutic use , Opioid-Related Disorders/drug therapy , Buprenorphine and Naloxone Combination/economics , Cost-Benefit Analysis , Delayed-Action Preparations , Female , Humans , Intramuscular Injections , Male , Naltrexone/administration & dosage , Naltrexone/economics , Narcotic Antagonists/economics , National Institute on Drug Abuse (U.S.) , Socioeconomic Factors , United States
18.
Appl Psychol Meas ; 40(2): 98-113, 2016 Mar.
Article in English | MEDLINE | ID: mdl-29881041

ABSTRACT

Unidimensional, item response theory (IRT) models assume a single homogeneous population. Mixture IRT (MixIRT) models can be useful when subpopulations are suspected. The usual MixIRT model is typically estimated assuming a normally distributed latent ability. Research on normal finite mixture models suggests that latent classes potentially can be extracted, even in the absence of population heterogeneity, if the distribution of the data is non-normal. In this study, the authors examined the sensitivity of MixIRT models to latent non-normality. Single-class IRT data sets were generated using different ability distributions and then analyzed with MixIRT models to determine the impact of these distributions on the extraction of latent classes. Results suggest that estimation of mixed Rasch models resulted in spurious latent class problems in the data when distributions were bimodal and uniform. Mixture two-parameter logistic (2PL) and mixture three-parameter logistic (3PL) IRT models were found to be more robust to latent non-normality.
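
The data-generation step described above, single-class IRT data produced under different ability distributions, might look roughly like the sketch below for the Rasch case. The sample size, item difficulties, and the particular parameters of the uniform and bimodal distributions are hypothetical choices for illustration; the abstract does not report the values used in the study.

```python
import numpy as np


def simulate_rasch(theta, difficulty, rng):
    """Simulate 0/1 responses under the Rasch model for one homogeneous class."""
    logits = theta[:, None] - difficulty[None, :]
    prob_correct = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random(prob_correct.shape) < prob_correct).astype(int)


rng = np.random.default_rng(1)
n_examinees = 1000
difficulty = np.linspace(-2.0, 2.0, 20)  # hypothetical item difficulties

# Hypothetical ability distributions: normal, uniform (unit variance), and bimodal.
low_mode = rng.normal(-1.5, 0.5, n_examinees)
high_mode = rng.normal(1.5, 0.5, n_examinees)
abilities = {
    "normal": rng.normal(0.0, 1.0, n_examinees),
    "uniform": rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n_examinees),
    "bimodal": np.where(rng.random(n_examinees) < 0.5, low_mode, high_mode),
}

data = {name: simulate_rasch(theta, difficulty, rng) for name, theta in abilities.items()}
for name, responses in data.items():
    print(name, responses.shape, round(float(responses.mean()), 3))
```

Each simulated data set contains only one class by construction, so any multi-class solution preferred by a mixture model fitted to it would be spurious.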

19.
Educ Psychol Meas ; 76(2): 181-204, 2016 Apr.
Article in English | MEDLINE | ID: mdl-29795862

ABSTRACT

Latent transition analysis (LTA) was initially developed to provide a means of measuring change in dynamic latent variables. In this article, we illustrate the use of a cognitive diagnostic model, the DINA model, as the measurement model in an LTA, thereby demonstrating a means of analyzing change in cognitive skills over time. An example is presented of an instructional treatment on a sample of seventh-grade students in several classrooms in a Midwestern school district. In the example, it is demonstrated how hypotheses could be framed and then tested regarding the form of the change in different groups within the population. Both manifest and latent groups also are defined and used to test additional hypotheses about change specific to particular subpopulations. Results suggest that the use of a DINA measurement model expands the utility of LTA to practical problems in educational measurement research.

20.
Educ Psychol Meas ; 75(6): 931-953, 2015 Dec.
Article in English | MEDLINE | ID: mdl-29795847

ABSTRACT

Test tampering, especially on tests for educational accountability, is an unfortunate reality, necessitating that the state (or its testing vendor) perform data forensic analyses, such as erasure analyses, to look for signs of possible malfeasance. Few statistical approaches exist for detecting fraudulent erasures, and those that do largely do not lend themselves to making probabilistic statements about the likelihood of the observations. In this article, a new erasure detection index, EDI, is developed, which uses item response theory to compare the number of observed wrong-to-right erasures to the number expected due to chance, conditional on the examinee's ability-level and number of erased items. A simulation study is presented to evaluate the Type I error rate and power of EDI under various types of fraudulent and benign erasures. Results show that EDI with a correction for continuity yields Type I error rates that are less than or equal to nominal levels for every condition studied, and has high power to detect even small amounts of tampering among the students for whom tampering is most likely.
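
The abstract describes the index only in words. One plausible reading, sketched below, is a normal-approximation z statistic that compares the observed count of wrong-to-right erasures with the count expected from the model-implied probabilities of a correct response on the erased items, with a 0.5 continuity correction. This form is an assumption and may differ from the definition in the article; the function names and the probabilities are hypothetical.

```python
import math


def erasure_z(n_wrong_to_right, p_correct_on_erased, continuity=True):
    """Standardized difference between observed and expected wrong-to-right erasures.

    p_correct_on_erased holds the model-implied probabilities of a correct
    response, given the examinee's ability, for each erased item (assumed inputs).
    """
    expected = sum(p_correct_on_erased)
    variance = sum(p * (1 - p) for p in p_correct_on_erased)
    correction = 0.5 if continuity else 0.0
    return (n_wrong_to_right - expected - correction) / math.sqrt(variance)


def upper_tail_p(z):
    """One-sided p-value from the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical examinee: four erased items, all changed from wrong to right.
p_erased = [0.5, 0.5, 0.5, 0.5]
z = erasure_z(4, p_erased)
p_value = upper_tail_p(z)
print(z, round(p_value, 4))
```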
