ABSTRACT
Restricted latent class models (RLCMs) provide an important framework for diagnosing and classifying respondents on a collection of multivariate binary responses. Recent research has made significant theoretical advances in establishing identifiability conditions for RLCMs with binary and polytomous response data. Multiclass data, which are unordered nominal response data, are also widely collected in the social sciences and psychometrics via forced-choice inventories and multiple-choice tests. We establish new identifiability conditions for the parameters of RLCMs for multiclass data and discuss the implications for substantive applications. The new identifiability conditions are applicable to a wealth of RLCMs for polytomous and nominal response data. We propose a Bayesian framework for inferring model parameters, assess parameter recovery in a Monte Carlo simulation study, and present an application of the model to a real dataset.
Subjects
Bayes Theorem, Latent Class Analysis, Monte Carlo Method, Psychometrics, Psychometrics/methods, Humans, Statistical Models, Computer Simulation
ABSTRACT
Cognitive diagnostic models provide a framework for classifying individuals into latent proficiency classes, also known as attribute profiles. Recent research has examined the implementation of a Pólya-gamma data augmentation strategy for binary response models using logistic item response functions within a Bayesian Gibbs sampling procedure. In this paper, we propose a sequential exploratory diagnostic model for ordinal response data using a logit-link parameterization at the category level and extend the Pólya-gamma data augmentation strategy to ordinal response processes. A Gibbs sampling procedure is presented for efficient Markov chain Monte Carlo (MCMC) estimation. We provide results from a Monte Carlo study of model performance and present an application of the model.
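A sequential ordinal model treats each response as a chain of binary steps: the respondent reaches category c only after reaching category c − 1. Each step is therefore a binary logistic outcome, which plausibly explains why augmentation schemes built for binary logistic responses extend to this setting. The Python sketch below computes category probabilities under a generic sequential (continuation-ratio) logit model. It assumes the step-level linear predictors are already available. The function name `sequential_logit_probs` is hypothetical, and the paper's exploratory, attribute-based parameterization of those predictors is not reproduced.

```python
import math


def sequential_logit_probs(step_logits):
    """Category probabilities for a sequential (continuation-ratio) logit model.

    step_logits[c - 1] is the linear predictor for the binary step
    "respond in category >= c, given category >= c - 1 was reached",
    for c = 1, ..., C. Returns C + 1 probabilities for categories 0..C.
    """
    probs = []
    reached = 1.0  # probability of reaching the current category
    for eta in step_logits:
        step = 1.0 / (1.0 + math.exp(-eta))  # logistic link for this step
        probs.append(reached * (1.0 - step))  # fail the step: stop here
        reached *= step                       # pass the step: move up
    probs.append(reached)  # passed every step: highest category
    return probs


if __name__ == "__main__":
    # Three steps, so four ordered categories (0, 1, 2, 3).
    p = sequential_logit_probs([1.0, 0.0, -1.5])
    print([round(v, 3) for v in p], sum(p))
```

The probabilities sum to one by construction, because every path either stops at some step or passes all of them.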
Subjects
Algorithms, Humans, Bayes Theorem, Monte Carlo Method, Markov Chains
ABSTRACT
Hidden Markov models (HMMs) have been applied in various domains, which has made the identifiability of HMMs a topic of wide interest among researchers. Classical identifiability conditions established in previous studies are too strong for practical analysis. In this paper, we propose generic identifiability conditions for discrete-time HMMs with a finite state space. Recent studies of cognitive diagnosis models (CDMs) have also applied first-order HMMs to track changes in attributes related to learning. However, applying CDMs requires a known Q matrix specifying the underlying structure between latent attributes and items, and the identifiability constraints on the model parameters must also be specified. We propose generic identifiability constraints for our restricted HMM and then estimate the model parameters, including the Q matrix, through a Bayesian framework. We present Monte Carlo simulation results to support our conclusions and apply the developed model to a real dataset.
Subjects
Cognition, Bayes Theorem, Psychometrics, Computer Simulation, Monte Carlo Method, Markov Chains
ABSTRACT
The specification of the Q matrix in cognitive diagnosis models is important for correct classification of attribute profiles. Researchers have proposed many methods for estimating and validating data-driven Q matrices. However, inferring the number of attributes in the general restricted latent class model remains an open question. We propose a Bayesian framework for general restricted latent class models and use a spike-and-slab prior to avoid the computational issues caused by the varying dimensions of the model parameters associated with the number of attributes, K. We develop an efficient Metropolis-within-Gibbs algorithm to estimate K and the corresponding Q matrix simultaneously. The proposed algorithm uses the stick-breaking construction to mimic an Indian buffet process and employs a novel Metropolis-Hastings transition step to encourage exploration of the sample space associated with different values of K. We evaluate the performance of the proposed method through a simulation study under different model specifications and apply the method to a real data set from a fluid intelligence matrix reasoning test.
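The stick-breaking construction of the Indian buffet process produces non-increasing feature-inclusion probabilities. It sets π_k = ν_1 ν_2 ⋯ ν_k with ν_j ~ Beta(α, 1), so later columns (candidate attributes) are used less and less often. The Python sketch below draws these probabilities at a fixed truncation level and then samples a binary matrix from them. The names `alpha`, `k_max`, and the helper functions are illustrative assumptions. The paper's spike-and-slab prior and its Metropolis-within-Gibbs updates for K and Q are not shown.

```python
import numpy as np


def stick_breaking_ibp(alpha, k_max, rng):
    """Truncated stick-breaking draw of IBP feature probabilities.

    nu_j ~ Beta(alpha, 1) and pi_k = nu_1 * ... * nu_k, so the
    probabilities are non-increasing in k.
    """
    nu = rng.beta(alpha, 1.0, size=k_max)
    return np.cumprod(nu)


def sample_binary_features(pi, n_rows, rng):
    """Draw an n_rows x k_max binary matrix; column k is active w.p. pi[k].

    In a Q-matrix setting, rows could index items and columns attributes
    (an illustrative assumption, not the paper's exact construction).
    """
    return rng.random((n_rows, pi.size)) < pi


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    pi = stick_breaking_ibp(alpha=2.0, k_max=8, rng=rng)
    Z = sample_binary_features(pi, n_rows=10, rng=rng)
    print(np.round(pi, 3))
    print(Z.astype(int))
    # Columns with at least one active entry act as the "used" attributes.
    print("active columns:", int(Z.any(axis=0).sum()))
```

Larger values of `alpha` keep more sticks long, which favors more active columns. This is the sense in which the construction places a prior on the number of attributes.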
Subjects
Algorithms, Latent Class Analysis, Bayes Theorem, Psychometrics, Computer Simulation
ABSTRACT
Diagnostic models provide a statistical framework for designing formative assessments by classifying student knowledge profiles according to a collection of fine-grained attributes. The context and ecosystem in which students learn may play an important role in skill mastery, so it is important to develop methods for incorporating student covariates into diagnostic models. Including covariates may give researchers and practitioners the ability to evaluate novel interventions or to understand the role of background knowledge in attribute mastery. Existing research has focused on including covariates in confirmatory diagnostic models, which are also known as restricted latent class models (RLCMs). We propose new methods for including covariates in exploratory RLCMs that jointly infer the latent structure and evaluate the role of covariates on performance and skill mastery. We present a novel Bayesian formulation and report a Metropolis-within-Gibbs Markov chain Monte Carlo algorithm for approximating the posterior distribution of the model parameters. We report Monte Carlo simulation evidence regarding the accuracy of our new methods and present results from an application to a probability data set that examines the role of student background knowledge in skill mastery.
Subjects
Algorithms, Ecosystem, Humans, Bayes Theorem, Probability, Computer Simulation, Markov Chains, Monte Carlo Method, Statistical Models
ABSTRACT
Restricted latent class models (RLCMs) are an important class of methods that provide researchers and practitioners in the educational, psychological, and behavioral sciences with fine-grained diagnostic information to guide interventions. Recent research established sufficient conditions for identifying RLCM parameters. A current challenge that limits widespread application of RLCMs is that existing identifiability conditions may be too restrictive for some practical settings. In this paper, we establish a weaker condition for identifying RLCM parameters for multivariate binary data. Although the new results weaken identifiability conditions for general RLCMs, they do not relax existing necessary and sufficient conditions for the simpler DINA/DINO models. Theoretically, we introduce a new form of latent structure completeness, referred to as dyad-completeness, and prove identification by applying Kruskal's Theorem for the uniqueness of three-way arrays. The new condition is more likely to be satisfied in applied research, and the results provide researchers and test developers with guidance for designing diagnostic instruments.
Subjects
Algorithms, Statistical Models, Latent Class Analysis, Psychometrics
ABSTRACT
Researchers continue to develop and advance models for diagnostic research in the social and behavioral sciences. These diagnostic models (DMs) provide researchers with a framework for a fine-grained classification of respondents into substantively meaningful latent classes, as defined by a multivariate collection of binary attributes. A central concern for DMs is advancing exploratory methods for uncovering the latent structure, which corresponds to the relationship between unobserved binary attributes and observed polytomous items with two or more response options. Multivariate behavioral polytomous data are often collected within a higher-order design, where general factors underlie first-order latent variables. This study advances existing exploratory DMs for polytomous data by proposing a new method for inferring the latent structure underlying polytomous response data, using a higher-order model to describe dependence among the discrete latent attributes. We report a novel Bayesian formulation that uses variable selection techniques for inferring the latent structure along with a higher-order factor model for attributes. We report evidence of accurate parameter recovery in a Monte Carlo simulation study and present results from an application to the 2012 Programme for International Student Assessment (PISA) problem-solving vignettes to demonstrate the method.
Subjects
Students, Humans, Bayes Theorem, Computer Simulation, Monte Carlo Method
ABSTRACT
Restricted latent class models (RLCMs) provide an important framework for supporting diagnostic research in education and psychology. Recent research proposed fully exploratory methods for inferring the latent structure. However, prior research is limited by the use of a restrictive monotonicity condition or by prior formulations that are unable to incorporate prior information about the latent structure to validate expert knowledge. We develop new methods that relax existing monotonicity restrictions and provide greater insight into the latent structure. Furthermore, existing Bayesian methods use only a probit link function; we provide a new formulation of the exploratory RLCM with a logit link function, which has the additional advantage of being computationally more efficient for larger sample sizes. We present four new Bayesian formulations that employ different link functions (i.e., the logit using Pólya-gamma data augmentation versus the probit) and priors for inducing sparsity in the latent structure. We report Monte Carlo simulation studies that demonstrate accurate parameter recovery. Furthermore, we report results from an application to the Last Series of the Standard Progressive Matrices to illustrate our new methods.
Subjects
Statistical Models, Poly A, Bayes Theorem, Latent Class Analysis, Psychometrics
ABSTRACT
Diagnostic classification models (DCMs) are widely used to provide fine-grained classification of a multidimensional collection of discrete attributes. Applying DCMs requires specifying the latent structure in what is known as the Q matrix. Expert-specified Q matrices might be biased and result in incorrect diagnostic classifications, so a critical issue is developing methods to estimate Q in order to infer the relationship between latent attributes and items. Existing exploratory methods for estimating Q must prespecify the number of attributes, K. We present a Bayesian framework to jointly infer the number of attributes K and the elements of Q. We propose the crimp sampling algorithm to transition between different values of K and estimate the underlying Q and model parameters while enforcing model identifiability constraints. We also adapt the Indian buffet process and reversible-jump Markov chain Monte Carlo methods to estimate Q. We report evidence that the crimp sampler performs best among the three methods. We apply the developed methodology to two data sets and discuss the implications of the findings for future research.
Subjects
Algorithms, Bayes Theorem, Markov Chains, Monte Carlo Method, Psychometrics
ABSTRACT
Recently, there has been renewed interest in the four-parameter item response theory model as a way to capture guessing and slipping behaviors in responses. Research has shown, however, that the nested three-parameter model suffers from unidentifiability (San Martín et al. in Psychometrika 80:450-467, 2015), which raises concerns about the identifiability of the four-parameter model. Borrowing from recent advances in the identification of cognitive diagnostic models, in particular the DINA model (Gu and Xu in Stat Sin https://doi.org/10.5705/ss.202018.0420, 2019), we propose a new model with restrictions inspired by this literature to address the identification issue. Specifically, we show conditions under which the four-parameter model is strictly and generically identified. These conditions inform the presentation of a new exploratory model, which we call the dyad four-parameter normal ogive (Dyad-4PNO) model. This model is developed by placing a hierarchical structure on the DINA model and imposing equality constraints on a priori unknown dyads of items. We present a Bayesian formulation of this model and show that model parameters can be accurately recovered. Finally, we apply the model to a real dataset.
Subjects
Statistical Models, Psychometrics, Bayes Theorem
ABSTRACT
Advances in educational technology provide teachers and schools with a wealth of information about student performance. A critical direction for educational research is to harvest the available longitudinal data to provide teachers with real-time diagnoses of students' skill mastery. Cognitive diagnosis models (CDMs) offer educational researchers, policy makers, and practitioners a psychometric framework for designing instructionally relevant assessments and diagnoses of students' skill profiles. In this article, the authors contribute to the literature on longitudinal CDMs by proposing a multivariate latent growth curve model to describe student learning trajectories over time. The model offers several advantages. First, the learning trajectory space is high-dimensional, and previously developed models may not be applicable to educational studies with a modest sample size. In contrast, the proposed method offers a lower-dimensional approximation and is more applicable to typical educational studies. Second, practitioners and researchers are interested in identifying factors that cause or relate to student skill acquisition, and the framework can easily incorporate covariates to assess theoretical questions about factors that promote learning. The authors demonstrate the utility of their approach with an application to a pretest/posttest educational intervention study and show how the longitudinal CDM framework can provide fine-grained assessment of experimental effects.
ABSTRACT
Diagnostic models (DMs) provide researchers and practitioners with tools to classify respondents into substantively relevant classes. DMs are widely applied to binary response data; however, binary response models are not applicable to the wealth of ordinal data collected by educational, psychological, and behavioral researchers. Prior research developed confirmatory ordinal DMs that require expert knowledge to specify the underlying structure. This paper introduces an exploratory DM for ordinal data. In particular, we present an exploratory ordinal DM that uses a cumulative probit link along with Bayesian variable selection techniques to uncover the latent structure. Furthermore, we discuss new identifiability conditions for structured multinomial mixture models with binary attributes. We provide evidence of accurate parameter recovery in a Monte Carlo simulation study across moderate to large sample sizes. We apply the model to twelve items from the public-use Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 approaches to learning and self-description questionnaire and report evidence supporting a three-attribute solution with eight classes to describe the latent structure underlying the teacher and parent ratings. In short, the developed methodology contributes to the development of ordinal DMs and broadens their applicability for addressing theoretical and substantive issues across the social sciences.
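Under a cumulative probit link, the probability of responding at or below category c is Φ(τ_c − μ). Here μ is a respondent- and item-specific location, and τ_1 < τ_2 < ⋯ are ordered thresholds. Category probabilities are then differences of adjacent cumulative probabilities. The Python sketch below assumes μ has already been computed. How the paper builds μ from the binary attributes through variable selection is not reproduced, and the function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF via the error function (works for +/- inf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative_probit_probs(mu, thresholds):
    """Category probabilities for a cumulative probit model.

    P(Y <= c) = Phi(tau_c - mu) with strictly increasing thresholds
    tau_1 < ... < tau_{C-1}; returns C probabilities for categories 0..C-1.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Pad with -inf and +inf so the first and last categories are handled uniformly.
    cuts = [-math.inf, *thresholds, math.inf]
    return [
        std_normal_cdf(cuts[c + 1] - mu) - std_normal_cdf(cuts[c] - mu)
        for c in range(len(cuts) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical four-category item; a larger mu shifts mass to higher categories.
    for mu in (-1.0, 0.0, 1.0):
        print(mu, [round(p, 3) for p in cumulative_probit_probs(mu, [-1.0, 0.0, 1.0])])
```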
Subjects
Algorithms, Bayes Theorem, Statistical Data Interpretation, Statistical Models, Monte Carlo Method, Child, Computer Simulation, Humans, Longitudinal Studies
ABSTRACT
The existence of differences in prediction systems involving test scores across demographic groups continues to be a thorny and unresolved scientific, professional, and societal concern. Our case study uses a two-stage least squares (2SLS) estimator to jointly assess measurement invariance and prediction invariance in high-stakes testing. In doing so, we examined differences across groups based on latent, as opposed to observed, scores with data for 176 colleges and universities from The College Board. Results showed that measurement invariance was rejected for the SAT mathematics (SAT-M) subtest at the 0.01 level for 74.5% and 29.9% of cohorts for Black versus White and Hispanic versus White comparisons, respectively. Also, on average, Black students with the same standing on a common factor had observed SAT-M scores that were nearly a third of a standard deviation lower than those of comparable White students. We also found evidence that group differences in SAT-M measurement intercepts may partly explain the well-known finding of observed differences in prediction intercepts. Additionally, nearly a quarter of the statistically significant observed intercept differences were no longer statistically significant at the 0.05 level once predictor measurement error was accounted for using the 2SLS procedure. Our joint measurement and prediction invariance approach based on latent scores opens the door to a new high-stakes testing research agenda whose goal is not simply to assess whether observed group-based differences exist, along with their size and direction. Rather, the goal of this research agenda is to assess the causal chain starting with underlying theoretical mechanisms (e.g., contextual factors, differences in latent predictor scores) that affect the size and direction of any observed differences.
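The authors' joint measurement and prediction invariance model is not reproduced here. The Python sketch below only illustrates the generic 2SLS idea behind accounting for predictor measurement error. When the observed predictor contains error, an ordinary least squares (OLS) slope is attenuated. A second error-prone indicator of the same latent score can serve as an instrument, and 2SLS then recovers the latent slope. The data-generating values (slope 0.5, error SD 0.7) and the helper names are hypothetical assumptions chosen for illustration.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """Generic 2SLS estimate of beta in y = X @ beta + error with instruments Z.

    X and Z should both contain an intercept column.
    """
    # Stage 1: replace each regressor by its projection onto the instruments.
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Stage 2: regress the outcome on the projected regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


def simulate_attenuation(seed=0, n=50_000, slope=0.5):
    """Compare OLS and 2SLS slopes when the predictor is measured with error.

    Hypothetical setup: eta is a latent predictor score, x1 and x2 are two
    error-prone indicators of eta, and x2 serves as the instrument for x1.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    x1 = eta + rng.normal(scale=0.7, size=n)
    x2 = eta + rng.normal(scale=0.7, size=n)
    y = 1.0 + slope * eta + rng.normal(scale=0.5, size=n)
    ones = np.ones(n)
    X = np.column_stack([ones, x1])
    Z = np.column_stack([ones, x2])
    ols_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]
    tsls_slope = two_stage_least_squares(y, X, Z)[1]
    return ols_slope, tsls_slope


if __name__ == "__main__":
    ols, tsls = simulate_attenuation()
    # OLS is attenuated toward slope / (1 + 0.7**2), about 0.34; 2SLS is near 0.5.
    print(f"OLS slope: {ols:.3f}, 2SLS slope: {tsls:.3f}")
```

The instrument works here because its measurement error is independent of the error in `x1` and of the outcome noise. That independence is an assumption of this sketch, not a claim about the SAT data.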
Subjects
Educational Measurement/methods, Least-Squares Analysis, Ethnicity, Factor Analysis, Humans, Information Storage and Retrieval, Mathematical Concepts, Psychometrics/methods, Racial Groups, Universities
ABSTRACT
Cognitive diagnosis models (CDMs) are an important psychometric framework for classifying students in terms of attribute and/or skill mastery. The Q matrix, which specifies the required attributes for each item, is central to implementing CDMs. The general unavailability of Q for most content areas and datasets poses a barrier to widespread application of CDMs, and recent research accordingly developed fully exploratory methods to estimate Q. However, current methods do not always offer clear interpretations of the uncovered skills, and existing exploratory methods do not use expert knowledge to estimate Q. We consider Bayesian estimation of Q using a prior based upon expert knowledge within a fully Bayesian formulation for a general diagnostic model. The developed method can be used to validate which of the underlying attributes are predicted by experts and to identify residual attributes that remain unexplained by expert knowledge. We report Monte Carlo evidence about the accuracy of selecting active expert predictors and present an application using Tatsuoka's fraction-subtraction dataset.
Subjects
Cognition, Knowledge, Statistical Models, Humans, Monte Carlo Method, Probability, Psychometrics
ABSTRACT
The increasing presence of electronic and online learning resources presents challenges and opportunities for psychometric techniques that can assist in the measurement of abilities and even hasten their mastery. Cognitive diagnosis models (CDMs) are ideal for tracking the many fine-grained skills that comprise a domain and can assist in carefully navigating the training and assessment of these skills in e-learning applications. A class of CDMs for modeling changes in attributes, referred to as learning trajectories, is proposed. The authors focus on the development of Bayesian procedures for estimating the parameters of a first-order hidden Markov model. An application of the developed model to a spatial rotation experimental intervention is presented.
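In a first-order hidden Markov model for learning trajectories, the latent attribute profile at time t + 1 depends only on the profile at time t. The forward recursion below makes that concrete. It combines a transition matrix with the likelihood of each occasion's responses and yields filtered class probabilities and the log-likelihood. This Python sketch assumes the per-class response likelihoods are already computed, for example from a CDM item response function. The no-forgetting transition matrix in the usage example is a hypothetical choice, and the authors' Bayesian estimation procedures are not reproduced.

```python
import numpy as np


def forward_filter(initial, transition, emission_lik):
    """Forward recursion for a first-order hidden Markov model.

    initial:      (C,) initial probabilities over latent classes
                  (e.g. attribute profiles).
    transition:   (C, C); transition[a, b] = P(class b at t+1 | class a at t).
    emission_lik: (T, C) likelihood of the responses at time t under each class.
    Returns the (T, C) filtered probabilities P(class at t | data up to t)
    and the log-likelihood of all T occasions.
    """
    initial = np.asarray(initial, dtype=float)
    transition = np.asarray(transition, dtype=float)
    emission_lik = np.asarray(emission_lik, dtype=float)
    T, C = emission_lik.shape
    filtered = np.empty((T, C))
    loglik = 0.0
    predicted = initial
    for t in range(T):
        joint = predicted * emission_lik[t]   # prior for time t times likelihood
        norm = joint.sum()                    # P(data at t | data before t)
        loglik += np.log(norm)
        filtered[t] = joint / norm
        predicted = filtered[t] @ transition  # propagate to time t + 1
    return filtered, loglik


if __name__ == "__main__":
    # One attribute: class 0 = non-mastery, class 1 = mastery.
    # Hypothetical no-forgetting transitions: masters stay masters.
    trans = np.array([[0.7, 0.3], [0.0, 1.0]])
    # Hypothetical per-occasion likelihoods of the observed responses.
    lik = np.array([[0.6, 0.2], [0.4, 0.5], [0.1, 0.7]])
    post, ll = forward_filter([0.8, 0.2], trans, lik)
    print(np.round(post, 3), round(ll, 3))
```

With K attributes the classes are the 2^K profiles, so the transition matrix grows quickly. Structural restrictions on transitions, such as ruling out forgetting, are one way to keep it manageable.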
ABSTRACT
A Bayesian formulation for a popular conjunctive cognitive diagnosis model, the reduced reparameterized unified model (rRUM), is developed. The new Bayesian formulation of the rRUM employs a latent response data augmentation strategy that yields tractable full conditional distributions. A Gibbs sampling algorithm is described to approximate the posterior distribution of the rRUM parameters. A Monte Carlo study supports accurate parameter recovery and provides evidence that the Gibbs sampler tended to converge in fewer iterations and had a larger effective sample size than a commonly employed Metropolis-Hastings algorithm. The developed method is disseminated for applied researchers as an R package titled "rRUM."
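In one common statement of the rRUM, the probability that respondent i answers item j correctly is π*_j ∏_k r*_jk^{q_jk(1 − α_ik)}. Here π*_j is the success probability when all required attributes are mastered. Each required but unmastered attribute multiplies that probability by a penalty r*_jk in (0, 1). The Python sketch below evaluates this item response function for a single item. The function name is hypothetical. The latent response data augmentation and Gibbs sampler described above, and the interface of the `rRUM` R package, are not reproduced.

```python
import numpy as np


def rrum_prob(alpha, q, pi_star, r_star):
    """P(correct) for one item under the reduced RUM.

    alpha:   (K,) 0/1 attribute profile of the respondent.
    q:       (K,) 0/1 Q-matrix row for the item.
    pi_star: success probability when all required attributes are mastered.
    r_star:  (K,) penalties in (0, 1), one per attribute.
    """
    alpha = np.asarray(alpha)
    q = np.asarray(q)
    r_star = np.asarray(r_star, dtype=float)
    # A penalty applies only when an attribute is required (q = 1)
    # but not mastered (alpha = 0).
    exponent = q * (1 - alpha)
    return float(pi_star * np.prod(r_star ** exponent))


if __name__ == "__main__":
    q_row = [1, 1, 0]
    r = [0.5, 0.4, 0.3]
    for profile in ([1, 1, 0], [1, 0, 1], [0, 0, 1]):
        print(profile, round(rrum_prob(profile, q_row, 0.9, r), 3))
```

Unlike the DINA model, the rRUM penalizes each missing attribute separately, so partial mastery yields intermediate success probabilities.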
ABSTRACT
Cognitive diagnosis models are partially ordered latent class models and are used to classify students into skill mastery profiles. The deterministic inputs, noisy "and" gate model (DINA) is a popular psychometric model for cognitive diagnosis. Application of the DINA model requires content expert knowledge of a Q matrix, which maps the attributes or skills needed to master a collection of items. Misspecification of Q has been shown to yield biased diagnostic classifications. We propose a Bayesian framework for estimating the DINA Q matrix. The developed algorithm builds upon prior research (Chen, Liu, Xu, & Ying, in J Am Stat Assoc 110(510):850-866, 2015) and ensures the estimated Q matrix is identified. Monte Carlo evidence is presented to support the accuracy of parameter recovery. The developed methodology is applied to Tatsuoka's fraction-subtraction dataset.
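The conjunctive rule behind the DINA model uses the ideal response η_ij = ∏_k α_ik^{q_jk}. This equals 1 only when respondent i masters every attribute that item j requires. The response probability is then (1 − s_j)^{η_ij} g_j^{1 − η_ij}, with slipping parameter s_j and guessing parameter g_j. The Python sketch below shows how a given Q matrix enters this item response function. The function name and example values are hypothetical, and the Bayesian Q estimation algorithm and its identifiability constraints are not reproduced.

```python
import numpy as np


def dina_prob(alpha, Q, slip, guess):
    """P(Y_ij = 1) for all respondents and items under the DINA model.

    alpha: (N, K) 0/1 attribute profiles; Q: (J, K) 0/1 Q matrix;
    slip, guess: (J,) item slipping and guessing parameters.
    """
    alpha = np.asarray(alpha)
    Q = np.asarray(Q)
    # eta[i, j] is True iff respondent i masters every attribute item j requires.
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)
    # Masters succeed unless they slip; non-masters succeed only by guessing.
    return np.where(eta, 1.0 - np.asarray(slip, dtype=float),
                    np.asarray(guess, dtype=float))


if __name__ == "__main__":
    Q = [[1, 0], [0, 1], [1, 1]]
    profiles = [[0, 0], [1, 0], [1, 1]]
    print(dina_prob(profiles, Q, slip=[0.1, 0.1, 0.2], guess=[0.2, 0.2, 0.1]))
```

A misspecified row of Q changes which respondents have η_ij = 1, which is one way to see why an incorrect Q biases diagnostic classifications.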
Subjects
Statistical Models, Algorithms, Bayes Theorem, Computer Simulation, Monte Carlo Method, Psychometrics
ABSTRACT
There has been renewed interest in Barton and Lord's (An upper asymptote for the three-parameter logistic item response model (Tech. Rep. No. 80-20). Educational Testing Service, 1981) four-parameter item response model. This paper presents a Bayesian formulation of the four-parameter normal ogive (4PNO) model that extends Béguin and Glas (MCMC estimation and some model fit analysis of multidimensional IRT models. Psychometrika, 66 (4):541-561, 2001). Monte Carlo evidence is presented concerning the accuracy of parameter recovery. The simulation results support the use of less informative uniform priors for the lower and upper asymptotes, which is an advantage over prior research. Monte Carlo results provide some support for using the deviance information criterion and [Formula: see text] index to choose among models with two, three, and four parameters. The 4PNO is applied to 7491 adolescents' responses to a bullying scale collected in the 2005-2006 Health Behavior in School-Aged Children study. The results support the value of the 4PNO for estimating lower and upper asymptotes in large-scale surveys.
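A common way to write the 4PNO item response function is P(Y_ij = 1 | θ_i) = γ_j + (ζ_j − γ_j) Φ(a_j θ_i − b_j), where γ_j and ζ_j are the lower and upper asymptotes. The three-parameter model is the special case ζ_j = 1, and the two-parameter model additionally fixes γ_j = 0. The Python sketch below assumes this parameterization; sign conventions for b_j vary across papers. The function name and item values are hypothetical, and the Bayesian data augmentation scheme is not reproduced.

```python
import math


def four_pno_prob(theta, a, b, lower, upper):
    """Four-parameter normal ogive item response function.

    P(Y = 1 | theta) = lower + (upper - lower) * Phi(a * theta - b),
    with 0 <= lower < upper <= 1 as the lower and upper asymptotes.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    # Standard normal CDF via the error function.
    phi = 0.5 * (1.0 + math.erf((a * theta - b) / math.sqrt(2.0)))
    return lower + (upper - lower) * phi


if __name__ == "__main__":
    # Hypothetical item: guessing floor 0.2, slipping ceiling 0.9.
    for theta in (-3.0, 0.0, 3.0):
        print(theta, round(four_pno_prob(theta, a=1.2, b=0.0, lower=0.2, upper=0.9), 3))
```

Even respondents with very high θ endorse the item with probability at most ζ_j. This ceiling is what lets the upper asymptote capture slipping-type behavior.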
Subjects
Bayes Theorem, Psychometrics/methods, Adolescent, Algorithms, Bullying, Child, Computer Simulation, Statistical Data Interpretation, Humans, Logistic Models, Monte Carlo Method
ABSTRACT
Standardized tests are frequently used for selection decisions, and the validation of test scores remains an important area of research. This paper builds upon prior literature about the effect of nonlinearity and heteroscedasticity on the accuracy of standard formulas for correcting correlations in restricted samples. Existing formulas for direct range restriction require three assumptions: (1) the criterion variable is missing at random; (2) the relationship between the independent and dependent variables is linear; and (3) the error variance is constant (homoscedasticity). The results in this paper demonstrate that the standard approach for correcting restricted correlations is severely biased in cases of extreme monotone quadratic nonlinearity and heteroscedasticity. This paper offers at least three significant contributions to the existing literature. First, a method from the econometrics literature is adapted to provide more accurate estimates of unrestricted correlations. Second, derivations establish bounds on the degree of bias attributed to quadratic functions under the assumption of a monotonic relationship between test scores and criterion measurements. New results are presented on the bias associated with using the standard range restriction correction formula, and they show that this formula yields estimates of unrestricted correlations that deviate by as much as 0.2 for high to moderate selectivity. Third, Monte Carlo simulation results demonstrate that the new procedure for correcting restricted correlations provides more accurate estimates in the presence of quadratic and heteroscedastic relationships between test scores and criteria.
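We assume the standard formula for direct range restriction referred to here is Thorndike's Case II correction. It is R = r u / √(1 − r² + r² u²), where r is the restricted correlation and u = S_x / s_x is the ratio of unrestricted to restricted predictor standard deviations. The Python sketch below implements that formula. It then checks, in a hypothetical simulation, that the formula recovers the population correlation when selection is on the predictor and assumptions (2) and (3) hold. The paper's econometric alternative for nonlinear and heteroscedastic cases is not reproduced, and all names and simulation settings are illustrative.

```python
import math

import numpy as np


def correct_direct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1.0 - r**2 + (r * u) ** 2)


def simulate_selection(rho=0.5, keep_share=0.3, n=200_000, seed=1):
    """Select the top `keep_share` on x from a bivariate normal population
    (linear, homoscedastic regression), then compare restricted and corrected r."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = rho * x + math.sqrt(1.0 - rho**2) * rng.normal(size=n)
    keep = x > np.quantile(x, 1.0 - keep_share)
    r_restricted = np.corrcoef(x[keep], y[keep])[0, 1]
    r_corrected = correct_direct_range_restriction(
        r_restricted, x.std(), x[keep].std()
    )
    return r_restricted, r_corrected


if __name__ == "__main__":
    r_res, r_cor = simulate_selection()
    # The restricted r is well below 0.5; the corrected value should be close to 0.5.
    print(f"restricted r = {r_res:.3f}, corrected r = {r_cor:.3f}")
```

Replacing the linear `y` with a quadratic or heteroscedastic function of `x` in this simulation is a direct way to see the bias the abstract describes.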
Subjects
Educational Measurement, Statistics as Topic, Bias, Humans, Monte Carlo Method, Personnel Selection, Psychometrics
ABSTRACT
This paper assesses the psychometric value of allowing test-takers choice in standardized testing. New theoretical results examine the conditions under which allowing choice improves score precision. A hierarchical framework is presented for jointly modeling the accuracy of cognitive responses and item choices. The statistical methodology is disseminated in the 'cIRT' R package. An 'answer two, choose one' (A2C1) test administration design is introduced to avoid challenges associated with nonignorable missing data. Experimental results suggest that the A2C1 design and payout structure encouraged subjects to choose items consistent with their cognitive trait levels. Substantively, the experimental data suggest that item choices yielded information and discrimination comparable to those of cognitive items. Given that there are no clear guidelines for writing more or less discriminating items, one practical implication is that choice can serve as a mechanism to improve score precision.