Pesquisa | BVS IEC

1.

Detecting Rater Effects under Rating Designs with Varying Levels of Missingness.

Stafford, Rose E; Wolfe, Edward W; Casablanca, Jodi M; Song, Tian.

J Appl Meas ; 19(3): 243-257, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-30169333

RESUMO

Previous research has shown that indices obtained from partial credit model (PCM) estimates can detect severity and centrality rater effects, though it remains unknown how rater effect detection is impacted by the missingness inherent in double-scoring rating designs. This simulation study evaluated the impact of missing data on rater severity and centrality detection. Data were generated for each rater effect type, which varied in rater pool quality, rater effect prevalence and magnitude, and extent of missingness. Raters were flagged using rater location as a severity indicator and the standard deviation of rater thresholds a centrality indicator. Two methods of identifying extreme scores on these indices were compared. Results indicate that both methods result in low Type I and Type II error rates (i.e., incorrectly flagging non-effect raters and not flagging effect raters) and that the presence of missing data has negligible impact on the detection of severe and central raters.

Assuntos

Modelos Estatísticos , Psicometria/métodos , Coleta de Dados , Humanos , Reprodutibilidade dos Testes , Pesquisa

2.

Comparison of Models and Indices for Detecting Rater Centrality.

Wolfe, Edward W; Song, Tian.

J Appl Meas ; 16(3): 228-41, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26753219

RESUMO

To date, much of the research concerning rater effects has focused on rater severity/leniency. Consequently, other potentially important rater effects have largely ignored by those conducting operational scoring projects. This simulation study compares four rater centrality indices (rater fit, residual-expected correlations, rater slope, and rater threshold variance) in terms of their Type I and Type II error rates under varying levels of centrality magnitude, centrality pervasiveness, and rating scale construction when each of four latent trait models is fitted to the simulated data (Rasch rating scale and partial credit models and the generalized rating scale and partial credit models). Results indicate that the residual-expected correlation may be most appropriately sensitive to rater centrality under most conditions.

Assuntos

Modelos Estatísticos , Reprodutibilidade dos Testes , Pesquisa

3.

A Family of Rater Accuracy Models.

Wolfe, Edward W; Jiao, Hong; Song, Tian.

J Appl Meas ; 16(2): 153-60, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26075664

RESUMO

Engelhard (1996) proposed a rater accuracy model (RAM) as a means of evaluating rater accuracy in rating data, but very little research exists to determine the efficacy of that model. The RAM requires a transformation of the raw score data to accuracy measures by comparing rater-assigned scores to true scores. Indices computed based on raw scores also exist for measuring rater effects, but these indices ignore deviations of rater-assigned scores from true scores. This paper demonstrates the efficacy of two versions of the RAM (based on dichotomized and polytomized deviations of rater-assigned scores from true scores) to two versions of raw score rater effect models (i.e., a Rasch partial credit model, PCM, and a Rasch rating scale model, RSM). Simulated data are used to demonstrate the efficacy with which these four models detect and differentiate three rater effects: severity, centrality, and inaccuracy. Results indicate that the RAMs are able to detect, but not differentiate, rater severity and inaccuracy, but not rater centrality. The PCM and RSM, on the other hand, are able to both detect and differentiate all three of these rater effects. However, the RSM and PCM do not take into account true scores and may, therefore, be misleading when pervasive trends exist in the rater-assigned data.

Assuntos

Coleta de Dados/normas , Interpretação Estatística de Dados , Avaliação Educacional , Modelos Teóricos , Avaliação Educacional/estatística & dados numéricos

4.

Performance of the likelihood ratio difference (G2 Diff) test for detecting unidimensionality in applications of the multidimensional Rasch model.

Harrell-Williams, Leigh; Wolfe, Edward W.

J Appl Meas ; 15(3): 267-75, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-24992250

RESUMO

Previous research has investigated the influence of sample size, model misspecification, test length, ability distribution offset, and generating model on the likelihood ratio difference test in applications of item response models. This study extended that research to the evaluation of dimensionality using the multidimensional random coefficients multinomial logit model (MRCMLM). Logistic regression analysis of simulated data reveal that sample size and test length have a large effect on the capacity of the LR difference test to correctly identify unidimensionality, with shorter tests and smaller sample sizes leading to smaller Type I error rates. Higher levels of simulated misfit resulted in fewer incorrect decisions than data with no or little misfit. However, Type I error rates indicate that the likelihood ratio difference test is not suitable under any of the simulated conditions for evaluating dimensionality in applications of the MRCMLM.

Assuntos

Funções Verossimilhança , Modelos Logísticos , Modelos Estatísticos , Testes Psicológicos/estatística & dados numéricos , Psicometria/estatística & dados numéricos , Simulação por Computador , Humanos , Computação Matemática , Tamanho da Amostra

5.

Rater effect comparability in local independence and rater bundle models.

Wolfe, Edward W; Song, Tian.

J Appl Meas ; 15(2): 152-9, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-24950533

RESUMO

A large body of literature exists describing how rater effects may be detected in rating data. In this study, we compared the flag and agreement rates for several rater effects based on calibration of a real data under two psychometric models-the Rasch rating scale model (RSM) and the Rasch testlet-based rater bundle model (RBM). The results show that the RBM provided more accurate diagnoses of rater severity and leniency than do the RSM which is based on the local independence assumption. However, the statistical indicators associated with rater centrality and inaccuracy remain consistent between these two models.

Assuntos

Interpretação Estatística de Dados , Avaliação Educacional/estatística & dados numéricos , Modelos Estatísticos , Psicometria/estatística & dados numéricos , Viés , Coleta de Dados/estatística & dados numéricos , Humanos , Capacitação em Serviço , Variações Dependentes do Observador , Reprodutibilidade dos Testes

6.

A bootstrap approach to evaluating person and item fit to the Rasch model.

Wolfe, Edward W.

J Appl Meas ; 14(1): 1-9, 2013.

Artigo em Inglês | MEDLINE | ID: mdl-23442324

RESUMO

Historically, rule-of-thumb critical values have been employed for interpreting fit statistics that depict anomalous person and item response patterns in applications of the Rasch model. Unfortunately, prior research has shown that these values are not appropriate in many contexts. This article introduces a bootstrap procedure for identifying reasonable critical values for Rasch fit statistics and compares the results of that procedure to applications of rule-of-thumb critical values for three example datasets. The results indicate that rule-of-thumb values may over- or under-identify the number of misfitting items or persons.

Assuntos

Algoritmos , Interpretação Estatística de Dados , Modificador do Efeito Epidemiológico , Análise por Pareamento , Modelos Estatísticos , Psicometria/métodos , Estatística como Assunto , Simulação por Computador , Humanos

7.

Application of the Rasch model to measuring the performance of cognitive radios.

Wolfe, Edward W; Dietrich, Carl B; Vanhoy, Garrett.

J Appl Meas ; 14(4): 332-8, 2013.

Artigo em Inglês | MEDLINE | ID: mdl-24064575

RESUMO

Cognitive radios (CRs) are recent technological developments that rely on artificial intelligence to adapt a radio's performance to suit environmental demands, such as sharing radio frequencies with other radios. Measuring the performance of the cognitive engines (CEs) that underlie a CR's performance is a challenge for those developing CR technology. This simulation study illustrates how the Rasch model can be applied to the evaluation of CRs. We simulated the responses of 50 CEs to 35 performance tasks and applied the Random Coefficients Multidimensional Multinomial Logit Model (MRCMLM) to those data. Our results indicate that CEs based on different algorithms may exhibit differential performance across manipulated performance task parameters. We found that a multidimensional mixture model may provide the best fit to the simulated data and that the two algorithms simulated may respond to tasks that emphasize achieving high levels of data throughput coupled with lower emphasis on power conservation differently than they do to other combinations of performance task characteristics.

Assuntos

Inteligência Artificial , Cognição , Interpretação Estatística de Dados , Análise de Falha de Equipamento/métodos , Análise de Falha de Equipamento/estatística & dados numéricos , Modelos Estatísticos , Telecomunicações/instrumentação , Algoritmos

8.

A comparison of structural equation and multidimensional Rasch modeling approaches to Confirmatory Factor Analysis.

Wolfe, Edward W; Singh, Kusum.

J Appl Meas ; 12(3): 212-21, 2011.

Artigo em Inglês | MEDLINE | ID: mdl-22357124

RESUMO

This paper compares the results of applications of the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM) to comparable Structural Equation Model (SEM) applications for the purpose of conducting a Confirmatory Factor Analysis (CFA). We review SEM as it is applied to CFA, identify some parallels between the MRCMLM approach to CFA and that utilized in a standard SEM CFA, and illustrate the comparability of MRCMLM and SEM CFA results for three datasets. Results indicate that the two approaches tend to identify similar dimensional models as exhibiting best fit and provide comparable depictions of latent variable correlations, but the two procedures depict the reliability of measures differently.

Assuntos

Análise Fatorial , Modelos Estatísticos , Psicometria , Humanos , Reprodutibilidade dos Testes , Projetos de Pesquisa

9.

Comparability of item quality indices from sparse data matrices with random and non-random missing data patterns.

Wolfe, Edward W; McGill, Michael T.

J Appl Meas ; 12(4): 358-69, 2011.

Artigo em Inglês | MEDLINE | ID: mdl-22357157

RESUMO

This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices and the point-measure correlation) under conditions of relatively high and low amounts of missing data under both random and conditional patterns of missing data for testing contexts such as those encountered in operational administrations of a computerized adaptive certification or licensure examination. The results suggest that weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.

Assuntos

Viés , Simulação por Computador , Interpretação Estatística de Dados , Licenciamento em Enfermagem/estatística & dados numéricos , Computação Matemática , Software , Gráficos por Computador , Humanos , Modelos Lineares , Reprodutibilidade dos Testes , Tamanho da Amostra

10.

Equating designs and procedures used in Rasch scaling.

Skaggs, Gary; Wolfe, Edward W.

J Appl Meas ; 11(2): 182-95, 2010.

Artigo em Inglês | MEDLINE | ID: mdl-20693702

RESUMO

The development of alternate forms of tests requires a statistical score adjustment called equating that permits the interchanging of scores from different test forms. Equating makes possible several important measurement applications, including removing practice effects in pretest-posttest research designs, improving test security, comparing scores between new and old forms, and supporting item bank development for computerized adaptive testing. This article summarizes equating methods from a Rasch measurement perspective. The four sections of this article present an introduction and definition of equating and related linking methods, data collection designs, equating procedures, and evaluating equated measures. The methods are illustrated with worked examples.

Assuntos

Avaliação Educacional/estatística & dados numéricos , Modelos Estatísticos , Bioestatística , Coleta de Dados/métodos , Humanos

11.

Development of scales relating to professional development of community college administrators.

Wolfe, Edward W; Van Der Linden, Kim E.

J Appl Meas ; 11(2): 142-57, 2010.

Artigo em Inglês | MEDLINE | ID: mdl-20693699

RESUMO

This article reports the results of an application of the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM) to the measurement of professional development activities in which community college administrators participate. The analyses focus on confirmation of the factorial structure of the instrument, evaluation of the quality of the activities calibrations, examination of the internal structure of the instrument, and comparison of groups of administrators. The dimensionality analysis results suggest a five-dimensional model that is consistent with previous literature concerning career paths of community college administrators - education and specialized training, internal professional development and mentoring, external professional development, employer support, and seniority. The indicators of the quality of the activity calibrations suggest that measures of the five dimensions are adequately reliable, that the activities in each dimension are internally consistent, and that the observed responses to each activity are consistent with the expected values of the MRCMLM. The hierarchy of administrator measure means and of activity calibrations is consistent with substantive theory relating to professional development for community college administrators. For example, readily available activities that occur at the institution were most likely to be engaged in by administrators, while participation in selective specialized training institutes were the least likely activities. Finally, group differences with respect to age and title were consistent with substantive expectations - the greater the administrator's age and the higher the rank of the administrator's title, the greater the probability of having engaged in various types of professional development.

Assuntos

Pessoal Administrativo , Instituições Acadêmicas/organização & administração , Pessoal Administrativo/educação , Pessoal Administrativo/estatística & dados numéricos , Humanos , Liderança , Mentores , Modelos Estatísticos , Papel Profissional , Instituições Acadêmicas/estatística & dados numéricos , Sociedades , Estados Unidos

12.

Item and rater analysis of constructed response items via the multi-faceted Rasch model.

Wolfe, Edward W.

J Appl Meas ; 10(3): 335-47, 2009.

Artigo em Inglês | MEDLINE | ID: mdl-19671993

RESUMO

This article describes how the multi-faceted Rasch model (MFRM) can be applied to item and rater analysis and the types of information that is made available by a multifaceted analysis of constructed-response items. Particularly, the text describes evidence that is made available by such analyses that is relevant to improving item and rubric development as well as rater training and monitoring. The article provides an introduction to MRFM extensions of the family of Rasch models, a description of item analysis procedures, a description of rater analysis procedures, and concludes with an example analysis conducted using a commercially available program that implements the MFRM, Facets.

Assuntos

Avaliação Educacional/estatística & dados numéricos , Modelos Estatísticos , Psicometria , Viés , Avaliação Educacional/métodos , Humanos

13.

An application of the multidimensional random coefficients multinomial logit model to evaluating cognitive models of reasoning in genetics.

Wolfe, Edward W; Hickey, Daniel T; Kindfield, Ann C H.

J Appl Meas ; 10(2): 196-207, 2009.

Artigo em Inglês | MEDLINE | ID: mdl-19564699

RESUMO

This article summarizes multidimensional Rasch analyses in which several alternative models of genetics reasoning are evaluated based on item response data from secondary students who participated in a genetics reasoning curriculum. The various depictions of genetics reasoning are compared by fitting several models to the item response data and comparing data-to-model fit at the model level between hierarchically nested models. We conclude that two two-dimensional models provide a substantively better depiction of student performance than does a unidimensional model or more complex three- and four-dimensional models.

Assuntos

Cognição , Genética/educação , Modelos Estatísticos , Adolescente , Avaliação Educacional/estatística & dados numéricos , Humanos

14.

Instrument development tools and activities for measure validation using Rasch models: part I - instrument development tools.

Wolfe, Edward W; Smith, Everett V.

J Appl Meas ; 8(1): 97-123, 2007.

Artigo em Inglês | MEDLINE | ID: mdl-17215568

RESUMO

Instrument development is an arduous task that, if undertaken with care and consideration, can lay the foundation for the development of validity arguments relating to the inferences and decisions that are based on test measures. This article, Part I of a two-part series, provides an overview of validity concepts and describes how instrument development efforts can be conducted to facilitate the development of validity arguments. Our discussion focuses on documentation of the purpose of measurement, creation of test specifications, item development, expert review, and planning of pilot studies. Through these instrument development activities and tools, essential information is documented that will feed into the analysis, summary, and reporting of data relevant to validity arguments discussed in Part II of this series.

Assuntos

Avaliação Educacional , Modelos Psicológicos , Psicologia/métodos , Cultura , Humanos

15.

Instrument development tools and activities for measure validation using Rasch models: part II--validation activities.

Wolfe, Edward W; Smith, Everett V.

J Appl Meas ; 8(2): 204-34, 2007.

Artigo em Inglês | MEDLINE | ID: mdl-17440262

RESUMO

Accumulation of validity evidence is an important part of the instrument development process. In Part I of a two-part series, we provided an overview of validity concepts and described how instrument development efforts can be conducted to facilitate the development of validity arguments. In this, Part II of the series, we identify how analyses, especially those conducted within a Rasch measurement framework, can be used to provide evidence to support validity arguments that are founded during the instrument development process.

Assuntos

Interpretação Estatística de Dados , Modelos Estatísticos , Humanos

16.

Validation of a questionnaire used to assess safety and professionalism among arborists.

Viger, Steven G; Wolfe, Edward W; Dozier, Hallie; Machtmes, Krisanna.

J Appl Meas ; 7(3): 292-306, 2006.

Artigo em Inglês | MEDLINE | ID: mdl-16807495

RESUMO

This article summarizes a validation study of an instrument designed to measure safety and professionalism practices of arborists. A sample of 386 arborists from the State of Louisiana responded to the 58-item questionnaire. Analyses focused on several aspects of Messick's validation framework. Structural validity evidence was provided by analyses that indicate that the measures are unidimensional. Content validity evidence was supported by generally high positive values of the biserial correlations and optimal values of standardized mean-square item fit indices. Substantive validity evidence was provided by analyses that support the use of the two-point rating scale and a rank ordering of item means that is consistent with substantive theory. Person fit indices indicated little misfit among measures. Support for the generalizability aspect of validity was provided by an acceptable level of internal consistency and fairly tight error bands around estimated arborist measures. Additionally, few items exhibited DIF. Finally, with respect to the external aspect of validity, group differences between arborist measures were consistent with substantive theory.

Assuntos

Agricultura Florestal , Competência Profissional , Segurança , Inquéritos e Questionários/normas , Adulto , Feminino , Humanos , Louisiana , Masculino , Pessoa de Meia-Idade , Competência Profissional/estatística & dados numéricos , Segurança/estatística & dados numéricos , Estados Unidos , Recursos Humanos

17.

Evaluating Rater Accuracy in Rater-Mediated Assessments Using an Unfolding Model.

Wang, Jue; Engelhard, George; Wolfe, Edward W.

Educ Psychol Meas ; 76(6): 1005-1025, 2016 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-29795898

RESUMO

The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy ratings (0 = inaccurate, 1 = accurate) are unfolded into three latent categories: inaccurate below expert ratings, accurate ratings, and inaccurate above expert ratings. The hyperbolic cosine model (HCM) is used to examine dichotomous accuracy ratings from a statewide writing assessment. This study suggests that HCM is a promising approach for examining rater accuracy, and that the HCM can provide a useful interpretive framework for evaluating the quality of ratings obtained within the context of rater-mediated assessments.

18.

Differentiation of Illusory and True Halo in Writing Scores.

Lai, Emily R; Wolfe, Edward W; Vickers, Daisy.

Educ Psychol Meas ; 75(1): 102-125, 2015 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-29795814

RESUMO

This report summarizes an empirical study that addresses two related topics within the context of writing assessment-illusory halo and how much unique information is provided by multiple analytic scores. Specifically, we address the issue of whether unique information is provided by analytic scores assigned to student writing, beyond what is depicted by holistic scores, and to what degree multiple analytic scores assigned by a single rater display evidence of illusory halo. To that end, we analyze student responses to an expository writing prompt that were scored by six groups of raters-four groups assigned single analytic scores, one group assigned multiple analytic scores, and one group assigned holistic scores-using structural equation modeling. Our results suggest that there is evidence of illusory halo when raters assign multiple analytic scores to a single student response and that, at best, only two factors seem to be distinguishable in analytic writing scores assigned to expository essays.

19.

Using logistic regression to detect item-level non-response bias in surveys.

Wolfe, Edward W.

J Appl Meas ; 4(3): 234-48, 2003.

Artigo em Inglês | MEDLINE | ID: mdl-12904674

RESUMO

This article describes a procedure for evaluating item-level non-response bias in questionnaire items. Specifically, logistic regression is used to determine whether non-responses are random or systematic in nature for one question from the National Educational Longitudinal Study of 1994 concerning drug use behaviors. It turns out that, indeed, non-responses are systematic with males and lower achieving students being more likely to contribute to non-response along with two-way interactions between ethnicity and SES and ethnicity and geographic region. In addition, the magnitude of the potential bias is estimated, which demonstrates that the parameter estimates obtained by assuming that the data are missing at random may be extremely biased, given this frame of reference. Finally, several steps are suggested for evaluating the threat of non-response bias in survey research.

Assuntos

Viés , Modelos Logísticos , Psicometria/estatística & dados numéricos , Inquéritos e Questionários , Adolescente , Algoritmos , Estudos de Coortes , Feminino , Financiamento Pessoal , Humanos , Estudos Longitudinais , Masculino , Privacidade , Autorrevelação , Transtornos Relacionados ao Uso de Substâncias/economia , Transtornos Relacionados ao Uso de Substâncias/epidemiologia , Transtornos Relacionados ao Uso de Substâncias/psicologia

20.

Detecting and measuring rater effects using many-facet Rasch measurement: part I.

Myford, Carol M; Wolfe, Edward W.

J Appl Meas ; 4(4): 386-422, 2003.

Artigo em Inglês | MEDLINE | ID: mdl-14523257

RESUMO

The purpose of this two-part paper is to introduce researchers to the many-facet Rasch measurement (MFRM) approach for detecting and measuring rater effects. The researcher will learn how to use the Facets (Linacre, 2001) computer program to study five effects: leniency/severity, central tendency, randomness, halo, and differential leniency/severity. Part 1 of the paper provides critical background and context for studying MFRM. We present a catalog of rater effects, introducing effects that researchers have studied over the last three-quarters of a century in order to help readers gain a historical perspective on how those effects have been conceptualized. We define each effect and describe various ways the effect has been portrayed in the research literature. We then explain how researchers theorize that the effect impacts the quality of ratings, pinpoint various indices they have used to measure it, and describe various strategies that have been proposed to try to minimize its impact on the measurement of ratees. The second half of Part 1 provides conceptual and mathematical explanations of many-facet Rasch measurement, focusing on how researchers can use MFRM to study rater effects. First, we present the many-facet version of Andrich's (1978) rating scale model and identify questions about a rating operation that researchers can address using this model. We then introduce three hybrid MFRM models, explain the conceptual distinctions among them, describe how they differ from the rating scale model, and identify questions about a rating operation that researchers can address using these hybrid models.

Assuntos

Modelos Estatísticos , Pesquisa/normas , Humanos , Variações Dependentes do Observador , Reprodutibilidade dos Testes , Pesquisa/estatística & dados numéricos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA