Results 1 - 20 of 28
1.
J Appl Meas; 19(3): 243-257, 2018.
Article in English | MEDLINE | ID: mdl-30169333

ABSTRACT

Previous research has shown that indices obtained from partial credit model (PCM) estimates can detect severity and centrality rater effects, though it remains unknown how rater effect detection is impacted by the missingness inherent in double-scoring rating designs. This simulation study evaluated the impact of missing data on the detection of rater severity and centrality. Data were generated for each rater effect type under conditions that varied in rater pool quality, rater effect prevalence and magnitude, and extent of missingness. Raters were flagged using rater location as a severity indicator and the standard deviation of rater thresholds as a centrality indicator. Two methods of identifying extreme scores on these indices were compared. Results indicate that both methods produce low Type I and Type II error rates (i.e., incorrectly flagging non-effect raters and failing to flag effect raters, respectively) and that the presence of missing data has a negligible impact on the detection of severe and central raters.
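The two flagging indicators described in this abstract can be sketched concretely. A minimal Python illustration, assuming each rater's PCM step thresholds (in logits) have already been estimated; the function names and threshold values are illustrative, not taken from the article:

```python
# Hedged sketch: severity and centrality indicators computed from a rater's
# (assumed) PCM threshold estimates, in logits. Names are illustrative.
from statistics import mean, pstdev

def severity_index(thresholds):
    """Rater location: the mean of the rater's step thresholds.
    Higher values suggest a more severe rater."""
    return mean(thresholds)

def centrality_index(thresholds):
    """Spread of the rater's thresholds; an unusually small standard
    deviation suggests a central (range-restricted) rater."""
    return pstdev(thresholds)

# Illustrative thresholds for three raters on a five-category scale
raters = {
    "typical": [-2.0, -0.5, 0.5, 2.0],
    "severe":  [-0.5,  1.0, 2.0, 3.5],
    "central": [-0.6, -0.2, 0.2, 0.6],
}
for name, tau in raters.items():
    print(name, severity_index(tau), round(centrality_index(tau), 3))
```

A severe rater's thresholds sit higher on the scale (larger mean), while a central rater's thresholds cluster together (smaller standard deviation), which is the pattern the flagging rules look for.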


Subject(s)
Statistical Models, Psychometrics/methods, Data Collection, Humans, Reproducibility of Results, Research
2.
Educ Psychol Meas; 76(6): 1005-1025, 2016 Dec.
Article in English | MEDLINE | ID: mdl-29795898

ABSTRACT

The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy ratings (0 = inaccurate, 1 = accurate) are unfolded into three latent categories: inaccurate below expert ratings, accurate ratings, and inaccurate above expert ratings. The hyperbolic cosine model (HCM) is used to examine dichotomous accuracy ratings from a statewide writing assessment. This study suggests that the HCM is a promising approach for examining rater accuracy and can provide a useful interpretive framework for evaluating the quality of ratings obtained within the context of rater-mediated assessments.
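The unfolding idea in this abstract can be illustrated with the response function commonly given for the dichotomous hyperbolic cosine model (attributed to Andrich and Luo); the parameter names below are illustrative, and this is a sketch of the general model form rather than the article's specific analysis:

```python
# Hedged sketch of the dichotomous hyperbolic cosine model response function.
# gamma is a unit ("latitude of acceptance") parameter; theta is the rater
# location and delta the location of the response being rated.
import math

def hcm_prob(theta, delta, gamma=1.0):
    """P(accurate) = exp(gamma) / (exp(gamma) + 2*cosh(theta - delta))."""
    return math.exp(gamma) / (math.exp(gamma) + 2.0 * math.cosh(theta - delta))

# The probability of an accurate rating peaks where theta matches delta and
# falls off in BOTH directions, which is what lets the model unfold
# "inaccurate below" and "inaccurate above" into separate tails.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(hcm_prob(theta, 0.0), 3))
```

Unlike a monotone item response function, this single-peaked curve treats large deviations on either side of the expert rating symmetrically as inaccuracy.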

3.
J Appl Meas; 16(2): 153-60, 2015.
Article in English | MEDLINE | ID: mdl-26075664

ABSTRACT

Engelhard (1996) proposed a rater accuracy model (RAM) as a means of evaluating rater accuracy in rating data, but very little research exists to determine the efficacy of that model. The RAM requires a transformation of the raw score data to accuracy measures by comparing rater-assigned scores to true scores. Indices computed from raw scores also exist for measuring rater effects, but these indices ignore deviations of rater-assigned scores from true scores. This paper compares the efficacy of two versions of the RAM (based on dichotomized and polytomized deviations of rater-assigned scores from true scores) to that of two raw-score rater effect models (i.e., a Rasch partial credit model, PCM, and a Rasch rating scale model, RSM). Simulated data are used to demonstrate the efficacy with which these four models detect and differentiate three rater effects: severity, centrality, and inaccuracy. Results indicate that the RAMs are able to detect, but not differentiate, rater severity and inaccuracy, and are unable to detect rater centrality. The PCM and RSM, on the other hand, are able to both detect and differentiate all three of these rater effects. However, the RSM and PCM do not take true scores into account and may, therefore, be misleading when pervasive trends exist in the rater-assigned data.
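The score transformation the RAM requires can be sketched as follows. The exact recoding used in the article may differ; the function names and the cap on deviations are illustrative assumptions:

```python
# Hedged sketch: converting rater-assigned scores to accuracy measures by
# comparison with true (expert) scores, in dichotomized and polytomized forms.
def dichotomous_accuracy(rater_score, true_score):
    """1 = accurate (exact agreement with the true score), 0 = inaccurate."""
    return 1 if rater_score == true_score else 0

def polytomous_accuracy(rater_score, true_score, max_dev=3):
    """Graded accuracy: larger values mean smaller deviations from the
    true score; deviations are capped at max_dev (an assumed cap)."""
    return max_dev - min(abs(rater_score - true_score), max_dev)

# One rater's scores versus expert true scores on five essays
rater = [3, 2, 4, 1, 3]
true = [3, 3, 4, 3, 2]
print([dichotomous_accuracy(r, t) for r, t in zip(rater, true)])  # [1, 0, 1, 0, 0]
print([polytomous_accuracy(r, t) for r, t in zip(rater, true)])   # [3, 2, 3, 1, 2]
```

The resulting accuracy scores, rather than the raw ratings, are what the RAM scales, which is why pervasive trends in rater-assigned scores affect the RAM and the raw-score models differently.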


Subject(s)
Data Collection/standards, Statistical Data Interpretation, Educational Measurement, Theoretical Models, Educational Measurement/statistics & numerical data
4.
Educ Psychol Meas; 75(1): 102-125, 2015 Feb.
Article in English | MEDLINE | ID: mdl-29795814

ABSTRACT

This report summarizes an empirical study that addresses two related topics within the context of writing assessment: illusory halo, and how much unique information is provided by multiple analytic scores. Specifically, we address whether analytic scores assigned to student writing provide unique information beyond what is depicted by holistic scores, and to what degree multiple analytic scores assigned by a single rater display evidence of illusory halo. To that end, we used structural equation modeling to analyze student responses to an expository writing prompt that were scored by six groups of raters: four groups assigned single analytic scores, one group assigned multiple analytic scores, and one group assigned holistic scores. Our results suggest that there is evidence of illusory halo when raters assign multiple analytic scores to a single student response and that, at best, only two factors seem to be distinguishable in analytic writing scores assigned to expository essays.

5.
J Appl Meas; 16(3): 228-41, 2015.
Article in English | MEDLINE | ID: mdl-26753219

ABSTRACT

To date, much of the research concerning rater effects has focused on rater severity/leniency. Consequently, other potentially important rater effects have largely been ignored by those conducting operational scoring projects. This simulation study compares four rater centrality indices (rater fit, residual-expected correlations, rater slope, and rater threshold variance) in terms of their Type I and Type II error rates under varying levels of centrality magnitude, centrality pervasiveness, and rating scale construction when each of four latent trait models is fitted to the simulated data (the Rasch rating scale and partial credit models and the generalized rating scale and partial credit models). Results indicate that the residual-expected correlation may be the most appropriately sensitive to rater centrality under most conditions.


Subject(s)
Statistical Models, Reproducibility of Results, Research
6.
J Appl Meas; 15(3): 267-75, 2014.
Article in English | MEDLINE | ID: mdl-24992250

ABSTRACT

Previous research has investigated the influence of sample size, model misspecification, test length, ability distribution offset, and generating model on the likelihood ratio (LR) difference test in applications of item response models. This study extended that research to the evaluation of dimensionality using the multidimensional random coefficients multinomial logit model (MRCMLM). Logistic regression analysis of simulated data reveals that sample size and test length have a large effect on the capacity of the LR difference test to correctly identify unidimensionality, with shorter tests and smaller sample sizes leading to smaller Type I error rates. Higher levels of simulated misfit resulted in fewer incorrect decisions than data with no or little misfit. However, the Type I error rates indicate that the LR difference test is not suitable under any of the simulated conditions for evaluating dimensionality in applications of the MRCMLM.
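The LR difference test evaluated in this study compares nested models (e.g., a unidimensional versus a multidimensional MRCMLM) via twice the difference in log-likelihoods, referred to a chi-square distribution. A minimal sketch; the log-likelihood values are illustrative, not from the study:

```python
# Hedged sketch of the likelihood ratio difference test for nested models.
# The chi-square critical values are standard 0.05-level table values.
def lr_statistic(loglik_restricted, loglik_full):
    """-2 * (logL_restricted - logL_full); larger values favor the
    less restricted (e.g., multidimensional) model."""
    return -2.0 * (loglik_restricted - loglik_full)

CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}  # df -> 0.05 critical value

def lr_test(loglik_restricted, loglik_full, df):
    """Return the LR statistic and whether the restricted model is rejected."""
    stat = lr_statistic(loglik_restricted, loglik_full)
    return stat, stat > CHI2_CRIT_05[df]

# Unidimensional (restricted) vs. two-dimensional (full) model, with the
# full model estimating two additional parameters (df = 2)
stat, reject = lr_test(loglik_restricted=-5120.4, loglik_full=-5112.1, df=2)
print(round(stat, 1), reject)  # 16.6 True
```

The study's point is that even when this mechanical comparison is carried out correctly, its error rates under the MRCMLM make it unreliable for dimensionality decisions.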


Subject(s)
Likelihood Functions, Logistic Models, Statistical Models, Psychological Tests/statistics & numerical data, Psychometrics/statistics & numerical data, Computer Simulation, Humans, Mathematical Computing, Sample Size
7.
J Appl Meas; 15(2): 152-9, 2014.
Article in English | MEDLINE | ID: mdl-24950533

ABSTRACT

A large body of literature exists describing how rater effects may be detected in rating data. In this study, we compared the flag and agreement rates for several rater effects based on calibrations of real data under two psychometric models: the Rasch rating scale model (RSM) and the Rasch testlet-based rater bundle model (RBM). The results show that the RBM provided more accurate diagnoses of rater severity and leniency than did the RSM, which relies on the local independence assumption. However, the statistical indicators associated with rater centrality and inaccuracy remained consistent between the two models.


Subject(s)
Statistical Data Interpretation, Educational Measurement/statistics & numerical data, Statistical Models, Psychometrics/statistics & numerical data, Bias, Data Collection/statistics & numerical data, Humans, Inservice Training, Observer Variation, Reproducibility of Results
8.
J Appl Meas; 14(4): 332-8, 2013.
Article in English | MEDLINE | ID: mdl-24064575

ABSTRACT

Cognitive radios (CRs) are recent technological developments that rely on artificial intelligence to adapt a radio's performance to suit environmental demands, such as sharing radio frequencies with other radios. Measuring the performance of the cognitive engines (CEs) that underlie a CR's performance is a challenge for those developing CR technology. This simulation study illustrates how the Rasch model can be applied to the evaluation of CRs. We simulated the responses of 50 CEs to 35 performance tasks and applied the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM) to those data. Our results indicate that CEs based on different algorithms may exhibit differential performance across manipulated performance task parameters. We also found that a multidimensional mixture model may provide the best fit to the simulated data, and that the two simulated algorithms may respond differently to tasks that emphasize high data throughput with a lower emphasis on power conservation than they do to other combinations of performance task characteristics.


Subject(s)
Artificial Intelligence, Cognition, Statistical Data Interpretation, Equipment Failure Analysis/methods, Equipment Failure Analysis/statistics & numerical data, Statistical Models, Telecommunications/instrumentation, Algorithms
9.
J Appl Meas; 14(1): 1-9, 2013.
Article in English | MEDLINE | ID: mdl-23442324

ABSTRACT

Historically, rule-of-thumb critical values have been employed for interpreting fit statistics that depict anomalous person and item response patterns in applications of the Rasch model. Unfortunately, prior research has shown that these values are not appropriate in many contexts. This article introduces a bootstrap procedure for identifying reasonable critical values for Rasch fit statistics and compares the results of that procedure to applications of rule-of-thumb critical values for three example datasets. The results indicate that rule-of-thumb values may over- or under-identify the number of misfitting items or persons.
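The bootstrap idea this abstract describes can be illustrated with a dichotomous Rasch model and the unweighted mean square (outfit) statistic: simulate model-fitting data from the estimated parameters many times, compute the fit statistic each time, and use an empirical quantile as the critical value. This is an assumption-laden simplification (fixed person ability, known item difficulties), not the article's actual procedure:

```python
# Hedged sketch of bootstrapped critical values for a Rasch fit statistic.
# Names and parameter values are illustrative.
import math
import random

def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit(responses, theta, b_vec):
    """Unweighted mean square: mean standardized squared residual."""
    z2 = []
    for x, b in zip(responses, b_vec):
        p = rasch_p(theta, b)
        z2.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z2) / len(z2)

def bootstrap_critical(theta, b_vec, n_boot=2000, q=0.95, seed=1):
    """Empirical q-quantile of outfit for data that fit the model exactly."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sim = [1 if rng.random() < rasch_p(theta, b) else 0 for b in b_vec]
        stats.append(outfit(sim, theta, b_vec))
    stats.sort()
    return stats[int(q * n_boot) - 1]

# 40 illustrative item difficulties (logits); the resulting critical value can
# be compared with rule-of-thumb cutoffs such as 1.3
b_vec = [-1.5, -0.5, 0.0, 0.5, 1.5] * 8
print(round(bootstrap_critical(0.0, b_vec), 2))
```

Because the simulated distribution depends on the test length and difficulty spread, the resulting cutoff is tailored to the dataset at hand, which is the article's argument against fixed rule-of-thumb values.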


Subject(s)
Algorithms, Statistical Data Interpretation, Epidemiologic Effect Modifiers, Matched-Pair Analysis, Statistical Models, Psychometrics/methods, Statistics as Topic, Computer Simulation, Humans
10.
J Acad Nutr Diet; 112(7): 1042-7, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22542265

ABSTRACT

Manager attitude is influential in female employees' perceptions of workplace breastfeeding support. Currently, no instrument is available to assess manager attitude toward supporting women who wish to combine breastfeeding with work. We developed and piloted the "Managers' Attitude Toward Breastfeeding Support Questionnaire," an instrument that measures four constructs using 60 items rated agree/disagree on a 4-point Likert scale. We established the content validity of the Managers' Attitude Toward Breastfeeding Support Questionnaire measures through expert content review (n=22), expert assessment of item fit (n=11), and cognitive interviews (n=8). Data were collected from a purposive sample of 185 front-line managers who had experience supervising female employees, and responses were scaled using the Multidimensional Random Coefficients Multinomial Logit Model. Dimensionality analyses supported the proposed four-construct model. Reliability ranged from 0.75 to 0.86, and correlations between the constructs were moderately strong (0.47 to 0.71). Four items in two constructs exhibited model-to-data misfit and/or a low score-measure correlation. One item was revised, and the other three items were retained in the questionnaire. Findings of this study suggest that the Managers' Attitude Toward Breastfeeding Support Questionnaire measures are reliable and valid indicators of manager attitude toward workplace breastfeeding support, and future research should be conducted to establish external validity. The questionnaire could be used to collect data in a standardized manner within and across companies to measure and compare manager attitudes toward supporting breastfeeding. Organizations can subsequently develop targeted strategies to improve support for breastfeeding employees through efforts influencing managerial attitude.


Subject(s)
Administrative Personnel/psychology, Breast Feeding/psychology, Mothers/psychology, Social Support, Surveys and Questionnaires/standards, Administrative Personnel/statistics & numerical data, Adult, Attitude, Data Collection/methods, Data Collection/standards, Female, Humans, Interprofessional Relations, Male, Mothers/statistics & numerical data, Organizational Culture, Organizational Policy, Perception, Pilot Projects, Working Women/psychology, Working Women/statistics & numerical data, Workplace/psychology, Workplace/statistics & numerical data
11.
J Appl Meas; 12(3): 212-21, 2011.
Article in English | MEDLINE | ID: mdl-22357124

ABSTRACT

This paper compares the results of applications of the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM) to comparable Structural Equation Model (SEM) applications for the purpose of conducting a Confirmatory Factor Analysis (CFA). We review SEM as it is applied to CFA, identify some parallels between the MRCMLM approach to CFA and that utilized in a standard SEM CFA, and illustrate the comparability of MRCMLM and SEM CFA results for three datasets. Results indicate that the two approaches tend to identify similar dimensional models as exhibiting best fit and provide comparable depictions of latent variable correlations, but the two procedures depict the reliability of measures differently.


Subject(s)
Factor Analysis, Statistical Models, Psychometrics, Humans, Reproducibility of Results, Research Design
12.
J Appl Meas; 12(4): 358-69, 2011.
Article in English | MEDLINE | ID: mdl-22357157

ABSTRACT

This article summarizes a simulation study of the performance of five item quality indicators (the weighted and unweighted versions of the mean square and standardized mean square fit indices, and the point-measure correlation) under relatively high and low amounts of missing data, with both random and conditional patterns of missingness, in testing contexts such as operational administrations of a computerized adaptive certification or licensure examination. The results suggest that the weighted fit indices, particularly the standardized mean square index, and the point-measure correlation provide the most consistent information between random and conditional missing data patterns, and that these indices perform more comparably for items near the passing score than for items with extreme difficulty values.


Asunto(s)
Sesgo , Simulación por Computador , Interpretación Estadística de Datos , Licencia en Enfermería/estadística & datos numéricos , Cómputos Matemáticos , Programas Informáticos , Gráficos por Computador , Humanos , Modelos Lineales , Reproducibilidad de los Resultados , Tamaño de la Muestra
13.
J Appl Meas; 11(2): 142-57, 2010.
Article in English | MEDLINE | ID: mdl-20693699

ABSTRACT

This article reports the results of an application of the Multidimensional Random Coefficients Multinomial Logit Model (MRCMLM) to the measurement of professional development activities in which community college administrators participate. The analyses focus on confirmation of the factorial structure of the instrument, evaluation of the quality of the activity calibrations, examination of the internal structure of the instrument, and comparison of groups of administrators. The dimensionality analysis results suggest a five-dimensional model that is consistent with previous literature concerning career paths of community college administrators: education and specialized training, internal professional development and mentoring, external professional development, employer support, and seniority. The indicators of the quality of the activity calibrations suggest that measures of the five dimensions are adequately reliable, that the activities in each dimension are internally consistent, and that the observed responses to each activity are consistent with the expected values of the MRCMLM. The hierarchy of administrator measure means and of activity calibrations is consistent with substantive theory relating to professional development for community college administrators. For example, readily available activities that occur at the institution were the most likely to be engaged in by administrators, while participation in selective specialized training institutes was the least likely activity. Finally, group differences with respect to age and title were consistent with substantive expectations: the greater the administrator's age and the higher the rank of the administrator's title, the greater the probability of having engaged in various types of professional development.


Asunto(s)
Personal Administrativo , Instituciones Académicas/organización & administración , Personal Administrativo/educación , Personal Administrativo/estadística & datos numéricos , Humanos , Liderazgo , Mentores , Modelos Estadísticos , Rol Profesional , Instituciones Académicas/estadística & datos numéricos , Sociedades , Estados Unidos
14.
J Appl Meas; 11(2): 182-95, 2010.
Article in English | MEDLINE | ID: mdl-20693702

ABSTRACT

The development of alternate forms of tests requires a statistical score adjustment called equating that permits the interchanging of scores from different test forms. Equating makes possible several important measurement applications, including removing practice effects in pretest-posttest research designs, improving test security, comparing scores between new and old forms, and supporting item bank development for computerized adaptive testing. This article summarizes equating methods from a Rasch measurement perspective. The four sections of this article present an introduction and definition of equating and related linking methods, data collection designs, equating procedures, and evaluating equated measures. The methods are illustrated with worked examples.
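One widely used Rasch equating approach, common-item (anchor) linking with a mean shift, can be sketched as follows. Because Rasch measures are on an interval scale up to a translation, a single linking constant, the mean difference of the anchor item difficulties across forms, places new-form measures on the old form's scale. The values below are illustrative, and this is one of several methods the article covers:

```python
# Hedged sketch of common-item Rasch equating via a mean-shift linking
# constant. Values are illustrative.
def mean_shift_constant(anchors_old, anchors_new):
    """Linking constant that places new-form estimates on the old scale."""
    diffs = [o - n for o, n in zip(anchors_old, anchors_new)]
    return sum(diffs) / len(diffs)

def equate(measures_new, constant):
    """Shift new-form measures onto the old form's scale."""
    return [m + constant for m in measures_new]

# Anchor item difficulties (logits) estimated separately on each form
anchors_old = [-1.2, -0.3, 0.4, 1.1]
anchors_new = [-1.5, -0.6, 0.1, 0.8]
c = mean_shift_constant(anchors_old, anchors_new)
print(round(c, 2))                                   # 0.3
print([round(v, 2) for v in equate([0.0, 1.0], c)])  # [0.3, 1.3]
```

In practice the anchor items' displacement and fit would be checked before accepting the link, which corresponds to the article's section on evaluating equated measures.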


Subject(s)
Educational Measurement/statistics & numerical data, Statistical Models, Biostatistics, Data Collection/methods, Humans
15.
J Appl Meas; 10(3): 335-47, 2009.
Article in English | MEDLINE | ID: mdl-19671993

ABSTRACT

This article describes how the multi-faceted Rasch model (MFRM) can be applied to item and rater analysis and the types of information made available by a multifaceted analysis of constructed-response items. In particular, the article describes the evidence such analyses provide that is relevant to improving item and rubric development as well as rater training and monitoring. The article provides an introduction to MFRM extensions of the family of Rasch models, a description of item analysis procedures, and a description of rater analysis procedures, and concludes with an example analysis conducted using a commercially available program that implements the MFRM, Facets.
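The model underlying such analyses is commonly written as log(P_k / P_{k-1}) = theta - delta - lambda_j - tau_k, with examinee ability theta, item difficulty delta, rater severity lambda_j, and category thresholds tau_k. A minimal sketch of the implied category probabilities; the parameter names and values are illustrative, not from the article:

```python
# Hedged sketch of MFRM category probabilities with a rater facet, using the
# rating-scale form of the model. Names and values are illustrative.
import math

def mfrm_probs(theta, delta, severity, taus):
    """Probabilities of categories 0..len(taus) for one examinee-item-rater
    combination. Cumulative sums of (theta - delta - severity - tau_k) give
    the unnormalized log-probability of each category."""
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - severity - tau))
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# For the same examinee and item, a severe rater (severity = +1) shifts
# probability toward lower categories relative to a lenient rater (-1).
taus = [-1.0, 0.0, 1.0]
lenient = mfrm_probs(theta=0.5, delta=0.0, severity=-1.0, taus=taus)
severe = mfrm_probs(theta=0.5, delta=0.0, severity=1.0, taus=taus)
print([round(p, 2) for p in lenient])
print([round(p, 2) for p in severe])
```

Separating the severity term from ability and difficulty is what lets a MFRM analysis report rater effects on the same logit scale as items and examinees.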


Subject(s)
Educational Measurement/statistics & numerical data, Statistical Models, Psychometrics, Bias, Educational Measurement/methods, Humans
16.
J Appl Meas; 10(2): 196-207, 2009.
Article in English | MEDLINE | ID: mdl-19564699

ABSTRACT

This article summarizes multidimensional Rasch analyses in which several alternative models of genetics reasoning are evaluated based on item response data from secondary students who participated in a genetics reasoning curriculum. The various depictions of genetics reasoning are compared by fitting several models to the item response data and comparing data-to-model fit at the model level between hierarchically nested models. We conclude that two two-dimensional models provide a substantively better depiction of student performance than does a unidimensional model or more complex three- and four-dimensional models.


Subject(s)
Cognition, Genetics/education, Statistical Models, Adolescent, Educational Measurement/statistics & numerical data, Humans
17.
Res Q Exerc Sport; 79(3): 300-11, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18816941

ABSTRACT

This study extended validity evidence for measures of coaching efficacy derived from the Coaching Efficacy Scale (CES) by testing the rating scale categorizations suggested in previous research. Previous research provided evidence for the effectiveness of a four-category (4-CAT) structure for high school and collegiate sports coaches; it also suggested that a five-category (5-CAT) structure may be effective for youth sports coaches, because they may be more likely to endorse categories on the lower end of the scale. Coaches of youth sports (N = 492) responded to the CES items with a 5-CAT structure. Across rating scale category effectiveness guidelines, 32 of 34 pieces of evidence (94%) supported this structure. Data were condensed to a 4-CAT structure by collapsing responses in Category 1 (CAT-1) and Category 2 (CAT-2). Across rating scale category effectiveness guidelines, 25 of 26 pieces of evidence (96%) supported this structure. Findings provided confirmatory, cross-validation evidence for both the 5-CAT and 4-CAT structures. For empirical, theoretical, and practical reasons, the authors concluded that the 4-CAT structure was preferable to the 5-CAT structure when CES items are used to measure coaching efficacy. This conclusion is based on the findings of this confirmatory study and the more exploratory findings of Myers, Wolfe, and Feltz (2005).


Subject(s)
Exercise, Physical Education and Training/methods, Psychometrics/methods, Sports/education, Adult, Child, Humans, Male, Mentors, Statistical Models, Professional Competence, Reproducibility of Results, Self Efficacy, Sports/physiology, Sports/psychology
18.
Breastfeed Med; 3(3): 159-63, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18778210

ABSTRACT

OBJECTIVE: Breastfeeding rates among working mothers are lower than among mothers who are not employed. An ecological framework suggests that health behaviors, such as breastfeeding, are influenced by intrapersonal and environmental factors. There is no existing instrument to measure women's perception of the workplace environment in providing breastfeeding support. The objective of this study was to pilot an instrument measuring perceptions of the work climate for breastfeeding support among working women. STUDY DESIGN: Data were collected from self-administered mailed questionnaires filled out by 104 pregnant women or women who had recently given birth and were employed and breastfeeding. RESULTS: Dimensionality analyses supported the two-dimensional model suggested by the literature. Internal consistency reliability coefficients were high (near 0.90), and the correlation between the subscales was moderately strong (0.68). CONCLUSIONS: Only a single item exhibited misfit to the scaling model, and that item was revised after review.


Subject(s)
Breast Feeding/psychology, Data Collection/standards, Mothers/psychology, Social Support, Surveys and Questionnaires/standards, Working Women/psychology, Adult, Breast Feeding/statistics & numerical data, Data Collection/instrumentation, Data Collection/methods, Female, Humans, Interviews as Topic, Mothers/statistics & numerical data, Organizational Culture, Organizational Policy, Perception, Working Women/statistics & numerical data, Workplace
19.
J Appl Meas; 8(2): 204-34, 2007.
Article in English | MEDLINE | ID: mdl-17440262

ABSTRACT

Accumulation of validity evidence is an important part of the instrument development process. In Part I of a two-part series, we provided an overview of validity concepts and described how instrument development efforts can be conducted to facilitate the development of validity arguments. In this, Part II of the series, we identify how analyses, especially those conducted within a Rasch measurement framework, can be used to provide evidence to support validity arguments that are founded during the instrument development process.


Subject(s)
Statistical Data Interpretation, Statistical Models, Humans
20.
J Appl Meas; 8(1): 97-123, 2007.
Article in English | MEDLINE | ID: mdl-17215568

ABSTRACT

Instrument development is an arduous task that, if undertaken with care and consideration, can lay the foundation for the development of validity arguments relating to the inferences and decisions that are based on test measures. This article, Part I of a two-part series, provides an overview of validity concepts and describes how instrument development efforts can be conducted to facilitate the development of validity arguments. Our discussion focuses on documentation of the purpose of measurement, creation of test specifications, item development, expert review, and planning of pilot studies. Through these instrument development activities and tools, essential information is documented that will feed into the analysis, summary, and reporting of data relevant to validity arguments discussed in Part II of this series.


Subject(s)
Educational Measurement, Psychological Models, Psychology/methods, Culture, Humans