Results 1 - 4 of 4
1.
Med Teach ; 41(7): 819-823, 2019 07.
Article in English | MEDLINE | ID: mdl-30955444

ABSTRACT

Introduction: Examiner-based variance can affect test-taker outcomes. The aim of this study was to investigate the examiner-based effect of DRIFT, or differential rater functioning over time. Methods: Average station-level scores from five administrations of the same version of a high-stakes 12-station OSCE were analyzed for the presence of DRIFT. Results: Test-takers who were scored earlier appeared to receive a score advantage, while those who were scored later appeared to receive neither a score advantage nor a disadvantage due to the DRIFT behavior. A specific form of DRIFT, primacy (the assignment of progressively harsher scores), was present in one of the 228 examiner scoring opportunities investigated in this study. In other words, less than 1% of the examiner scoring that took place displayed significant levels of DRIFT scoring behavior. Discussion and Conclusions: The noted score advantage influenced the test outcomes of only one examinee, who performed close to the cut-score on all other stations. Prior publications report broader effects of DRIFT, but the current assessment context, particularly access to examiner training, may have had a modulating effect in the present study.
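
The pattern described above can be screened for with something much simpler than a full measurement model. The sketch below is not the authors' analysis: it simulates a scoring log and flags examiners whose awarded scores trend downward over their scoring sequence using a per-examiner linear regression. The data, examiner labels, drift size, and significance threshold are all illustrative assumptions.

```python
# Minimal sketch: screen examiners for DRIFT (differential rater functioning
# over time) by regressing each examiner's awarded scores on scoring order.
# Simulated data; not the study's model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated scoring log: each examiner scores 20 candidates in sequence.
examiners = [f"ex{i:02d}" for i in range(12)]
records = []
for ex in examiners:
    drift = -0.05 if ex == "ex03" else 0.0   # one examiner grows harsher
    for position in range(1, 21):
        score = 3.2 + drift * position + rng.normal(0, 0.3)
        records.append((ex, position, score))

# Flag examiners whose awarded scores trend downward over the session
# (primacy: progressively harsher scoring as more candidates are seen).
alpha = 0.05
for ex in examiners:
    pos = np.array([p for e, p, s in records if e == ex])
    scr = np.array([s for e, p, s in records if e == ex])
    fit = stats.linregress(pos, scr)
    if fit.pvalue < alpha and fit.slope < 0:
        print(f"{ex}: possible primacy drift, slope={fit.slope:.3f}, p={fit.pvalue:.3f}")
```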


Subject(s)
Education, Medical/organization & administration; Educational Measurement/standards; Observer Variation; Clinical Competence; Education, Medical/standards; Humans; Reproducibility of Results; Time Factors
2.
J Eval Clin Pract ; 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39073068

ABSTRACT

RATIONALE: Objective Structured Clinical Examinations (OSCEs) are widely used for assessing clinical competence, especially in high-stakes environments such as medical licensure. However, the reuse of OSCE cases across multiple administrations raises concerns about parameter stability; a shift in case parameters over repeated use is known as item parameter drift (IPD). AIMS & OBJECTIVES: This study aims to investigate IPD in reused OSCE cases while accounting for examiner scoring effects using a Many-facet Rasch Measurement (MFRM) model. METHOD: Data from 12 OSCE cases, reused over seven administrations of the Internationally Educated Nurse Competency Assessment Program (IENCAP), were analyzed using the MFRM model. Each case was treated as an item, and examiner scoring effects were accounted for in the analysis. RESULTS: The results indicated that despite accounting for examiner effects, all cases exhibited some level of IPD, with an average absolute IPD of 0.21 logits. Three cases showed positive directional trends. IPD significantly affected score decisions in 1.19% of estimates, at an invariance violation of 0.58 logits. CONCLUSION: These findings suggest that while OSCE cases demonstrate sufficient stability for reuse, continuous monitoring is essential to ensure the accuracy of score interpretations and decisions. The study provides an objective threshold for detecting concerning levels of IPD and underscores the importance of addressing examiner scoring effects in OSCE assessments. The MFRM model offers a robust framework for tracking and mitigating IPD, contributing to the validity and reliability of OSCEs in evaluating clinical competence.
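
As a rough illustration of the monitoring idea, the sketch below tracks each reused case's difficulty across administrations and flags deviations beyond a fixed logit threshold. It does not reproduce the MFRM analysis: in practice the per-administration difficulties would come from a calibration that also adjusts for examiner effects, whereas here they are simply simulated, and the 0.58-logit flag only echoes the threshold reported in the abstract.

```python
# Rough sketch: flag reused OSCE cases whose difficulty estimates drift
# across administrations. Difficulties are simulated here; the study used
# a Many-facet Rasch Measurement model to obtain them.
import numpy as np

rng = np.random.default_rng(1)
n_cases, n_admins = 12, 7

# Simulated case difficulties (logits): cases x administrations, mostly
# stable, with one case drifting noticeably easier over time.
true_diff = rng.normal(0.0, 0.5, size=n_cases)
difficulty = true_diff[:, None] + rng.normal(0, 0.08, size=(n_cases, n_admins))
difficulty[2] -= np.linspace(0.0, 1.5, n_admins)

# Drift = deviation of each administration's estimate from the case mean.
drift = difficulty - difficulty.mean(axis=1, keepdims=True)

threshold = 0.58  # logits; illustrative flag level
for case in range(n_cases):
    worst = np.max(np.abs(drift[case]))
    if worst > threshold:
        print(f"case {case + 1}: max drift {worst:.2f} logits exceeds {threshold}")
```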

3.
Med Sci Educ ; 32(6): 1439-1445, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36532388

ABSTRACT

High-stakes assessments must discriminate between examinees who are sufficiently competent to practice in the health professions and examinees who are not. In these settings, criterion-referenced standard-setting methods are strongly preferred over norm-referenced methods. While there are many criterion-referenced options, few are feasible or cost-effective for objective structured clinical examinations (OSCEs). The human and financial resources required to organize OSCEs alone are often significant, leaving little in an institution's budget for additional resource-intensive standard-setting methods. The modified borderline group method (BGM) introduced by Dauphinee et al. for a large-scale, multi-site OSCE is a very feasible option but is not as defensible for smaller-scale OSCEs. This study compared the modified borderline group method to two adaptations that address its limitations for smaller-scale OSCEs while retaining its benefits, namely feasibility. We evaluated decision accuracy and consistency of cut scores derived from (1) modified, (2) regression-based, and (3) 4-facet Rasch model borderline group methods. Data were from a 12-station OSCE that assessed 112 nurses for entry to practice in a Canadian context. The three cut scores (64-65%) all met acceptable standards of accuracy and consistency; however, the modified borderline group method was the most influenced by lower scores within the borderline group, leading to the lowest cut score. The two adaptations may be more defensible than the modified BGM in the context of a smaller (n < 100-150) OSCE.
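
The two simpler methods compared in this study lend themselves to a brief illustration. The sketch below, on simulated single-station data, computes a borderline-group cut score (the mean checklist score of candidates given a "borderline" global rating) and a regression-based cut score (the checklist score predicted at the borderline rating point). The 1-5 global-rating scale and all data are assumptions, and the 4-facet Rasch variant is omitted.

```python
# Minimal sketch: borderline-group vs. regression-based cut scores for one
# OSCE station, on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 112

# Simulated station data: examiner global rating (1=fail ... 3=borderline
# ... 5=excellent) and a percentage checklist score loosely tied to it.
global_rating = rng.integers(1, 6, size=n)
checklist = np.clip(45 + 8 * global_rating + rng.normal(0, 6, size=n), 0, 100)

# (1) Borderline-group method: mean checklist score of "borderline" candidates.
borderline = checklist[global_rating == 3]
cut_bgm = borderline.mean()

# (2) Regression-based method: regress checklist score on global rating and
# read off the predicted score at the borderline rating point.
slope, intercept = np.polyfit(global_rating, checklist, 1)
cut_reg = intercept + slope * 3

print(f"borderline-group cut score: {cut_bgm:.1f}%")
print(f"regression-based cut score: {cut_reg:.1f}%")
```

The regression-based variant uses every candidate's data rather than only the (often small) borderline group, which is why it is less sensitive to a handful of low scorers within that group.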

4.
Perspect Med Educ ; 7(2): 110-119, 2018 04.
Article in English | MEDLINE | ID: mdl-29488098

ABSTRACT

INTRODUCTION: Tablet-based assessments offer benefits over scannable-paper assessments; however, little is known about their impact on the variability of assessment scores. METHODS: Two studies were conducted to evaluate changes in rating technology. Rating modality (paper vs. tablets) was manipulated between candidates (Study 1) and within candidates (Study 2). Average scores were analyzed using repeated measures ANOVA, Cronbach's alpha, and generalizability theory. Post-hoc analyses included a Rasch analysis and McDonald's omega. RESULTS: Study 1 revealed a main effect of modality (F(1,152) = 25.06, p < 0.01). Average tablet-based scores were higher (3.39/5, 95% CI = 3.28 to 3.51) than average scan-sheet scores (3.00/5, 95% CI = 2.90 to 3.11). Study 2 also revealed a main effect of modality (F(1,88) = 15.64, p < 0.01); however, the difference was reduced to 2%, with higher scan-sheet scores (3.36, 95% CI = 3.30 to 3.42) compared with tablet scores (3.27, 95% CI = 3.21 to 3.33). Internal consistency (alpha and omega) remained high (>0.8) and inter-station reliability remained constant (0.3). Rasch analyses showed no relationship between station difficulty and rating modality. DISCUSSION: Analyses of average scores may be misleading without an understanding of the internal consistency and overall reliability of scores. Although updating to tablet-based forms did not result in systematic variations in scores, routine analyses ensured accurate interpretation of the variability of assessment scores. CONCLUSION: This study demonstrates the importance of ongoing program evaluation and data analysis.
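
The routine checks described here can be sketched on simulated data: Cronbach's alpha computed from candidate-by-station scores under each modality, plus a within-candidate comparison of mean scores. A paired t-test stands in for the repeated-measures ANOVA (equivalent for two modality levels), and the generalizability-theory, omega, and Rasch analyses are not reproduced. All numbers below are illustrative, not the study's.

```python
# Small sketch of routine score monitoring: internal consistency per rating
# modality and a within-candidate modality comparison, on simulated data.
import numpy as np
from scipy import stats

def cronbach_alpha(scores):
    """scores: candidates x stations matrix of station-level scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(3)
n_candidates, n_stations = 90, 12

# Each candidate rated on every station under both modalities (as in Study 2).
ability = rng.normal(0, 0.4, size=(n_candidates, 1))
paper = np.clip(3.3 + ability + rng.normal(0, 0.5, (n_candidates, n_stations)), 1, 5)
tablet = np.clip(3.2 + ability + rng.normal(0, 0.5, (n_candidates, n_stations)), 1, 5)

print(f"alpha (paper):  {cronbach_alpha(paper):.2f}")
print(f"alpha (tablet): {cronbach_alpha(tablet):.2f}")

# Within-candidate modality comparison of mean scores.
t, p_val = stats.ttest_rel(paper.mean(axis=1), tablet.mean(axis=1))
print(f"paired t-test: t={t:.2f}, p={p_val:.3f}")
```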


Subject(s)
Computers, Handheld/standards; Test Taking Skills/standards; Analysis of Variance; Educational Measurement/methods; Equipment Design/standards; Humans; Psychometrics/instrumentation; Psychometrics/methods; Qualitative Research; Reproducibility of Results; Test Taking Skills/methods