Results 1 - 20 of 29
1.
Med Teach ; : 1-9, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38976711

ABSTRACT

INTRODUCTION: Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and the resulting impact on students' pass/fail categorisation. METHODS: The same 6-station formative OSCE ran asynchronously in 4 medical schools, with 2 parallel circuits per school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling (1) controlled score comparisons by (a) examiner cohorts and (b) schools, and (2) data linkage to adjust for the influence of examiner cohorts. We calculated the score impact and the change in pass/fail categorisation by school. RESULTS: On controlled video-based comparisons, inter-school variation in examiners' scoring (16.3%) was nearly double the within-school variation (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school: adjustment reduced the failure rate from 39.13% to 8.70% in school 2 whilst increasing it from 0.00% to 21.74% in school 4. DISCUSSION: Whilst the formative context may partly account for these differences, the findings raise the question of whether examiners' judgements vary between medical schools. This may benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for these comparisons.
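A minimal sketch of the score-adjustment step this abstract describes, assuming a simple cohort-mean offset in place of the study's full VESCA model; the cohort labels, scores and the 50% pass mark are all invented for illustration.

```python
# Sketch: estimate each examiner cohort's leniency from shared comparator
# videos, then subtract that offset from the live scores it awarded.
# All names and numbers are hypothetical.
import pandas as pd

video_scores = pd.DataFrame({   # every cohort scores the same videos
    "cohort": ["school1", "school1", "school2", "school2", "school4", "school4"],
    "score":  [62.0, 58.0, 43.0, 41.0, 71.0, 69.0],
})
live_scores = pd.DataFrame({    # each student is seen by one cohort only
    "student": ["a", "b", "c"],
    "cohort":  ["school1", "school2", "school4"],
    "score":   [55.0, 38.0, 52.0],
})

# Positive offset = lenient cohort, so subtract it from its live scores.
offset = video_scores.groupby("cohort")["score"].mean() - video_scores["score"].mean()
live_scores["adjusted"] = live_scores["score"] - live_scores["cohort"].map(offset)

PASS_MARK = 50.0                # hypothetical cut score
for col in ("score", "adjusted"):
    print(col, "fail rate:", f"{(live_scores[col] < PASS_MARK).mean():.0%}")
```

With these invented numbers, adjustment rescues the student scored by the harsh cohort and fails the one scored by the lenient cohort, mirroring the direction of the school 2 / school 4 shifts reported above.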

2.
Med Teach ; : 1-9, 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38635469

ABSTRACT

INTRODUCTION: Whilst rarely researched, the authenticity with which Objective Structured Clinical Exams (OSCEs) simulate practice is arguably critical to making valid judgements about candidates' preparedness to progress in their training. We studied how and why an OSCE gave rise to different experiences of authenticity for different participants under different circumstances. METHODS: We used realist evaluation, collecting data through interviews/focus groups with participants across four UK medical schools who took part in an OSCE which aimed to enhance authenticity. RESULTS: Several features of OSCE stations (realistic, complex, complete cases; sufficient time; autonomy; props; guidelines; limited examiner interaction, etc.) combined to enable students to project into their future roles, judge and integrate information, consider their actions and act naturally. When this occurred, their performances felt like an authentic representation of their clinical practice. This did not always occur: focusing on unavoidable differences from practice, incongruous features, anxiety and preoccupation with examiners' expectations sometimes disrupted immersion, producing inauthenticity. CONCLUSIONS: The perception of authenticity in OSCEs appears to originate from an interaction of station design with individual preferences and contextual expectations. Whilst we tentatively suggest ways to promote authenticity, more understanding is needed of candidates' interaction with simulation and scenario immersion in summative assessment.

3.
BMC Med Educ ; 23(1): 803, 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37885005

ABSTRACT

PURPOSE: Ensuring equivalence of examiners' judgements within distributed objective structured clinical exams (OSCEs) is key to both fairness and validity but is hampered by the lack of cross-over in the performances which different groups of examiners observe. This study develops a novel method called Video-based Examiner Score Comparison and Adjustment (VESCA), using it to compare examiners' scoring from different OSCE sites for the first time. MATERIALS/METHODS: Within a summative 16-station OSCE, volunteer students were videoed on each station and all examiners were invited to score station-specific comparator videos in addition to usual student scoring. The linkage provided through the video scores enabled use of Many Facet Rasch Modelling (MFRM) to compare (1) examiner-cohort and (2) site effects on students' scores. RESULTS: Examiner cohorts varied by 6.9% in the overall score allocated to students of the same ability. Whilst only a tiny difference was apparent between sites, examiner-cohort variability was greater in one site than the other. Adjusting student scores produced a median change in rank position of 6 places (0.48 deciles); however, 26.9% of students changed their rank position by at least 1 decile. By contrast, only 1 student's pass/fail classification was altered by score adjustment. CONCLUSIONS: Whilst comparatively limited examiner participation rates may limit interpretation of score adjustment in this instance, this study demonstrates the feasibility of using VESCA for quality assurance purposes in large-scale distributed OSCEs.
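The rank-impact measures reported here (median change in places, share of students moving at least a decile) are simple to compute once raw and adjusted totals exist. A sketch with simulated scores, where adjust() merely stands in for the study's MFRM-based adjustment:

```python
# Sketch: quantify how much an adjustment changes rank order, in places and
# deciles. Scores are simulated; the "adjustment" is random noise standing in
# for the study's MFRM-based VESCA adjustment.
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(60, 8, size=200)                # invented raw totals
adjusted = raw + rng.normal(0, 2, size=200)      # stand-in adjustment

def ranks(x):
    return x.argsort().argsort()                 # 0 = lowest score

rank_change = np.abs(ranks(adjusted) - ranks(raw))
decile_change = rank_change / len(raw) * 10

print("median rank change:", np.median(rank_change), "places")
print("moved >= 1 decile:", f"{(decile_change >= 1).mean():.1%}")
```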


Subjects
Educational Measurement; Students, Medical; Humans; Educational Measurement/methods; Clinical Competence
4.
Med Educ ; 56(3): 292-302, 2022 Mar.
Article in English | MEDLINE | ID: mdl-34893998

ABSTRACT

INTRODUCTION: Differential rater function over time (DRIFT) and contrast effects (examiners' scores biased away from the standard of preceding performances) both challenge the fairness of scoring in objective structured clinical exams (OSCEs). This is important because, under some circumstances, these effects could alter whether some candidates pass or fail assessments. Benefitting from experimental control, this study investigated the causality, operation and interaction of both effects simultaneously for the first time in an OSCE setting. METHODS: We used secondary analysis of data from an OSCE in which examiners scored embedded videos of student performances interspersed between live students. Embedded video position varied between examiners (early vs. late) whilst the standard of preceding performances naturally varied (previous high or low). We examined linear relationships suggestive of DRIFT and contrast effects in all within-OSCE data before comparing the influence and interaction of 'early' versus 'late' and 'previous high' versus 'previous low' conditions on embedded video scores. RESULTS: Linear relationships in the data did not support the presence of DRIFT or contrast effects. Embedded videos were scored higher early (19.9 [19.4-20.5]) versus late (18.6 [18.1-19.1], p < 0.001), but scores did not differ between previous high and previous low conditions. The interaction term was non-significant. CONCLUSIONS: In this instance, the small DRIFT effect we observed on embedded videos can be causally attributed to examiner behaviour. Contrast effects appear less ubiquitous than some prior research suggests. Possible mediators of these findings include the OSCE context, the detail of task specification, examiners' cognitive load and the distribution of learners' ability. As the operation of these effects appears to vary across contexts, further research is needed to determine the prevalence and mechanisms of contrast and DRIFT effects, so that assessments may be designed in ways that avoid their occurrence. Quality assurance should monitor for these contextually variable effects in order to ensure OSCE equivalence.
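The early/late by previous-high/previous-low comparison is a 2x2 factorial design. A hedged sketch, with scores invented around the reported means (19.9 early vs 18.6 late) and an ordinary least squares model standing in for whatever analysis the study actually used:

```python
# Sketch: fit a 2x2 factorial model of embedded-video scores by position
# (early/late) and preceding standard (high/low), including the interaction.
# A DRIFT effect across the whole OSCE would instead appear as a nonzero
# slope when scores are regressed on sequence position.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "score":    [19.8, 20.1, 18.4, 18.9, 19.6, 20.2, 18.2, 18.8],
    "position": ["early", "early", "late", "late"] * 2,
    "previous": ["high", "low"] * 4,
})

model = smf.ols("score ~ position * previous", data=df).fit()
print(model.summary().tables[1])   # main effects and interaction term
```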


Subjects
Clinical Competence; Educational Measurement; Humans
5.
Med Teach ; 44(6): 664-671, 2022 06.
Article in English | MEDLINE | ID: mdl-35000530

ABSTRACT

INTRODUCTION: Providing high-quality feedback from Objective Structured Clinical Exams (OSCEs) is important but challenging. Whilst prior research suggests that video-based feedback (VbF), where students review their own performances alongside usual examiner feedback, may usefully enhance verbal or written feedback, little is known about how students experience or interact with VbF or what mechanisms may underlie any such benefits. METHODS: We used social constructivist grounded theory to explore students' interaction with VbF. Within semi-structured interviews, students reviewed their verbal feedback from examiners before watching a video of the same performance, reflecting with the interviewer before and after the video. Transcribed interviews were analysed using grounded theory analysis methods. RESULTS: Videos greatly enhanced students' memories of their performance, which increased their receptivity to, and the credibility of, examiners' feedback. Reflecting on video performances produced novel insights for students beyond the points described by examiners. Students triangulated these novel insights with their own self-assessment and experiences from practice to reflect deeply on their performance, which led to the generation of additional, often patient-orientated, learning objectives. CONCLUSIONS: The array of beneficial mechanisms evoked by VbF suggests it may be a powerful means of richly supporting students' learning in both formative and summative contexts.


Subjects
Education, Medical, Undergraduate; Students, Medical; Clinical Competence; Education, Medical, Undergraduate/methods; Educational Measurement/methods; Feedback; Humans
6.
Med Teach ; 44(8): 836-850, 2022 08.
Article in English | MEDLINE | ID: mdl-35771684

ABSTRACT

INTRODUCTION: In 2011, a consensus report was produced on technology-enhanced assessment (TEA), its good practices, and future perspectives. Since then, technological advances have enabled innovative practices and tools that have revolutionised how learners are assessed. In this updated consensus, we bring together the potential of technology and the ultimate goals of assessment: learner attainment, faculty development, and improved healthcare practices. METHODS: As material for the report, we used scholarly publications on TEA in both health professions education (HPE) and general higher education, feedback from the 2020 Ottawa Conference workshops, and scholarly publications on assessment technology practices during the Covid-19 pandemic. RESULTS AND CONCLUSION: The group identified areas of consensus, issues that remained to be resolved, and issues that arose in the evolution of TEA. We adopted a three-stage approach (readiness to adopt technology, application of assessment technology, and evaluation/dissemination). The application stage adopted an assessment 'lifecycle' approach and targeted five key foci: (1) advancing the authenticity of assessment, (2) engaging learners with assessment, (3) enhancing design and scheduling, (4) optimising assessment delivery and recording learner achievement, and (5) tracking learner progress and faculty activity, thereby supporting longitudinal learning and continuous assessment.


Subjects
COVID-19; Pandemics; Curriculum; Humans; Learning; Technology
7.
BMC Med Educ ; 22(1): 41, 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-35039023

ABSTRACT

BACKGROUND: Ensuring equivalence of examiners' judgements across different groups of examiners is a priority for large-scale performance assessments in clinical education, both to enhance fairness and to reassure the public. This study extends insight into an innovation called Video-based Examiner Score Comparison and Adjustment (VESCA), which uses video scoring to link otherwise unlinked groups of examiners. This linkage enables comparison of the influence of different examiner groups within a common frame of reference and provision of adjusted "fair" scores to students. Whilst this innovation promises substantial benefit to quality assurance of distributed Objective Structured Clinical Exams (OSCEs), questions remain about how the resulting score adjustments might be influenced by the specific parameters used to operationalise VESCA. Research questions: How similar are estimates of students' score adjustments when the model is run with either (1) fewer comparison videos per participating examiner or (2) reduced numbers of participating examiners? METHODS: Using secondary analysis of recent research which used VESCA to compare scoring tendencies of different examiner groups, we made numerous copies of the original data and then selectively deleted video scores to reduce either (1) the number of linking videos per examiner (4 versus several permutations of 3, 2 or 1 videos) or (2) examiner participation rates (all participating examiners (76%) versus several permutations of 70%, 60% or 50% participation). After analysing all resulting datasets with Many Facet Rasch Modelling (MFRM), we calculated students' score adjustments for each dataset and compared these with the score adjustments in the original data using Spearman's correlations. RESULTS: Students' score adjustments derived from 3 videos per examiner correlated highly with score adjustments derived from 4 linking videos (median Rho = 0.93, IQR 0.90-0.95, p < 0.001), with 2 videos (median Rho = 0.85, IQR 0.81-0.87, p < 0.001) and 1 video (median Rho = 0.52, IQR 0.46-0.64, p < 0.001) producing progressively smaller correlations. Score adjustments were similar for 76% examiner participation and for 70% (median Rho = 0.97, IQR 0.95-0.98, p < 0.001) and 60% (median Rho = 0.95, IQR 0.94-0.98, p < 0.001) participation, but were lower and more variable for 50% participation (median Rho = 0.78, IQR 0.65-0.83, some correlations non-significant). CONCLUSIONS: Whilst VESCA showed some sensitivity to the examined parameters, modest reductions in examiner participation rates or video numbers produced highly similar results. Employing VESCA in distributed or national exams could enhance quality assurance and exam fairness.
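The deletion-and-correlation design is straightforward to mimic. A minimal sketch with simulated data, where adjust() is a simple mean-offset stand-in for the study's MFRM:

```python
# Sketch: remove a share of linking-video scores, recompute adjustments and
# compare them with the full-data adjustments via Spearman's rho.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
full = rng.normal(0, 1, size=(30, 4))       # 30 examiners x 4 linking videos

def adjust(scores):
    # Examiner leniency relative to the grand mean over remaining scores
    # (NaN marks a deleted video score).
    return np.nanmean(scores, axis=1) - np.nanmean(scores)

baseline = adjust(full)
for n_videos in (3, 2, 1):
    reduced = full.copy()
    for row in reduced:                      # randomly drop videos per examiner
        row[rng.choice(4, size=4 - n_videos, replace=False)] = np.nan
    rho, p = spearmanr(baseline, adjust(reduced))
    print(f"{n_videos} linking videos: rho = {rho:.2f} (p = {p:.3g})")
```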


Subjects
Educational Measurement; Students, Medical; Clinical Competence; Humans; Judgment
8.
Med Teach ; 43(9): 1070-1078, 2021 09.
Article in English | MEDLINE | ID: mdl-34496725

ABSTRACT

INTRODUCTION: Communication skills are assessed by medically-enculturated examiners using consensus frameworks which were developed with limited patient involvement. Assessments consequently risk rewarding performance which incompletely serves patients' authentic communication needs. Whilst regulators require patient involvement in assessment, little is known about how this can be achieved. We aimed to explore patients' perceptions of students' communication skills, examiner feedback and potential roles for patients in assessment. METHODS: Using constructivist grounded theory, we performed cognitively stimulated, semi-structured interviews with patients who watched videos of student performances in communication-focused OSCE stations and read the corresponding examiner feedback. Data were analysed using grounded theory methods. RESULTS: A disconnect occurred between participants' and examiners' views of students' communication skills. Whilst patients frequently commented on students' use of medical terminology, examiners omitted to mention this in feedback. Patients' judgements of students' performances varied widely, reflecting different preferences and beliefs. Participants viewed this variability as an opportunity for students to learn from diverse lived experiences. Participants perceived a variety of roles through which patients could enhance assessment authenticity. DISCUSSION: Integrating patients into communication skills assessments could help to highlight deficiencies in students' communication which medically-enculturated examiners may miss. Overcoming the challenges inherent in this is likely to enhance graduates' preparedness for practice.


Subjects
Patient Participation; Students, Medical; Clinical Competence; Communication; Educational Measurement; Humans
9.
Med Teach ; 42(11): 1250-1260, 2020 11.
Article in English | MEDLINE | ID: mdl-32749915

ABSTRACT

INTRODUCTION: Novel uses of video aim to enhance assessment in health professionals' education. Whilst these uses presume equivalence between video and live scoring, some research suggests that poorly understood variations could challenge validity. We aimed to understand examiners' and students' interaction with video whilst developing procedures to promote its optimal use. METHODS: Using design-based research, we developed theory and procedures for video use in assessment, iteratively adapting conditions across simulated OSCE stations. We explored examiners' and students' perceptions using think-aloud protocols, interviews and a focus group. Data were analysed using constructivist grounded-theory methods. RESULTS: Video-based assessment produced detachment and reduced volitional control for examiners. Examiners' ability to make valid video-based judgements was mediated by the interaction of station content with specifically selected filming parameters. Examiners displayed several judgemental tendencies which helped them manage videos' limitations but could also bias judgements in some circumstances. Students rarely found carefully placed cameras intrusive and considered filming acceptable if adequately justified. DISCUSSION: Successful use of video-based assessment relies on balancing the need to ensure station-specific information adequacy, the avoidance of disruptive intrusion, and the degree of justification provided by video's educational purpose. Video has the potential to enhance assessment validity and students' learning when an appropriate balance is achieved.


Subjects
Clinical Competence; Education, Medical; Educational Measurement; Humans; Judgment
10.
Med Educ ; 53(9): 941-952, 2019 09.
Article in English | MEDLINE | ID: mdl-31264741

ABSTRACT

CONTEXT: Standard setting is critically important to assessment decisions in medical education. Recent research has demonstrated variations between medical schools in the standards set for shared items. Despite the centrality of judgement to criterion-referenced standard setting methods, little is known about the individual or group processes that underpin them. This study aimed to explore the operation and interaction of these processes in order to illuminate potential sources of variability. METHODS: Using qualitative research, we purposively sampled across UK medical schools that set a low, medium or high standard on nationally shared items, collecting data by observation of graduation-level standard-setting meetings and semi-structured interviews with standard-setting judges. Data were analysed using thematic analysis based on the principles of grounded theory. RESULTS: Standard setting occurred through the complex interaction of institutional context, judges' individual perspectives and group interactions. Schools' procedures, panel members and atmosphere produced unique contexts. Individual judges formed varied understandings of the clinical and technical features of each question, relating these to their differing (sometimes contradictory) conceptions of minimally competent students, by balancing information and making suppositions. Conceptions of minimal competence variously comprised: limited attendance; limited knowledge; poor knowledge application; emotional responses to questions; 'test-savviness', or a strategic focus on safety. Judges experienced tensions trying to situate these abstract conceptions in reality, revealing uncertainty. Groups constructively revised scores through debate, sharing information and often constructing detailed clinical representations of cases. Groups frequently displayed conformity, illustrating a belief that outlying judges were likely to be incorrect. Less frequently, judges resisted change, using emphatic language, bargaining or, rarely, 'polarisation' to influence colleagues. CONCLUSIONS: Despite careful conduct through well-established procedures, standard setting is judgementally complex and involves uncertainty. Understanding whether or how these varied processes produce the previously observed variations in outcomes may offer routes to enhance equivalence of criterion-referenced standards.


Subjects
Clinical Competence/standards; Education, Medical, Undergraduate; Judgment; Decision Making; Educational Measurement/methods; Group Processes; Health Knowledge, Attitudes, Practice; Humans; Reference Standards; Schools, Medical; United Kingdom
11.
Med Educ ; 53(3): 250-263, 2019 03.
Article in English | MEDLINE | ID: mdl-30575092

ABSTRACT

BACKGROUND: Although averaging across multiple examiners' judgements reduces unwanted overall score variability in objective structured clinical examinations (OSCEs), designs involving several parallel circuits of the OSCE require that different examiner cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner-cohort effects in distributed or national examinations that could compromise fairness or patient safety, but despite their importance, these effects are rarely investigated because fully nested assessment designs make them very difficult to study. We describe initial use of a new method to measure and adjust for examiner-cohort effects on students' scores. METHODS: We developed video-based examiner score comparison and adjustment (VESCA): volunteer students were filmed 'live' on 10 of 12 OSCE stations. Following the examination, examiners additionally scored station-specific common-comparator videos, producing partial crossing between examiner cohorts. Many-facet Rasch modelling and linear mixed modelling were used to estimate and adjust for examiner-cohort effects on students' scores. RESULTS: After accounting for students' ability, examiner cohorts differed substantially in their stringency or leniency (maximal global score difference of 0.47 out of 7.0 [Cohen's d = 0.96]; maximal total percentage score difference of 5.7% [Cohen's d = 1.06] for the same student ability judged by different examiner cohorts). Corresponding adjustment of students' global and total percentage scores altered the theoretical classification of 6.0% of students on both measures (either pass to fail or fail to pass), whereas 8.6-9.5% of students' scores were altered by at least 0.5 standard deviations of student ability. CONCLUSIONS: Despite typical reliability, the examiner cohort that students encountered had a potentially important influence on their score, emphasising the need for adequate sampling and examiner training. Development and validation of VESCA may offer a means to measure and adjust for potential systematic differences in scoring patterns between locations in distributed or national OSCE examinations, thereby ensuring equivalence and fairness.
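As a worked check of the effect sizes quoted above, Cohen's d is the mean difference divided by the pooled standard deviation; the SD below is back-calculated from the reported numbers, not taken from the paper.

```python
# A 0.47-point cohort difference on the 7-point global scale with a pooled
# SD near 0.49 gives d close to the reported 0.96.
import math

def cohens_d(mean_diff, sd_a, sd_b):
    pooled_sd = math.sqrt((sd_a**2 + sd_b**2) / 2)
    return mean_diff / pooled_sd

print(round(cohens_d(0.47, 0.49, 0.49), 2))   # ~0.96
```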


Subjects
Clinical Competence/standards; Education, Medical, Undergraduate/standards; Educational Measurement/methods; Educational Measurement/standards; Observer Variation; Videotape Recording/methods; Education, Medical, Undergraduate/methods; Humans; Reproducibility of Results; Students, Medical
12.
Adv Health Sci Educ Theory Pract ; 23(5): 937-959, 2018 Dec.
Article in English | MEDLINE | ID: mdl-29980956

ABSTRACT

Recent literature places greater emphasis on assessment comments rather than relying solely on scores. Both are variable, however, as both emanate from assessment judgements. One established source of variability is "contrast effects": scores are shifted away from the depicted level of competence in a preceding encounter. The shift could arise from an effect on the range-frequency of assessors' internal scales or from the salience of performance aspects within assessment judgements. As these suggest different potential interventions, we investigated assessors' cognition, using the insight provided by "clusters of consensus" to determine whether any change in the salience of performance aspects was induced by contrast effects. A dataset from a previous experiment contained scores and comments for 3 encounters: 2 with significant contrast effects and 1 without. Clusters of consensus were identified using F-sort and latent partition analysis both when contrast effects were significant and when they were non-significant. The proportion of assessors making similar comments differed significantly only when contrast effects were significant, with assessors more frequently commenting on aspects that were dissimilar to the standard of competence demonstrated in the preceding performance. Rather than simply influencing the range-frequency of assessors' scales, preceding performances may affect the salience of performance aspects through comparative distinctiveness: when juxtaposed with the context, some aspects are more distinct and selectively draw attention. Research is needed to determine whether changes in salience indicate biased or improved assessment information. The potential should be explored to augment existing benchmarking procedures in assessor training by cueing assessors' attention through observation of reference performances immediately prior to assessment.
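To make the central comparison concrete, a hedged sketch of testing whether the proportion of assessors commenting on a given aspect differs between conditions; the study itself used F-sort and latent partition analysis, and the two-proportion z-test and counts below are illustrative substitutes.

```python
# Sketch: compare the share of assessors mentioning one performance aspect
# in the contrast-effect vs no-contrast condition. Counts are invented.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

mentioned = np.array([19, 8])     # assessors mentioning the aspect
totals = np.array([30, 30])       # assessors per condition
z, p = proportions_ztest(mentioned, totals)
print(f"z = {z:.2f}, p = {p:.3f}")
```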


Subjects
Educational Measurement/standards; Health Occupations/education; Observer Variation; Clinical Competence; Cognition; Communication; Educational Measurement/methods; Humans; Judgment; Medical History Taking; Professional-Patient Relations; Single-Blind Method; United Kingdom
13.
Med Teach ; 40(11): 1159-1165, 2018 11.
Article in English | MEDLINE | ID: mdl-29703091

ABSTRACT

Background: OSCE examiners' scores are variable and may discriminate domains of performance poorly. Examiners must hold their observations of OSCE performances in "episodic memory" until performances end. We investigated whether examiners vary in their recollection of performances, and whether this relates to their score variability or their ability to separate disparate performance domains. Methods: Secondary analysis was performed on data where examiners had (1) scored videos of OSCE performances showing disparate student ability in different domains and (2) completed a measure of recollection for an OSCE performance. We calculated measures of "overall-score variance" (the degree to which individual examiners' overall scores varied from the group mean) and "domain separation" (the degree to which examiners separated different performance domains). We related these variables to the measure of examiners' recollection. Results: Examiners varied considerably in their recollection accuracy (recognition beyond chance ranged from -5% to +75% for different examiners). Examiners' recollection accuracy was weakly inversely related to their overall-score accuracy (R = -0.17, p < 0.001) and related to their ability to separate domains of performance (R = 0.25, p < 0.001). Conclusions: Examiners vary substantially in their memories for students' performances, which may offer a useful point of difference for studying the processing and integration phases of judgement. Findings could have implications for the utility of feedback.
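A minimal sketch of the correlational step, with simulated data and Pearson correlation as a plausible stand-in for the study's method:

```python
# Sketch: relate examiners' recollection accuracy to overall-score variance
# and domain separation. All data below are synthetic.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
recollection = rng.uniform(-0.05, 0.75, 120)      # recognition beyond chance
score_variance = np.abs(rng.normal(0, 1, 120))    # |overall score - group mean|
domain_separation = 0.25 * recollection + rng.normal(0, 0.2, 120)

for name, y in [("overall-score variance", score_variance),
                ("domain separation", domain_separation)]:
    r, p = pearsonr(recollection, y)
    print(f"{name}: R = {r:.2f}, p = {p:.3f}")
```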


Subjects
Educational Measurement/standards; Judgment; Mental Recall; Observer Variation; Clinical Competence; Female; Humans; Male; Regression Analysis; United Kingdom
14.
BMC Med ; 15(1): 179, 2017 10 25.
Article in English | MEDLINE | ID: mdl-29065875

ABSTRACT

BACKGROUND: Asian medical students and doctors receive lower scores on average than their white counterparts in examinations in the UK and internationally (a phenomenon known as "differential attainment"). This could be due to examiner bias or to social, psychological or cultural influences on learning or performance. We investigated whether students' scores or feedback show the influence of ethnicity-related bias; whether examiners unconsciously bring to mind (activate) stereotypes when judging Asian students' performance; whether activation depends on the stereotypicality of students' performances; and whether stereotypes influence examiners' memories of performances. METHODS: This is a randomised, double-blinded, controlled, Internet-based trial. We created near-identical videos of medical student performances on a simulated Objective Structured Clinical Exam using British Asian and white British actors. Examiners were randomly assigned to watch performances from white and Asian students that were either consistent or inconsistent with a previously described stereotype of Asian students' performance. We compared the two examiner groups in terms of the following: the scores and feedback they gave white and Asian students; how much the Asian stereotype was activated in their minds (response times to Asian-stereotypical vs neutral words in a lexical decision task); and whether the stereotype influenced memories of student performances (recognition rates for real vs invented stereotype-consistent vs stereotype-inconsistent phrases from one of the videos). RESULTS: Examiners responded to Asian-stereotypical words (716 ms, 95% confidence interval (CI) 702-731 ms) faster than neutral words (769 ms, 95% CI 753-786 ms, p < 0.001), suggesting Asian stereotypes were activated (or at least active) in examiners' minds. This occurred regardless of whether examiners observed stereotype-consistent or stereotype-inconsistent performances. Despite this stereotype activation, student ethnicity had no influence on examiners' scores, on the feedback examiners gave, or on examiners' memories of one performance. CONCLUSIONS: Examiner bias does not appear to explain the differential attainment of Asian students in UK medical schools. Efforts to ensure equality should focus on social, psychological and cultural factors that may disadvantage learning or performance in Asian and other minority ethnic students.
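A sketch of the stereotype-activation comparison: per-examiner mean response times are simulated around the reported means (716 ms vs 769 ms), and the examiner count and paired t-test are assumptions for illustration, not the study's actual analysis.

```python
# Sketch: are responses to stereotypical words faster than to neutral words?
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
n_examiners = 80                                   # hypothetical
rt_stereotypical = rng.normal(716, 60, n_examiners)        # ms
rt_neutral = rt_stereotypical + rng.normal(53, 40, n_examiners)

t, p = ttest_rel(rt_stereotypical, rt_neutral)
print(f"t = {t:.2f}, p = {p:.2g}")   # faster stereotypical RTs => activation
```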


Subjects
Clinical Competence; Education, Medical/standards; Asian People; Double-Blind Method; Female; Humans; Male; Racism; Students, Medical; White People
15.
Med Teach ; 39(1): 92-99, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27897083

ABSTRACT

INTRODUCTION: OSCEs are commonly conducted in multiple cycles (different circuits, times and locations), yet students' allocation to different OSCE cycles is rarely considered as a source of variance, perhaps in part because conventional psychometrics provide limited insight. METHODS: We used Many Facet Rasch Modeling (MFRM) to estimate the influence of "examiner cohorts" (the combined influence of the examiners in the cycle to which each student was allocated) on students' scores within a fully nested multi-cycle OSCE. RESULTS: Observed average scores for examiner cycles varied by 8.6%, but model-adjusted estimates showed a smaller range of 4.4%. Most students' scores were only slightly altered by the model; the greatest score increase was 5.3% and the greatest score decrease was -3.6%, with 2 students passing who would otherwise have failed. DISCUSSION: Despite using 16 examiners per cycle, examiner variability did not completely counter-balance, resulting in an influence of OSCE cycle on students' scores. Assumptions were required for the MFRM analysis; innovative procedures to overcome these limitations and strengthen OSCEs are discussed. CONCLUSIONS: OSCE cycle allocation has the potential to exert a small but unfair influence on students' OSCE scores; these little-considered influences should challenge our assumptions about, and the design of, OSCEs.


Subjects
Education, Medical, Undergraduate/methods; Education, Medical, Undergraduate/standards; Educational Measurement/methods; Educational Measurement/standards; Clinical Competence; Humans; Observer Variation; Problem Solving; Reproducibility of Results; Time Factors
16.
Med Educ ; 49(9): 909-19, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26296407

ABSTRACT

CONTEXT: In prior research, the scores assessors assign can be biased away from the standard of preceding performances (i.e. 'contrast effects' occur). OBJECTIVES: This study examines the mechanism and robustness of these findings to advance understanding of assessor cognition. We test the influence of the immediately preceding performance relative to that of a series of prior performances. Further, we examine whether assessors' narrative comments are similarly influenced by contrast effects. METHODS: Clinicians (n = 61) were randomised to three groups in a blinded, Internet-based experiment. Participants viewed identical videos of good, borderline and poor performances by first-year doctors in varied orders. They provided scores and written feedback after each video. Narrative comments were blindly content-analysed to generate measures of valence and content. Variability of narrative comments and scores was compared between groups. RESULTS: Comparisons indicated contrast effects after a single performance. When a good performance was preceded by a poor performance, ratings were higher (mean 5.01, 95% confidence interval [CI] 4.79-5.24) than when observation of the good performance was unbiased (mean 4.36, 95% CI 4.14-4.60; p < 0.05, d = 1.3). Similarly, borderline performance was rated lower when preceded by good performance (mean 2.96, 95% CI 2.56-3.37) than when viewed without preceding bias (mean 3.55, 95% CI 3.17-3.92; p < 0.05, d = 0.7). The series of ratings participants assigned suggested that the magnitude of contrast effects is determined by an averaging of recent experiences. The valence (but not content) of narrative comments showed contrast effects similar to those found in numerical scores. CONCLUSIONS: These findings are consistent with research from behavioural economics and psychology that suggests judgement tends to be relative in nature. Observing that the valence of narrative comments is similarly influenced suggests these effects represent more than difficulty in translating impressions into a number. The extent to which such factors impact upon assessment in practice remains to be determined as the influence is likely to depend on context.
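A toy model of the mechanism this abstract proposes, in which each rating is contrasted away from a reference point formed by averaging recently seen performances. This is one reading of "an averaging of recent experiences", not the authors' model; the weight and window size are arbitrary.

```python
# Toy simulation: scores are pushed away from the rolling average of the
# last few performance levels seen.
def contrasted_scores(true_levels, weight=0.3, window=3):
    scores, recent = [], []
    for level in true_levels:
        reference = sum(recent) / len(recent) if recent else level
        scores.append(level + weight * (level - reference))  # push away
        recent = (recent + [level])[-window:]                # rolling memory
    return scores

# A good performance (5) is inflated after poor ones but not in a flat context.
print(contrasted_scores([2, 2, 5]))   # good after poor -> inflated (~5.9)
print(contrasted_scores([5, 5, 5]))   # unbiased context -> ~true level (5)
```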


Subjects
Educational Measurement/standards; Feedback; Narration; Observer Variation; Clinical Competence/standards; Educational Measurement/methods; Humans; Judgment; Physicians/psychology; Reproducibility of Results
17.
Med Educ ; 48(11): 1055-68, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25307633

ABSTRACT

CONTEXT: Performance assessments, such as workplace-based assessments (WBAs), represent a crucial component of assessment strategy in medical education. Persistent concerns about rater variability in performance assessments have resulted in a new field of study focusing on the cognitive processes used by raters, or more inclusively, by assessors. METHODS: An international group of researchers met regularly to share and critique key findings in assessor cognition research. Through iterative discussions, they identified the prevailing approaches to assessor cognition research and noted that each was based on a largely distinct theoretical framework and literature. This paper aims to provide a conceptual review of the different perspectives used by researchers in this field, using the specific example of WBA. RESULTS: Three distinct, but not mutually exclusive, perspectives on the origins of, and possible solutions to, variability in assessment judgements emerged from the discussions within the group of researchers: (i) the assessor as trainable: assessors vary because they do not apply assessment criteria correctly, use varied frames of reference and make unjustified inferences; (ii) the assessor as fallible: variations arise as a result of fundamental limitations in human cognition that mean assessors are readily and haphazardly influenced by their immediate context; and (iii) the assessor as meaningfully idiosyncratic: experts are capable of making sense of highly complex and nuanced scenarios through inference and contextual sensitivity, which suggests assessor differences may represent legitimate experience-based interpretations. CONCLUSIONS: Although each of the perspectives discussed in this paper advances our understanding of assessor cognition and its impact on WBA, every perspective has its limitations. Following a discussion of areas of concordance and discordance across the perspectives, we propose a coexistent view in which researchers and practitioners utilise aspects of all three perspectives, with the goal of advancing assessment quality and ultimately improving patient care.


Subjects
Education, Medical/standards; Educational Measurement; Cognition; Humans; Research
18.
Med Educ ; 47(9): 910-22, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23931540

ABSTRACT

CONTEXT: A recent study has suggested that assessors judge performance comparatively rather than against fixed standards. Ratings assigned to borderline trainees were found to be biased by previously seen candidates' performances. We extended that programme of investigation by examining these effects across a range of performance levels. Furthermore, we investigated whether confidence in the rating assigned predicts susceptibility to manipulation and whether prompting consideration of typical performance lessens the influence of recent experience. METHODS: Consultant doctors were randomised to groups within an internet experiment. The descending performance group judged videos of Foundation Year 1 (F1; postgraduate Year 1) doctors in descending order of proficiency; the ascending performance group judged the same videos in ascending order. For all videos, participants rated: (i) trainee competence; (ii) rater confidence and (iii) percentage better (the percentage of other F1 doctors who would perform better on the same task). RESULTS: Overall, the descending performance group assigned lower scores than the ascending performance group (2.97 [95% confidence interval 2.73-3.20] versus 3.50 [95% confidence interval 3.25-3.74]; F(1,47) = 9.80, p = 0.003, d = 0.52). Pairwise comparisons showed differences were significant for good and borderline performances. The percentage better ratings showed a similar pattern (descending performance mean = 57.4 [95% confidence interval 52.5-62.3], ascending performance mean = 43.4 [95% confidence interval 38.4-48.5]; F(1, 46) = 16.0, p < 0.001, d = 0.67). Confidence ratings did not vary by level of performance and showed no relationship with the effect of group. DISCUSSION: Assessors' judgements showed contrast effects at both good and borderline performance levels. Findings suggest that assessors use normative rather than criterion-referenced decision making while judging, and that the norms referenced are weakly represented in memory and easily influenced. Confidence ratings suggested a lack of insight into this phenomenon. Raters' judgements could be importantly influenced in ways that are unfair to candidates.


Subjects
Clinical Competence/standards; Education, Medical, Graduate/standards; Educational Measurement/standards; Bias; Educational Measurement/methods; Female; Humans; Internet; Male; Observer Variation; Physicians/psychology; Videotape Recording
19.
Adv Health Sci Educ Theory Pract ; 18(3): 325-41, 2013 Aug.
Article in English | MEDLINE | ID: mdl-22581567

ABSTRACT

Assessors' scores in performance assessments are known to be highly variable. Attempted improvements through training or rating format have achieved minimal gains. The mechanisms that contribute to variability in assessors' scoring remain unclear. This study investigated these mechanisms. We used a qualitative approach to study assessors' judgements whilst they observed common simulated videoed performances of junior doctors obtaining clinical histories. Assessors commented concurrently and retrospectively on performances, provided scores and gave follow-up interviews. Data were analysed using principles of grounded theory. We developed three themes that help to explain how variability arises: Differential Salience (assessors paid attention to, or valued, different aspects of the performances to different degrees); Criterion Uncertainty (assessors' criteria were differently constructed, uncertain, and influenced by recent exemplars); and Information Integration (assessors described the valence of their comments in their own unique narrative terms, usually forming global impressions). Our results (whilst not precluding the operation of established biases) describe mechanisms by which assessors' judgements become meaningfully different or unique. They have theoretical relevance to understanding the formative educational messages that performance assessments provide, and give insight relevant to assessor training, assessors' ability to be observationally "objective", and the educational value of narrative comments (in contrast to numerical ratings).


Subjects
Clinical Competence/standards; Educational Measurement; Educational Measurement/methods; Female; Humans; Judgment; Male; Medical History Taking/standards; Physicians/standards; Qualitative Research; Video Recording
20.
JAMA ; 308(21): 2226-32, 2012 Dec 05.
Article in English | MEDLINE | ID: mdl-23212500

ABSTRACT

CONTEXT: Competency-based models of education require assessments to be based on individuals' capacity to perform, yet the nature of human judgment may fundamentally limit the extent to which such assessment is accurately possible. OBJECTIVE: To determine whether recent observations of the Mini Clinical Evaluation Exercise (Mini-CEX) performance of postgraduate year 1 physicians influence raters' scores of subsequent performances, consistent with either anchoring bias (scores biased similar to previous experience) or contrast bias (scores biased away from previous experience). DESIGN, SETTING, AND PARTICIPANTS: Internet-based randomized, blinded experiment using videos of Mini-CEX assessments of postgraduate year 1 trainees interviewing new internal medicine patients. Participants were 41 attending physicians from England and Wales experienced with the Mini-CEX, with 20 watching and scoring 3 good trainee performances and 21 watching and scoring 3 poor performances. All then watched and scored the same 3 borderline video performances. The study was completed between July and November 2011. MAIN OUTCOME MEASURES: The primary outcome was scores assigned to the borderline videos, using a 6-point Likert scale (anchors included: 1, well below expectations; 3, borderline; 6, well above expectations). Associations were tested in a multivariable analysis that included participants' sex, years of practice, and the stringency index (within-group z score of initial 3 ratings). RESULTS: The mean rating scores assigned by physicians who viewed borderline video performances following exposure to good performances was 2.7 (95% CI, 2.4-3.0) vs 3.4 (95% CI, 3.1-3.7) following exposure to poor performances (difference of 0.67 [95% CI, 0.28-1.07]; P = .001). Borderline videos were categorized as consistent with failing scores in 33 of 60 assessments (55%) in those exposed to good performances and in 15 of 63 assessments (24%) in those exposed to poor performances (P < .001). They were categorized as consistent with passing scores in 5 of 60 assessments (8.3%) in those exposed to good performances compared with 25 of 63 assessments (39.5%) in those exposed to poor performances (P < .001). Sex and years of attending practice were not associated with scores. The priming condition (good vs poor performances) and the stringency index jointly accounted for 45% of the observed variation in raters' scores for the borderline videos (P < .001). CONCLUSION: In an experimental setting, attending physicians exposed to videos of good medical trainee performances rated subsequent borderline performances lower than those who had been exposed to poor performances, consistent with a contrast bias.
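A minimal sketch of the "stringency index" covariate as defined in this abstract: each rater's mean over the initial three videos, z-scored within their priming group. The ratings below are invented.

```python
# Sketch: within-group z score of raters' initial three ratings.
import numpy as np

initial_ratings = np.array([[4.0, 5.0, 4.5],   # one row per rater in a group
                            [3.5, 4.0, 4.0],
                            [5.0, 5.5, 5.0]])
rater_means = initial_ratings.mean(axis=1)
stringency = (rater_means - rater_means.mean()) / rater_means.std(ddof=1)
print(stringency)   # lower = more stringent than the group average
```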


Subjects
Clinical Competence; Education, Medical, Graduate/standards; Educational Measurement/standards; Educational Measurement/methods; England; Female; Humans; Internet; Male; Medical Staff, Hospital; Observer Variation; Single-Blind Method; Video Recording; Wales