Results 1 - 17 of 17
1.
Adv Health Sci Educ Theory Pract ; 28(3): 793-809, 2023 08.
Article in English | MEDLINE | ID: mdl-36441287

ABSTRACT

Clinical supervisors are known to assess trainee performance idiosyncratically, causing concern about the validity of their ratings. The literature on this issue relies heavily on retrospective collection of decisions, resulting in the risk of inaccurate information regarding what actually drives raters' perceptions. Capturing in-the-moment information about supervisors' impressions could yield better insight into how to intervene. The purpose of this study, therefore, was to gather "real-time" judgments to explore what drives preceptors' judgments of student performance. We performed a prospective study in which physicians were asked to adjust a rating scale in real time while watching two video recordings of trainee clinical performances. Scores were captured in 1-s increments, examined for frequency, direction, and magnitude of adjustments, and compared to assessors' final entrustability judgment as measured by the modified Ottawa Clinic Assessment Tool. The standard deviation in raters' judgments was examined as a function of time to determine how long it takes impressions to begin to vary. Twenty participants viewed two clinical vignettes. Considerable variability in ratings was observed, with different behaviours triggering scale adjustments for different raters. That idiosyncrasy occurred very quickly, with the standard deviation in raters' judgments rapidly increasing within 30 s of case onset. Particular moments appeared to be broadly influential, but their degree of influence still varied. Correlations between the final assessment and (a) the score assigned upon first adjustment of the scale, (b) the score upon last adjustment, and (c) the mean score were r = 0.13, 0.32, and 0.57 for one video and r = 0.30, 0.50, and 0.52 for the other, indicating the degree to which overall impressions reflected the accumulation of raters' idiosyncratic moment-by-moment observations. Our results demonstrate that variability in raters' impressions begins very early in a case presentation and is associated with different behaviours having different influence on different raters. More generally, this study outlines a novel methodology that offers a new path for gaining insight into the factors influencing assessor judgments.
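The per-second analysis described in this abstract is straightforward to reproduce on similarly structured data. Below is a minimal sketch assuming a raters × seconds matrix of scale positions and a vector of final entrustment scores; the names, dimensions, and simulated values are illustrative assumptions, not the study's code or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in data: 20 raters adjusting a 7-point scale over a
# 300-second vignette, sampled at 1-s increments, plus each rater's final
# entrustment score (simulated values, not the study's data).
ratings = rng.integers(1, 8, size=(20, 300)).astype(float)
final = rng.integers(1, 8, size=20).astype(float)

# Between-rater spread at each second: how quickly do impressions diverge?
sd_over_time = ratings.std(axis=0, ddof=1)
onset = np.argmax(sd_over_time > sd_over_time.max() / 2)
print(f"between-rater SD reaches half its maximum at t = {onset} s")

# Correlate the final judgment with (a) the first score, (b) the last score,
# and (c) the mean score, mirroring the three comparisons in the abstract.
for label, summary in [("first", ratings[:, 0]),
                       ("last", ratings[:, -1]),
                       ("mean", ratings.mean(axis=1))]:
    r = np.corrcoef(summary, final)[0, 1]
    print(f"r(final, {label} score) = {r:.2f}")
```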


Subjects
Clinical Competence, Judgment, Humans, Prospective Studies, Retrospective Studies, Educational Measurement/methods
2.
Article in English | MEDLINE | ID: mdl-38010576

ABSTRACT

First impressions can influence rater-based judgments, but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims: first, to assess whether first impressions affect raters' judgments when workplace performance changes; second, to assess whether explicitly stating these impressions affects subsequent ratings compared to implicitly formed first impressions. Physician raters viewed six videos in which learner performance either changed (Strong to Weak or Weak to Strong) or remained consistent. Raters were assigned to one of two groups. Group one (n = 23, Explicit) made a first impression global rating (FIGR), then scored learners using the Mini-CEX. Group two (n = 22, Implicit) scored learners at the end of the video solely with the Mini-CEX. For the Explicit group, in the Strong to Weak condition, the FIGR (M = 5.94) was higher than the Mini-CEX global rating (GR) (M = 3.02, p < .001). In the Weak to Strong condition, the FIGR (M = 2.44) was lower than the Mini-CEX GR (M = 3.96, p < .001). There was no difference between the FIGR and the Mini-CEX GR in the consistent condition (M = 6.61 and M = 6.65, respectively, p = .84). There were no statistically significant differences in any of the conditions when comparing the two groups' Mini-CEX GRs. Therefore, raters adjusted their judgments based on the learners' performances. Furthermore, raters who made their first impressions explicit showed rater bias similar to that of raters who followed a more naturalistic process.
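The within-rater contrast between the first-impression rating and the end-of-video rating reported above is a paired comparison. A minimal sketch, using simulated values patterned on the Strong-to-Weak means; the variances, 9-point scale bounds, and variable names are assumptions, not study data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_raters = 23  # size of the Explicit group

# Simulated within-rater pairs patterned on the Strong-to-Weak means
# (M = 5.94 vs. 3.02); a 9-point scale is assumed for illustration.
figr = np.clip(rng.normal(5.94, 1.0, n_raters), 1, 9)
mini_cex_gr = np.clip(rng.normal(3.02, 1.0, n_raters), 1, 9)

# Paired comparison of the two ratings made by the same raters.
t, p = stats.ttest_rel(figr, mini_cex_gr)
print(f"t({n_raters - 1}) = {t:.2f}, p = {p:.2g}")
```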

3.
Adv Health Sci Educ Theory Pract ; 28(2): 519-536, 2023 05.
Article in English | MEDLINE | ID: mdl-36053344

ABSTRACT

The phenomenon of first impression is well researched in social psychology, but less so in the study of OSCEs and the multiple mini-interview (MMI). To explore its bearing on the MMI method, we included a rating of first impression in the MMI for student selection conducted in 2012 at the University Medical Center Hamburg-Eppendorf, Germany (196 applicants, 26 pairs of raters) and analyzed how it was related to MMI performance ratings made by (a) the same rater and (b) a different rater. First impression was assessed immediately after an applicant entered the test room. Each MMI task took 5 min and was rated immediately afterwards. Internal consistency was α = .71 for first impression and α = .69 for MMI performance. First impression and MMI performance correlated at r = .49. Both measures weakly predicted performance in two OSCEs for communication skills assessed 18 months later. MMI performance did not increment prediction above the contribution of first impression, and vice versa. Prediction was independent of whether or not the rater who rated first impression also rated MMI performance. The correlation between first impression and MMI performance is in line with the results of corresponding social psychology studies, which show that judgements based on minimal information moderately predict behavioral measures. It is also in accordance with the notion that raters often blend the specific assessment task outlined in the MMI instructions with the self-imposed question of whether a candidate would fit the role of a medical doctor.
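The internal consistencies reported here (α = .71 and .69) are Cronbach's alpha computed across stations. A minimal sketch of that estimate on simulated data; the station count and score model are assumptions, not the study's design.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (persons x items) matrix of ratings."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Illustrative stand-in: 196 applicants rated on first impression across
# 8 stations (the station count and score model are assumptions).
rng = np.random.default_rng(2)
true_level = rng.normal(0, 1, size=(196, 1))
first_impression = true_level + rng.normal(0, 1, size=(196, 8))
print(f"alpha = {cronbach_alpha(first_impression):.2f}")
```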


Subjects
Communication, School Admission Criteria, Humans, Pilot Projects, Medical Schools, Academic Medical Centers
4.
Behav Res Methods ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37919615

ABSTRACT

Performance assessments increasingly utilize onscreen or internet-based technology to collect human ratings. One of the benefits of onscreen ratings is the automatic recording of rating times along with the ratings. Considering rating times as an additional data source can provide a more detailed picture of the rating process and improve the psychometric quality of the assessment outcomes. However, currently available models for analyzing performance assessments do not incorporate rating times. The present research aims to fill this gap and advance a joint modeling approach, the "hierarchical facets model for ratings and rating times" (HFM-RT). The model includes two examinee parameters (ability and time intensity) and three rater parameters (severity, centrality, and speed). The HFM-RT successfully recovered examinee and rater parameters in a simulation study and yielded superior reliability indices. A real-data analysis of English essay ratings collected in a high-stakes assessment context revealed that raters exhibited considerably different speed measures, spent more time on high-quality than on low-quality essays, and tended to rate essays faster with increasing severity. However, due to the significant heterogeneity of examinees' writing proficiency, the improvement in the assessment's reliability under the HFM-RT was not salient in the real-data example. The discussion focuses on the advantages of accounting for rating times as a source of information in rating-quality studies and highlights perspectives from the HFM-RT for future research on rater cognition.
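The abstract names two examinee parameters (ability, time intensity) and three rater parameters (severity, centrality, speed). The sketch below simulates data consistent with one plausible reading of that parameterization; it is an illustrative toy model, not the published HFM-RT specification.

```python
import numpy as np

rng = np.random.default_rng(3)
n_examinees, n_raters, scale_max = 200, 10, 6

# Parameters named in the abstract: examinee ability and time intensity;
# rater severity, centrality, and speed. This parameterization is an
# illustrative reading, not the published HFM-RT specification.
ability = rng.normal(0, 1, n_examinees)
time_intensity = rng.normal(0, 0.3, n_examinees)
severity = rng.normal(0, 0.5, n_raters)
centrality = rng.uniform(0.6, 1.0, n_raters)          # <1 pulls ratings toward the midpoint
speed = 0.5 * severity + rng.normal(0, 0.2, n_raters) # severe raters rate faster (as reported)

# Ratings: a latent judgment shrunk by centrality, then discretized onto the scale.
latent = ability[:, None] - severity[None, :] + rng.normal(0, 0.5, (n_examinees, n_raters))
midpoint = (scale_max + 1) / 2
ratings = np.clip(np.rint(midpoint + centrality[None, :] * latent), 1, scale_max)

# Rating times: lognormal, longer for time-intensive essays, shorter for fast raters.
times = np.exp(4.0 + time_intensity[:, None] - speed[None, :]
               + rng.normal(0, 0.2, (n_examinees, n_raters)))

# Severity should correlate negatively with mean rating time, as in the abstract.
print(np.corrcoef(severity, times.mean(axis=0))[0, 1])
```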

5.
BMC Med Educ ; 22(1): 347, 2022 May 06.
Article in English | MEDLINE | ID: mdl-35524304

ABSTRACT

BACKGROUND: Entrustable Professional Activities (EPA) assessments measure learners' competence with an entrustment or supervisory scale. Designed for workplace-based assessment, EPA assessments have also been proposed for undergraduate medical education (UME), where assessments frequently occur outside the workplace and may be less intuitive, raising validity concerns. This study explored how assessors make entrustment determinations in UME, with a specific comparison based on familiarity with students' prior performance in the context of longitudinal student-assessor relationships. METHODS: A qualitative approach using think-alouds was employed. Assessors assessed two students (one familiar, one unfamiliar) completing a history and physical examination using a supervisory scale and then thought aloud after each assessment. We conducted a thematic analysis of assessors' response processes and compared them based on their familiarity with a student. RESULTS: Four themes and fifteen subthemes were identified. The most prevalent theme related to "student performance." The other three themes were "frame of reference," "assessor uncertainty," and "the patient." "Previous student performance" and "affective reactions" were subthemes more likely to inform scoring when faculty were familiar with a student, while unfamiliar faculty were more likely to reference "self" and "lack confidence in their ability to assess." CONCLUSIONS: Student performance appears to be assessors' main consideration for all students, providing some validity evidence for the response process in EPA assessments. Several problematic themes could be addressed with faculty development, while others appear to be inherent to entrustment and may be more challenging to mitigate. Differences based on assessor familiarity with a student merit further research into how trust develops over time.


Subjects
Competency-Based Education, Undergraduate Medical Education, Clinical Competence, Cognition, Faculty, Humans
6.
Adv Health Sci Educ Theory Pract ; 26(3): 1133-1156, 2021 08.
Article in English | MEDLINE | ID: mdl-33566199

ABSTRACT

Understanding which factors can impact rater judgments in assessments is important to ensure quality ratings. One such factor is whether prior performance information (PPI) about learners influences subsequent decision making. The information can be acquired directly, when the rater sees the same learner, or different learners over multiple performances, or indirectly, when the rater is provided with external information about the same learner prior to rating a performance (i.e., learner handover). The purpose of this narrative review was to summarize and highlight key concepts from multiple disciplines regarding the influence of PPI on subsequent ratings, discuss implications for assessment and provide a common conceptualization to inform research. Key findings include (a) assimilation (rater judgments are biased towards the PPI) occurs with indirect PPI and contrast (rater judgments are biased away from the PPI) with direct PPI; (b) negative PPI appears to have a greater effect than positive PPI; (c) when viewing multiple performances, context effects of indirect PPI appear to diminish over time; and (d) context effects may occur with any level of target performance. Furthermore, some raters are not susceptible to context effects, but it is unclear what factors are predictive. Rater expertise and training do not consistently reduce effects. Making raters more accountable, providing specific standards and reducing rater cognitive load may reduce context effects. Theoretical explanations for these findings will be discussed.


Subjects
Clinical Competence, Educational Measurement, Humans, Judgment, Observer Variation, Research Personnel
7.
Adv Health Sci Educ Theory Pract ; 25(4): 845-875, 2020 10.
Article in English | MEDLINE | ID: mdl-31997115

ABSTRACT

Undergraduate clinical assessors make expert, multifaceted judgements of consultation skills in concert with medical school OSCE grading rubrics. Assessors are not cognitive machines: their judgements are made in the light of prior experience and social interactions with students. It is important to understand assessors' working conceptualisations of consultation skills and whether they could be used to develop tools for undergraduate assessment. This study aimed to identify the working conceptualisations that assessors use while assessing undergraduate medical students' consultation skills and to develop assessment tools based on assessors' working conceptualisations and natural language. In semi-structured interviews, 12 experienced assessors from a UK medical school populated a blank assessment scale with personally meaningful descriptors while describing how they made judgements of students' consultation skills (at exit standard). A two-step iterative thematic framework analysis was performed, drawing on constructionism and interactionism. Five domains were found within working conceptualisations of consultation skills: Application of knowledge; Manner with patients; Getting it done; Safety; and Overall impression. Three mechanisms of judgement about student behaviour were identified: observations, inferences and feelings. Assessment tools drawing on participants' conceptualisations and natural language were generated, including 'grade descriptors' for common conceptualisations in each domain by mechanism of judgement, matched to grading rubrics of Fail, Borderline, Pass and Very good. Utilising working conceptualisations to develop assessment tools is feasible and potentially useful. Further work is needed to test their impact on assessment quality.


Subjects
Undergraduate Medical Education/organization & administration, Educational Measurement/standards, Judgment, Behavior, Clinical Competence, Undergraduate Medical Education/standards, Humans, Interviews as Topic, Knowledge, Patient Safety, Physician-Patient Relations, Qualitative Research
8.
Adv Health Sci Educ Theory Pract ; 24(1): 85-102, 2019 03.
Article in English | MEDLINE | ID: mdl-30302670

ABSTRACT

The increasing use of direct observation tools to assess routine performance has resulted in a growing reliance on assessor-based judgements in the workplace. However, we have a limited understanding of how assessors make judgements and formulate ratings in real-world contexts. Current research on assessor cognition has largely focused on the cognitive domain, but contextual factors are equally important, and the two are closely interconnected. This study aimed to explore the perceived cognitive and contextual factors influencing Mini-CEX assessor judgements in the Emergency Department setting. We used a conceptual framework of assessor-based judgement to develop a sequential mixed methods study. We analysed and integrated survey and focus group results to illustrate self-reported cognitive and contextual factors influencing assessor judgements. We used situated cognition theory as a sensitizing lens to explore the interactions between people and their environment. The major factors highlighted by our mixed methods study were: clarity of the assessment, reliance on and variable approach to overall impression (gestalt), role tension especially when giving constructive feedback, prior knowledge of the trainee, and case complexity. We identified prevailing tensions between participants (assessors and trainees), interactions (assessment and feedback) and setting. The two practical implications of our research are the need to broaden assessor training to incorporate both cognitive and contextual domains, and the need to develop a more holistic understanding of assessor-based judgements in real-world contexts to better inform future research and development in workplace-based assessments.


Subjects
Clinical Competence, Cognition, Graduate Medical Education/methods, Educational Measurement/methods, Judgment, Adult, Communication, Competency-Based Education/methods, Graduate Medical Education/standards, Educational Measurement/standards, Hospital Emergency Service/organization & administration, Female, Formative Feedback, Humans, Male, Medical History Taking/standards, Middle Aged, Physical Examination/standards, Professionalism/standards, Psychological Theory, Qualitative Research, Time Factors
9.
BMC Med Educ ; 19(1): 46, 2019 Feb 06.
Article in English | MEDLINE | ID: mdl-30728006

ABSTRACT

BACKGROUND: Physicians need a set of specific competences to perform well in interprofessional teams in their first year of residency. These competences should be achieved by graduation from medical school. Assessments during undergraduate medical studies are mostly rated by supervisors only. The aim of our study was to compare ratings of core facets of competence of medical students late in their undergraduate training, as well as rating confidence, between three different groups of assessors (supervisors, residents, and nurses) in an assessment simulating the first day of residency. METHODS: Sixty-seven advanced medical students from three different medical schools (Hamburg, Oldenburg and Munich) participated in a 360-degree assessment simulating the first working day of a resident. Each participant was rated by three assessors - a supervisor, a resident and a nurse - on seven facets of competence relevant for the first year of residency: (1) responsibility; (2) teamwork and collegiality; (3) knowing and maintaining one's own personal bounds and possibilities; (4) structure, work planning and priorities; (5) coping with mistakes; (6) scientifically and empirically grounded method of working; and (7) verbal communication with colleagues and supervisors. Mean assessed competences and confidence of judgement were compared across the three rating groups. Additionally, correlations between assessed competences and confidence of judgement within each group of raters were computed. RESULTS: All rating groups showed consistent assessment decisions (Cronbach's α: supervisors = .90, residents = .80, nurses = .78). Nurses assessed the participants significantly higher in all competences than supervisors and residents did (all p ≤ .05), with moderate to high effect sizes (d = .667-1.068). While supervisors' and residents' ratings were highest for "teamwork and collegiality," participants received their highest rating from nurses for "responsibility." Competences assessed by nurses were strongly positively correlated with their confidence of judgement, while supervisors' assessments correlated only moderately with their confidence of judgement in two competences. CONCLUSIONS: Different professional perspectives provide differentiated competence ratings for medical students in the role of a beginning resident. Rating confidence should be enhanced with empirically derived behavior checklists with anchors, which need to be included in rater training to decrease raters' subjectivity.
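The effect sizes reported above (d = .667-1.068) are Cohen's d for between-group mean differences. A minimal sketch of that computation with a pooled standard deviation; the group means, SDs, and sizes below are simulated stand-ins, not the study's data.

```python
import numpy as np
from scipy import stats

def cohens_d(x: np.ndarray, y: np.ndarray) -> float:
    """Cohen's d for two independent groups, using a pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Simulated stand-in ratings of one competence facet by two rater groups
# (means, SDs, and group sizes are assumptions, not the study's data).
rng = np.random.default_rng(4)
nurse_ratings = rng.normal(4.6, 0.8, 67)
supervisor_ratings = rng.normal(3.9, 0.8, 67)

t, p = stats.ttest_ind(nurse_ratings, supervisor_ratings)
print(f"d = {cohens_d(nurse_ratings, supervisor_ratings):.3f}, p = {p:.3g}")
```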


Subjects
Clinical Competence/standards, Internship and Residency, Medical Students, Academic Performance, Attitude of Health Personnel, Checklist, Cooperative Behavior, Humans, Interprofessional Relations, Medical Schools
10.
Adv Health Sci Educ Theory Pract ; 23(4): 721-732, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29556923

ABSTRACT

There is an increasing focus on factors that influence the variability of rater-based judgments. First impressions are one such factor. First impressions are judgments about people that are made quickly and are based on little information. Under some circumstances, these judgments can be predictive of subsequent decisions. A concern for both examinees and test administrators is whether the relationship remains stable when the performance of the examinee changes. That is, once a first impression is formed, to what degree will an examiner be willing to modify it? The purpose of this study was to determine the degree to which first impressions influence final ratings when the performance of examinees changes within the context of an objective structured clinical examination (OSCE). Physician examiners (n = 29) viewed seven videos of examinees (i.e., actors) performing a physical exam in a single OSCE station. They rated the examinees' clinical abilities on a six-point global rating scale after 60 s (first impression global rating, or FIGR). They then observed the examinee for the remainder of the station and provided a final global rating (GRS). In three of the videos, the examinees' performance remained consistent throughout; in two, performance changed from initially strong to weak; and in two, performance changed from initially weak to strong. The mean FIGR for the Consistent condition (M = 4.80) and the Strong to Weak condition (M = 4.87) were higher than the respective GRS ratings (M = 3.93, M = 2.73), with a greater decline for the Strong to Weak condition. The mean FIGR for the Weak to Strong condition (M = 3.60) was lower than the corresponding mean GRS (M = 4.81). This pattern of findings suggests that raters were willing to change their judgments based on examinee performance. Future work should explore the impact of making a first impression judgment explicit versus implicit and the role of context in the relationship between a first impression and a subsequent judgment.


Subjects
Clinical Competence/standards, Educational Measurement/methods, Educational Measurement/standards, Observer Variation, Adult, Female, Humans, Judgment, Male, Middle Aged, Socioeconomic Factors
11.
Adv Health Sci Educ Theory Pract ; 23(2): 275-287, 2018 May.
Article in English | MEDLINE | ID: mdl-29032415

ABSTRACT

While portfolios are increasingly used to assess competence, the validity of such portfolio-based assessments has hitherto remained unconfirmed. The purpose of the present research is therefore to further our understanding of how assessors form judgments when interpreting the complex data included in a competency-based portfolio. Eighteen assessors appraised one of three competency-based mock portfolios while thinking aloud, before taking part in semi-structured interviews. A thematic analysis of the think-aloud protocols and interviews revealed that assessors reached judgments through a three-phase cyclical cognitive process of acquiring, organizing, and integrating evidence. Upon conclusion of the first cycle, assessors reviewed the remaining portfolio evidence to look for confirming or disconfirming evidence. Assessors were inclined to stick to their initial judgments even when confronted with seemingly disconfirming evidence. Although assessors reached similar final (pass-fail) judgments of students' professional competence, they differed in their information-processing approaches and in the reasoning behind their judgments. Differences sprang from assessors' divergent assessment beliefs, performance theories, and inferences about the student. Assessment beliefs refer to assessors' opinions about what kind of evidence gives the most valuable and trustworthy information about the student's competence, whereas assessors' performance theories concern their conceptualizations of what constitutes professional competence and competent performance. Furthermore, even when using the same pieces of information, assessors differed with respect to inferences about the student as a person as well as a (future) professional. Our findings support the notion that assessors' reasoning in judgment and decision-making varies and is guided by their mental models of performance assessment, potentially impacting feedback and the credibility of decisions. Our findings also lend further credence to the assertion that portfolios should be judged by multiple assessors who should, moreover, thoroughly substantiate their judgments. Finally, we suggest that portfolios be designed in such a way that they facilitate the selection of and navigation through the portfolio evidence.


Subjects
Clinical Competence/standards, Decision Making, Undergraduate Medical Education/methods, Educational Measurement/methods, Competency-Based Education/methods, Competency-Based Education/standards, Undergraduate Medical Education/standards, Humans, Interviews as Topic, Netherlands, Observer Variation
12.
Adv Health Sci Educ Theory Pract ; 23(5): 937-959, 2018 Dec.
Article in English | MEDLINE | ID: mdl-29980956

ABSTRACT

Recent literature places more emphasis on assessment comments rather than relying solely on scores. Both are variable, however, as both emanate from assessment judgements. One established source of variability is "contrast effects": scores are shifted away from the depicted level of competence in a preceding encounter. The shift could arise from an effect on the range-frequency of assessors' internal scales or from the salience of performance aspects within assessment judgements. As these suggest different potential interventions, we investigated assessors' cognition using the insight provided by "clusters of consensus" to determine whether contrast effects induced any change in the salience of performance aspects. A dataset from a previous experiment contained scores and comments for 3 encounters: 2 with significant contrast effects and 1 without. Clusters of consensus were identified using F-sort and latent partition analysis both when contrast effects were significant and when they were non-significant. The proportion of assessors making similar comments differed significantly only when contrast effects were significant, with assessors more frequently commenting on aspects that were dissimilar to the standard of competence demonstrated in the preceding performance. Rather than simply influencing the range-frequency of assessors' scales, preceding performances may affect the salience of performance aspects through comparative distinctiveness: when juxtaposed with the context, some aspects are more distinct and selectively draw attention. Research is needed to determine whether changes in salience indicate biased or improved assessment information. The potential should be explored to augment existing benchmarking procedures in assessor training by cueing assessors' attention through observation of reference performances immediately prior to assessment.


Subjects
Educational Measurement/standards, Health Occupations/education, Observer Variation, Clinical Competence, Cognition, Communication, Educational Measurement/methods, Humans, Judgment, Medical History Taking, Professional-Patient Relations, Single-Blind Method, United Kingdom
13.
Adv Health Sci Educ Theory Pract ; 22(4): 969-983, 2017 Oct.
Article in English | MEDLINE | ID: mdl-27848171

ABSTRACT

Competency-based assessment is placing increasing emphasis on the direct observation of learners. For this process to produce valid results, it is important that raters provide quality judgments that are accurate. Unfortunately, the quality of these judgments is variable, and the roles of factors that influence the accuracy of those judgments are not clearly understood. One such factor is first impressions: that is, judgments about people we do not know, made quickly and based on very little information. This study explores the influence of first impressions in an OSCE. Specifically, the purpose is to begin to examine the accuracy of a first impression and its influence on subsequent ratings. We created six videotapes of history-taking performance. Each video was scripted from a real performance by one of six examinee residents within a single OSCE station. Each performance was re-enacted and videotaped, with six different actors playing the roles of the examinees and one actor playing the role of the patient. A total of 23 raters (i.e., physician examiners) reviewed each video and were asked to make a global judgment of the examinee's clinical abilities after 60 s (First Impression GR) on a six-point global rating scale, and then to rate their confidence in the accuracy of that judgment on a five-point rating scale (Confidence GR). After making these ratings, raters watched the remainder of the examinee's performance and made another global rating of performance (Final GR) before moving on to the next video. First impression ratings of ability varied across examinees and were moderately correlated with expert ratings (r = .59, 95% CI [-.13, .90]). There were significant differences in mean ratings for three examinees. Correlations ranged from .05 to .56 but were significant for only three examinees. Raters' confidence in their first impression was not related to the likelihood of changing their rating between the first impression and a subsequent rating. The findings suggest that first impressions could play a role in explaining variability in judgments, but their importance was determined by the videotaped performance of the examinees. More work is needed to clarify the conditions that support or discourage the use of first impressions.
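A confidence interval for a correlation such as the r = .59, 95% CI [-.13, .90] above is conventionally obtained via the Fisher z-transformation. A minimal sketch; the n = 6 below is an assumption taken from the six videos, and the study's reported interval may rest on a different method or sample size, so this will not reproduce it exactly.

```python
import numpy as np
from scipy import stats

def pearson_ci(r: float, n: int, level: float = 0.95) -> tuple:
    """Fisher z-transform confidence interval for a Pearson correlation."""
    z = np.arctanh(r)
    se = 1 / np.sqrt(n - 3)
    crit = stats.norm.ppf(1 - (1 - level) / 2)
    lo, hi = np.tanh([z - crit * se, z + crit * se])
    return float(lo), float(hi)

# n = 6 mirrors the six examinee videos (an assumption, not the study's method).
print(pearson_ci(0.59, 6))  # a wide interval, reflecting the small n
```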


Subjects
Medical Education/methods, Educational Measurement/methods, Educational Measurement/standards, Medical Faculty/psychology, Clinical Competence/standards, Medical Education/standards, Medical Faculty/standards, Humans, Medical History Taking/standards, Observer Variation, Reproducibility of Results, Videotape Recording
14.
Adv Health Sci Educ Theory Pract ; 22(4): 819-838, 2017 Oct.
Article in English | MEDLINE | ID: mdl-27651046

ABSTRACT

Whenever multiple observers provide ratings, even of the same performance, inter-rater variation is prevalent. The resulting 'idiosyncratic rater variance' is considered to be unusable error of measurement in psychometric models and is a threat to the defensibility of our assessments. Prior studies of inter-rater variation in clinical assessments have used open response formats to gather raters' comments and justifications. This design choice allows participants to use idiosyncratic response styles that could result in a distorted representation of the underlying rater cognition and skew subsequent analyses. In this study we explored rater variability using the structured response format of Q methodology. Physician raters viewed video-recorded clinical performances and provided Mini Clinical Evaluation Exercise (Mini-CEX) assessment ratings through a web-based system. They then shared their assessment impressions by sorting statements that described the most salient aspects of the clinical performance onto a forced quasi-normal distribution ranging from "most consistent with my impression" to "most contrary to my impression". Analysis of the resulting Q-sorts revealed distinct points of view for each performance shared by multiple physicians. The points of view corresponded with the ratings physicians assigned to the performance. Each point of view emphasized different aspects of the performance with either rapport-building and/or medical expertise skills being most salient. It was rare for the points of view to diverge based on disagreements regarding the interpretation of a specific aspect of the performance. As a result, physicians' divergent points of view on a given clinical performance cannot be easily reconciled into a single coherent assessment judgment that is impacted by measurement error. If inter-rater variability does not wholly reflect error of measurement, it is problematic for our current measurement models and poses challenges for how we are to adequately analyze performance assessment ratings.


Subjects
Medical Education/methods, Medical Education/standards, Educational Measurement/methods, Educational Measurement/standards, Clinical Competence, Female, Humans, Male, Observer Variation, Reproducibility of Results, Video Recording
15.
Adv Health Sci Educ Theory Pract ; 21(3): 627-42, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26620923

ABSTRACT

Performance-based assessment (PBA) is a valued assessment approach in medical education, be it in a clerkship, residency, or practice context. Raters are intrinsic to PBA, and the increased use of PBA has led to an increased interest in rater cognition. Although several researchers have tackled factors that may influence variability in rater judgment, critical examination of how raters observe performance and translate those observations into judgments is still under way. The purpose of this study was to qualitatively investigate the cognitive processes of raters and to create a framework that conceptualizes those processes when raters assess a complex performance. We conducted semi-structured interviews with 11 faculty members (nominated as excellent assessors) from a Department of Medicine to investigate how raters observe, interpret, and translate performance into judgments. The transcribed verbal protocols were analyzed using Constructivist Grounded Theory in order to develop a theoretical model of raters' assessment processes. Several themes emerged from the data and were grouped into three macro-level themes describing how raters balance two sources of data - (1) external sources of information and (2) internal/personal sources of information - by relying on specific cognitive processes to assess an examinee's performance. The results of our study demonstrate that assessment is a difficult cognitive task that involves nuanced use of specific cognitive processes to weigh external and internal data against each other. Our data clearly draw attention to the constant struggle between objectivity and subjectivity observed in assessment, as illustrated by the importance given to nuancing the examinee's observed performance.


Subjects
Cognition, Educational Measurement, Clinical Competence/standards, Medical Education/standards, Educational Measurement/methods, Medical Faculty/psychology, Female, Humans, Interviews as Topic, Judgment, Male
16.
Adv Health Sci Educ Theory Pract ; 21(3): 609-25, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26661783

ABSTRACT

The Objective Structured Clinical Exam (OSCE) is a widely used method of assessment in medical education. Rater cognition has become an important area of inquiry in the medical education assessment literature generally, and in the OSCE literature specifically, because of concerns about potential compromises of validity. In this study, a novel mixed methods approach combining Ordinal Logistic Hierarchical Linear Modeling with cognitive interviews was used to gain insight into what examiners were thinking during an OSCE. The study is based on data from the 2010 to 2014 administrations of the Clinician Assessment for Practice Program OSCE for International Medical Graduates (IMGs) in Nova Scotia. An IMG is a physician trained outside of Canada who was a licensed practitioner in a different country. The quantitative data were examined alongside four follow-up cognitive interviews of examiners conducted after the 2014 administration. The quantitative results show that the competencies of (1) Investigation and Management and (2) Counseling were highly predictive of the Overall Global score. These competencies were also described in the cognitive interviews as the most salient parts of the OSCE. Examiners found Communication Skills and Professional Behavior to be relevant as well, but the quantitative results revealed these to be less predictive of the Overall Global score. The interviews also revealed a tacit sequence by which IMGs are expected to proceed in an OSCE, starting with more basic competencies such as History Taking and building up to Investigation and Management and Counseling. The combined results confirm that a hidden pattern exists in how examiners rate candidates. This study has potential implications for research into rater cognition and for the design and scoring of practice-ready OSCEs.
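The quantitative side of the study is an Ordinal Logistic Hierarchical Linear Model. A plain (non-hierarchical) ordinal logit captures the core idea - competency scores predicting an ordered Overall Global score - and can be sketched with statsmodels; the variable names, category labels, and simulated data below are illustrative assumptions, not the study's model or data.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Simulated station-level data: competency scores predicting an ordered
# Overall Global score (names mirror the abstract; values are illustrative).
rng = np.random.default_rng(5)
n = 500
X = pd.DataFrame({
    "investigation_management": rng.normal(0, 1, n),
    "counseling": rng.normal(0, 1, n),
    "communication": rng.normal(0, 1, n),
})
latent = (1.2 * X["investigation_management"] + 1.0 * X["counseling"]
          + 0.3 * X["communication"] + rng.logistic(0, 1, n))
overall_global = pd.cut(latent, bins=[-np.inf, -2, 0, 2, np.inf],
                        labels=["poor", "borderline", "pass", "good"])

# The study fit an ordinal logistic *hierarchical* model; this plain ordinal
# logit shows the core score-prediction relationship without the nesting.
model = OrderedModel(overall_global, X, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```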


Subjects
Medical Education/standards, Educational Measurement/methods, Clinical Competence/standards, Educational Measurement/standards, Humans, Interviews as Topic, Linear Models, Logistic Models, Nova Scotia, Reproducibility of Results
17.
Teach Learn Med ; 28(1): 41-51, 2016.
Article in English | MEDLINE | ID: mdl-26787084

ABSTRACT

THEORY: Assessment of clinical competence is a complex cognitive task, with many mental demands often imposed on raters unintentionally. We were interested in whether this burden might contribute to well-described limitations in assessment judgments. In this study we examine the effect on indicators of rating quality of asking raters to (a) consider multiple competencies and (b) attend to multiple issues. In addition, we explored the cognitive strategies raters engage when asked to consider multiple competencies simultaneously. HYPOTHESES: We hypothesized that indicators of rating quality (e.g., interrater reliability) would decline as the number of dimensions raters are expected to consider increases. METHOD: Experienced faculty examiners rated prerecorded clinical performances within a 2 (number of dimensions) × 2 (presence of distracting task) × 3 (number of videos) factorial design. Half of the participants were asked to rate 7 dimensions of performance (7D), and half were asked to rate only 2 (2D). The second factor involved the requirement (or lack thereof) to rate the performance of actors participating in the simulation. We calculated the interrater reliability of the scores assigned and counted the number of relevant behaviors participants identified as informing their ratings. Second, we analyzed data from semistructured posttask interviews to explore the rater strategies associated with rating under conditions designed to broaden raters' focus. RESULTS: Generalizability analyses revealed that the 2D group achieved higher interrater reliability than the 7D group (G = .56 and .42, respectively, when averaging over 10 raters). The requirement to complete an additional rating task had no effect. Using the 2 dimensions common to both groups, an analysis of variance revealed that participants asked to rate only 2 dimensions identified more behaviors relevant to the focal dimensions than those asked to rate 7 dimensions: procedural skill = 36.2%, 95% confidence interval (CI) [32.5, 40.0] versus 23.5%, 95% CI [20.8, 26.3]; history gathering = 38.6%, 95% CI [33.5, 42.9] versus 24.0%, 95% CI [21.1, 26.9]; ps < .05. During posttask interviews, raters identified many sources of cognitive load and idiosyncratic cognitive strategies used to reduce cognitive load during the rating task. CONCLUSIONS: As intrinsic rating demands increase, indicators of rating quality decline. The strategies that raters engage when asked to rate many dimensions simultaneously are varied and appear to yield idiosyncratic efforts to reduce cognitive effort, which may affect the degree to which raters make judgments based on comparable information.
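The interrater reliability figures above (G = .56 vs. .42) come from generalizability analysis. For a fully crossed persons × raters design, the relative G coefficient can be estimated from ANOVA mean squares; the sketch below makes that simplifying assumption (the study's design was more complex) and uses simulated data.

```python
import numpy as np

def g_coefficient(scores: np.ndarray, n_raters_decision: int) -> float:
    """Relative G coefficient for a fully crossed persons x raters design,
    with scores averaged over n_raters_decision raters in the decision study."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    var_p = max((ms_p - ms_res) / n_r, 0.0)  # person (video) variance component
    return var_p / (var_p + ms_res / n_raters_decision)

# Simulated stand-in: 3 videos each scored by 10 raters (not the study's data).
rng = np.random.default_rng(6)
true_quality = rng.normal(0, 1, size=(3, 1))
scores = true_quality + rng.normal(0, 1, size=(3, 10))
print(f"G (10 raters) = {g_coefficient(scores, 10):.2f}")
```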


Subjects
Clinical Competence/standards, Educational Measurement/methods, Adult, Female, Humans, Interviews as Topic, Male, Middle Aged, Nova Scotia, Ontario, Videotape Recording