Pesquisa | BVS Doenças Infecciosas e Parasitárias

1.

The Ottawa resident observation form for nurses (O-RON): evaluation of an assessment tool's psychometric properties in different specialties.

Chiu, Hedva; Wood, Timothy J; Garber, Adam; Halman, Samantha; Rekman, Janelle; Gofton, Wade; Dudek, Nancy.

BMC Med Educ ; 24(1): 487, 2024 May 02.

Artigo em Inglês | MEDLINE | ID: mdl-38698352

RESUMO

BACKGROUND: Workplace-based assessment (WBA) used in post-graduate medical education relies on physician supervisors' feedback. However, in a training environment where supervisors are unavailable to assess certain aspects of a resident's performance, nurses are well-positioned to do so. The Ottawa Resident Observation Form for Nurses (O-RON) was developed to capture nurses' assessment of trainee performance and results have demonstrated strong evidence for validity in Orthopedic Surgery. However, different clinical settings may impact a tool's performance. This project studied the use of the O-RON in three different specialties at the University of Ottawa. METHODS: O-RON forms were distributed on Internal Medicine, General Surgery, and Obstetrical wards at the University of Ottawa over nine months. Validity evidence related to quantitative data was collected. Exit interviews with nurse managers were performed and content was thematically analyzed. RESULTS: 179 O-RONs were completed on 30 residents. With four forms per resident, the ORON's reliability was 0.82. Global judgement response and frequency of concerns was correlated (r = 0.627, P < 0.001). CONCLUSIONS: Consistent with the original study, the findings demonstrated strong evidence for validity. However, the number of forms collected was less than expected. Exit interviews identified factors impacting form completion, which included clinical workloads and interprofessional dynamics.

Assuntos

Competência Clínica , Internato e Residência , Psicometria , Humanos , Reprodutibilidade dos Testes , Feminino , Masculino , Avaliação Educacional/métodos , Ontário , Medicina Interna/educação

2.

Assessing the utility of a novel entrustment-supervision assessment tool.

Dewhirst, Sebastian; Wood, Timothy J; Cheung, Warren J; Frank, Jason R.

Med Educ ; 57(10): 949-957, 2023 10.

Artigo em Inglês | MEDLINE | ID: mdl-37387266

RESUMO

BACKGROUND: Work-based assessments (WBAs) are increasingly used to inform decisions about trainee progression. Unfortunately, WBAs often fail to discriminate between trainees of differing abilities and have poor reliability. Entrustment-supervision scales may improve WBA performance, but there is a paucity of literature directly comparing them to traditional WBA tools. METHODS: The Ottawa Emergency Department Shift Observation Tool (O-EDShOT) is a previously published WBA tool employing an entrustment-supervision scale with strong validity evidence. This pre-/post-implementation study compares the performance of the O-EDShOT with that of a traditional WBA tool using norm-based anchors. All assessments completed in 12-month periods before and after implementing the O-EDShOT were collected, and generalisability analysis was conducted with year of training, trainees within year and forms within trainee as nested factors. Secondary analysis included assessor as a factor. RESULTS: A total of 3908 and 3679 assessments were completed by 99 and 116 assessors, for 152 and 138 trainees in the pre- and post-implementation phases respectively. The O-EDShOT generated a wider range of awarded scores than the traditional WBA, and mean scores increased more with increasing level of training (0.32 vs. 0.14 points per year, p = 0.01). A significantly greater proportion of overall score variability was attributable to trainees using the O-EDShOT (59%) compared with the traditional tool (21%, p < 0.001). Assessors contributed less to overall score variability for the O-EDShOT than for the traditional WBA (16% vs. 37%). Moreover, the O-EDShOT required fewer completed assessments than the traditional tool (27 vs. 51) for a reliability of 0.8. CONCLUSION: The O-EDShOT outperformed a traditional norm-referenced WBA in discriminating between trainees and required fewer assessments to generate a reliable estimate of trainee performance. More broadly, this study adds to the body of literature suggesting that entrustment-supervision scales generate more useful and reliable assessments in a variety of clinical settings.

Assuntos

Avaliação Educacional , Local de Trabalho , Humanos , Reprodutibilidade dos Testes , Competência Clínica , Educação de Pós-Graduação em Medicina

3.

Implicit versus explicit first impressions in performance-based assessment: will raters overcome their first impressions when learner performance changes?

Wood, Timothy J; Daniels, Vijay J; Pugh, Debra; Touchie, Claire; Halman, Samantha; Humphrey-Murto, Susan.

Adv Health Sci Educ Theory Pract ; 2023 Nov 27.

Artigo em Inglês | MEDLINE | ID: mdl-38010576

RESUMO

First impressions can influence rater-based judgments but their contribution to rater bias is unclear. Research suggests raters can overcome first impressions in experimental exam contexts with explicit first impressions, but these findings may not generalize to a workplace context with implicit first impressions. The study had two aims. First, to assess if first impressions affect raters' judgments when workplace performance changes. Second, whether explicitly stating these impressions affects subsequent ratings compared to implicitly-formed first impressions. Physician raters viewed six videos where learner performance either changed (Strong to Weak or Weak to Strong) or remained consistent. Raters were assigned two groups. Group one (n = 23, Explicit) made a first impression global rating (FIGR), then scored learners using the Mini-CEX. Group two (n = 22, Implicit) scored learners at the end of the video solely with the Mini-CEX. For the Explicit group, in the Strong to Weak condition, the FIGR (M = 5.94) was higher than the Mini-CEX Global rating (GR) (M = 3.02, p < .001). In the Weak to Strong condition, the FIGR (M = 2.44) was lower than the Mini-CEX GR (M = 3.96 p < .001). There was no difference between the FIGR and the Mini-CEX GR in the consistent condition (M = 6.61, M = 6.65 respectively, p = .84). There were no statistically significant differences in any of the conditions when comparing both groups' Mini-CEX GR. Therefore, raters adjusted their judgments based on the learners' performances. Furthermore, raters who made their first impressions explicit showed similar rater bias to raters who followed a more naturalistic process.

4.

The Impact of Surgeon Experience on Script Concordance Test Scoring.

Gawad, Nada; Wood, Timothy J; Malvea, Anahita; Cowley, Lindsay; Raiche, Isabelle.

J Surg Res ; 265: 265-271, 2021 09.

Artigo em Inglês | MEDLINE | ID: mdl-33964636

RESUMO

OBJECTIVE: The Script Concordance Test (SCT) is a test of clinical decision-making that relies on an expert panel to create its scoring key. Existing literature demonstrates the value of specialty-specific experts, but the effect of experience among the expert panel is unknown. The purpose of this study was to explore the role of surgeon experience in SCT scoring. DESIGN: An SCT was administered to 29 general surgery residents and 14 staff surgeons. Staff surgeons were stratified as either junior or senior experts based on years since completing residency training (<15 versus >25 years). The SCT was scored using the full expert panel, the senior panel, the junior panel, and a subgroup junior panel in practice <5 years. A one-way ANOVA was used to compare the scores of first (R1) and fifth (R5) year residents using each scoring scheme. Cognitive interviews were analyzed for differences between junior and senior expert panelist responses. RESULTS: There was no statistically significant difference between the mean score of six R1s and five R5s using the full expert panel (R1 69.08 versus R5 67.06, F1,9 = 0.10, P = 0.76), the junior panel (R1 66.73 versus R5 62.50, F1,9 = 0.35, P = 0.57), or the subgroup panel in practice <5 years (R1 61.07 versus R5 58.79, F1,9 = 0.18, P = 0.75). However, the average score of R1s was significantly lower than R5s when using the senior faculty panel (R1 52.04 versus R5 63.26, F1,9 = 26.90, P = 0.001). Cognitive interview data suggests that some responses of junior experts demonstrate less confidence than those of senior experts. CONCLUSIONS: SCT scores are significantly affected by the responses of the expert panel. Expert differences between first and fifth year residents were only demonstrated when using an expert panel consisting of senior faculty members. Confidence may play a role in the response selections of junior experts. When constructing an SCT expert panel, consideration must be given to the experience of panel members.

Assuntos

Competência Clínica , Tomada de Decisão Clínica/métodos , Cirurgiões/psicologia , Feminino , Humanos , Masculino

5.

How do cognitive processes influence script concordance test responses?

Gawad, Nada; Wood, Timothy J; Cowley, Lindsay; Raiche, Isabelle.

Med Educ ; 55(3): 354-364, 2021 03.

Artigo em Inglês | MEDLINE | ID: mdl-33185303

RESUMO

INTRODUCTION: The script concordance test (SCT) is a test of clinical decision-making (CDM) that compares the thought process of learners to that of experts to determine to what extent their cognitive 'scripts' align. Without understanding test-takers' cognitive process, however, it is unclear what influences their responses. The objective of this study was to gather response process validity evidence by studying the cognitive process of test-takers to determine whether the SCT tests CDM and what cognitive processes may influence SCT responses. METHODS: Cases from an SCT used in a national validation study were administered and semi-structured cognitive interviews were conducted with ten residents and five staff surgeons. A retrospective verbal probing technique was used. Data was independently analysed and coded by two analysts. Themes were identified as factors that influence SCT responses during the cognitive interview. RESULTS: Cognitive interviews demonstrated variability in CDM among test-takers. Consistent with dual process theory, test-takers relied on scripts formed through past experiences, when available, to make decisions and used conscious deliberation in the absence of experience. However, test-takers' response process was also influenced by their comprehension of specific terms, desire for additional information, disagreement with the planned management, underlying knowledge gaps and desire to demonstrate confidence or humility. CONCLUSION: The rationale behind SCT answers may be influenced by comprehension, underlying knowledge and social desirability in addition to formed scripts and/or conscious deliberation. Having test-takers verbalise their rationale for responses provides a depth of assessment that is otherwise lost in the SCT's current format. With the improved ability to standardise CDM assessment using the SCT, consideration of test-makers improving the SCT construction process and combining the SCT question format with verbal responses may improve the use of the SCT for CDM assessment.

Assuntos

Competência Clínica , Avaliação Educacional , Tomada de Decisão Clínica , Cognição , Humanos , Estudos Retrospectivos

6.

Are raters influenced by prior information about a learner? A review of assimilation and contrast effects in assessment.

Humphrey-Murto, Susan; Shaw, Tammy; Touchie, Claire; Pugh, Debra; Cowley, Lindsay; Wood, Timothy J.

Adv Health Sci Educ Theory Pract ; 26(3): 1133-1156, 2021 08.

Artigo em Inglês | MEDLINE | ID: mdl-33566199

RESUMO

Understanding which factors can impact rater judgments in assessments is important to ensure quality ratings. One such factor is whether prior performance information (PPI) about learners influences subsequent decision making. The information can be acquired directly, when the rater sees the same learner, or different learners over multiple performances, or indirectly, when the rater is provided with external information about the same learner prior to rating a performance (i.e., learner handover). The purpose of this narrative review was to summarize and highlight key concepts from multiple disciplines regarding the influence of PPI on subsequent ratings, discuss implications for assessment and provide a common conceptualization to inform research. Key findings include (a) assimilation (rater judgments are biased towards the PPI) occurs with indirect PPI and contrast (rater judgments are biased away from the PPI) with direct PPI; (b) negative PPI appears to have a greater effect than positive PPI; (c) when viewing multiple performances, context effects of indirect PPI appear to diminish over time; and (d) context effects may occur with any level of target performance. Furthermore, some raters are not susceptible to context effects, but it is unclear what factors are predictive. Rater expertise and training do not consistently reduce effects. Making raters more accountable, providing specific standards and reducing rater cognitive load may reduce context effects. Theoretical explanations for these findings will be discussed.

Assuntos

Competência Clínica , Avaliação Educacional , Humanos , Julgamento , Variações Dependentes do Observador , Pesquisadores

7.

How biased are you? The effect of prior performance information on attending physician ratings and implications for learner handover.

Shaw, Tammy; Wood, Timothy J; Touchie, Claire; Pugh, Debra; Humphrey-Murto, Susan M.

Adv Health Sci Educ Theory Pract ; 26(1): 199-214, 2021 03.

Artigo em Inglês | MEDLINE | ID: mdl-32577927

RESUMO

Learner handover (LH), the process of sharing of information about learners between faculty supervisors, allows for longitudinal assessment fundamental in the competency-based education model. However, the potential to bias future assessments has been raised as a concern. The purpose of this study is to determine whether prior performance information such as LH influences the assessment of learners in the clinical context. Between December 2017 and June 2018, forty-two faculty members and final-year residents from the Department of Medicine at the University of Ottawa were assigned to one of three study groups through quasi-randomisation, taking into account gender, speciality and rater experience. In a counter-balanced design, each group received either positive, negative or no LH prior to watching six simulated learner-patient encounter videos. Participants rated each video using the mini-CEX and completed a questionnaire on the raters' general impressions of LH. A significant difference in the mean mini-CEX competency scale scores between the negative (M = 5.29) and positive (M = 5.97) LH groups (P < .001, d = 0.81) was noted. Similar findings were found for the single overall clinical competence ratings. In the post-study questionnaire, 22/28 (78%) of participants had correctly deduced the purpose of the study and 14/28 (50%) felt LH did not influence their assessment. LH influenced mini-CEX scores despite raters' awareness of the potential for bias. These results suggest that LH could influence a rater's performance assessment and careful consideration of the potential implications of LH is required.

Assuntos

Competência Clínica/normas , Avaliação Educacional/normas , Internato e Residência/organização & administração , Variações Dependentes do Observador , Adulto , Canadá , Educação Baseada em Competências , Avaliação Educacional/métodos , Feminino , Humanos , Internato e Residência/normas , Masculino , Pessoa de Meia-Idade , Fatores Sexuais

8.

The cognitive process of test takers when using the script concordance test rating scale.

Gawad, Nada; Wood, Timothy J; Cowley, Lindsay; Raiche, Isabelle.

Med Educ ; 54(4): 337-347, 2020 04.

Artigo em Inglês | MEDLINE | ID: mdl-31912562

RESUMO

CONTEXT: Clinical decision making (CDM) skills are important to learn and assess in order to establish competence in trainees. A common tool for assessing CDM is the script concordance test (SCT), which asks test takers to indicate how a new clinical finding influences a proposed plan using a Likert-type scale. Most criticisms of the SCT relate to its rating scale but are largely theoretical. The cognitive process of test takers when selecting their responses using the SCT rating scale remains understudied, but is essential to gathering validity evidence for use of the SCT in CDM assessment. METHODS: Cases from an SCT used in a national validation study were administered to 29 residents and 14 staff surgeons. Semi-structured cognitive interviews were then conducted with 10 residents and five staff surgeons based on the SCT results. Cognitive interview data were independently coded by two data analysts, who specifically sought to elucidate how participants mapped their internally generated responses to any of the rating scale options. RESULTS: Five major issues were identified with the response matching cognitive process: (a) the meaning of the '0' response option; (b) which response corresponds to agreement with the planned management; (c) the rationale for picking '±1' versus '±2'; (d) which response indicates the desire to undertake the planned management plus an additional procedure, and (e) the influence of time on response selection. CONCLUSIONS: Studying how test takers (experts and trainees) interpret the SCT rating scale has revealed several issues related to inconsistent and unintended use. Revising the scale to address the variety of interpretations could help to improve the response process validity of the SCT and therefore improve the SCT's ability to be used in CDM skills assessments.

Assuntos

Competência Clínica , Cognição , Avaliação Educacional , Habilidades para Realização de Testes , Adulto , Tomada de Decisão Clínica , Educação de Pós-Graduação em Medicina , Feminino , Humanos , Entrevistas como Assunto , Masculino , Estudos Prospectivos , Pesquisa Qualitativa

9.

Are rating scales really better than checklists for measuring increasing levels of expertise?

Wood, Timothy J; Pugh, Debra.

Med Teach ; 42(1): 46-51, 2020 01.

Artigo em Inglês | MEDLINE | ID: mdl-31429366

RESUMO

Background: It is a doctrine that OSCE checklists are not sensitive to increasing levels of expertise whereas rating scales are. This claim is based primarily on a study that used two psychiatry stations and it is not clear to what degree the finding generalizes to other clinical contexts. The purpose of our study was to reexamine the relationship between increasing training and scoring instruments within an OSCE.Approach: A 9-station OSCE progress test was administered to Internal Medicine residents in post-graduate years (PGY) 1-4. Residents were scored using checklists and rating scales. Standard scores from three administrations (27 stations) were analyzed.Findings: Only one station produced a result in which checklist scores did not increase as a function of training level, but the rating scales did. For 13 stations, scores increased as a function of PGY equally for both checklists and rating scales.Conclusion: Checklist scores were as sensitive to the level of training as rating scales for most stations, suggesting that checklists can capture increasing levels of expertise. The choice of which measure is used should be based on the purpose of the examination and not on a belief that one measure can better capture increases in expertise.

Assuntos

Lista de Checagem/métodos , Competência Clínica , Avaliação Educacional/métodos , Medicina Interna/educação , Humanos , Internato e Residência , Ontário , Reprodutibilidade dos Testes

10.

Comparison of the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) to a Single-Item Performance Score.

Saliken, David; Dudek, Nancy; Wood, Timothy J; MacEwan, Matthew; Gofton, Wade T.

Teach Learn Med ; 31(2): 146-153, 2019.

Artigo em Inglês | MEDLINE | ID: mdl-30514128

RESUMO

Construct: We compared a single-item performance score with the Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) for their ability in assessing surgical competency. BACKGROUND: Surgical programs are adopting competency-based frameworks. The adoption of these frameworks for assessment requires tools that produce accurate and valid assessments of knowledge and technical performance. An assessment tool that is quick to complete could improve feasibility, reduce delays, and result in a higher volume of assessments of learners. Previous work demonstrated that the 9-item O-SCORE can produce valid results; the goal of this study was to determine if a single-item performance rating (Is candidate competent to independently complete procedure: yes or no) completed at a separate viewing would correlate to the O-SCORE, thus increasing feasibility of procedural competence assessment. APPROACH: Nineteen residents and 2 staff orthopedic surgeons from the University of Ottawa volunteered for a 2-part OSCE-style station including a written questionnaire and videotaped simulated open reduction and internal fixation midshaft radius fracture. Each performance was rated independently by 3 orthopedic surgeons using a single-item performance score (Time 1). The performances were assessed again 6 weeks later by the 3 raters using the O-SCORE (Time 2). Correlation between the single-item performance score and the O-SCORE were evaluated. RESULTS: Three orthopedic surgeons completed 21 ratings each resulting in 63 orthopedic ratings. There was a high level of correlation and agreement between the single-item performance score at Time 1 and Time 2 (κ correlation =0.72-1.00; p < .001; percentage agreement =90%-100%). The reliability of the O-SCORE at Time 2 with three raters was 0.83 and the internal consistency was 0.89. There was a tendency for each rater to assign more yes responses to the more senior trainees. CONCLUSIONS: A single-item performance score correlated highly with the O-SCORE in an orthopedic setting. A single-item score could be used to supplement a multi-item score with similar results in orthopedics. There is still benefit in completing multi-item scores such as the O-SCORE evaluations to guide specific areas of improvement and direct feedback.

Assuntos

Lista de Checagem , Competência Clínica/normas , Avaliação Educacional/métodos , Cirurgia Geral/educação , Canadá , Humanos

11.

Can physician examiners overcome their first impression when examinee performance changes?

Wood, Timothy J; Pugh, Debra; Touchie, Claire; Chan, James; Humphrey-Murto, Susan.

Adv Health Sci Educ Theory Pract ; 23(4): 721-732, 2018 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-29556923

RESUMO

There is an increasing focus on factors that influence the variability of rater-based judgments. First impressions are one such factor. First impressions are judgments about people that are made quickly and are based on little information. Under some circumstances, these judgments can be predictive of subsequent decisions. A concern for both examinees and test administrators is whether the relationship remains stable when the performance of the examinee changes. That is, once a first impression is formed, to what degree will an examiner be willing to modify it? The purpose of this study is to determine the degree that first impressions influence final ratings when the performance of examinees changes within the context of an objective structured clinical examination (OSCE). Physician examiners (n = 29) viewed seven videos of examinees (i.e., actors) performing a physical exam on a single OSCE station. They rated the examinees' clinical abilities on a six-point global rating scale after 60 s (first impression or FIGR). They then observed the examinee for the remainder of the station and provided a final global rating (GRS). For three of the videos, the examinees' performance remained consistent throughout the videos. For two videos, examinee performance changed from initially strong to weak and for two videos, performance changed from initially weak to strong. The mean FIGR rating for the Consistent condition (M = 4.80) and the Strong to Weak condition (M = 4.87) were higher compared to their respective GRS ratings (M = 3.93, M = 2.73) with a greater decline for the Strong to Weak condition. The mean FIGR rating for the Weak to Strong condition was lower (3.60) than the corresponding mean GRS (4.81). This pattern of findings suggests that raters were willing to change their judgments based on examinee performance. Future work should explore the impact of making a first impression judgment explicit versus implicit and the role of context on the relationship between a first impression and a subsequent judgment.

Assuntos

Competência Clínica/normas , Avaliação Educacional/métodos , Avaliação Educacional/normas , Variações Dependentes do Observador , Adulto , Feminino , Humanos , Julgamento , Masculino , Pessoa de Meia-Idade , Fatores Socioeconômicos

12.

Assessing the Validity of a Multidisciplinary Mini-Clinical Evaluation Exercise.

Humphrey-Murto, Susan; Côté, Mylène; Pugh, Debra; Wood, Timothy J.

Teach Learn Med ; 30(2): 152-161, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-29240463

RESUMO

Construct: The purpose of this study was to provide validity evidence for the mini-clinical evaluation exercise (mini-CEX) as an assessment tool for clinical skills in the workplace. BACKGROUND: Previous research has demonstrated validity evidence for the mini-CEX, but most studies were carried out in internal medicine or single disciplines, therefore limiting generalizability of the findings. If the mini-CEX is to be used in multidisciplinary contexts, then validity evidence should be gathered in similar settings. The purpose of this study was to gather further validity evidence for the mini-CEX but in a broader context. Specifically we sought to explore the effects of discipline and rater type on mini-CEX scores, internal structure, and the relationship between mini-CEXs and OSCEs in a multidisciplinary context. APPROACH: During clerkship, medical students completed eight different rotations (family medicine, internal medicine, surgery, psychiatry, pediatrics, emergency, anesthesiology and obstetrics and gynecology). During each rotation, mini-CEX forms and a written examination were completed. Two multidisciplinary OSCEs (in Clerkship Year 3 and start of Year 4) assessed clinical skills. The reliability of the mini-CEX was assessed using Generalizability analyses. To assess the influence of discipline and rater type, mean scores were analyzed using a factorial analysis of variance. The total mini-CEX score was correlated to scores from the students' respective OSCEs and corresponding written exams. RESULTS: Eighty-two students met inclusion criteria for a total of 781 ratings (average of 9.82 mini-CEX forms per student). There was a significant effect of discipline (p < .001, = .16), and faculty provided lower scores than nonfaculty raters (7.12 vs. 7.41; p = .002, = .02). The g-coefficient was .53 when discipline was included as a facet and .23 when rater type was a facet. There were low, but statistically significant correlations between the mini-CEX and scores for the 4th-year OSCE Total Score and the OSCE communication scores, r(80) = .40, p < .001 and r(80) = .29, p = .009. The mini-CEX was not correlated with the written examination scores for any of the disciplines. CONCLUSIONS: Our results provide conflicting findings for validity evidence for the mini-CEX. Mini-CEX ratings were correlated to multidisciplinary OSCEs but not written examinations, supporting the validity argument. However, reliability of the mini-CEX was low to moderate, and error accounted for the greatest amount of variability in scores. There was variation in scores due to discipline and resident raters gave higher scores than faculty. These results should be considered when considering the use of the mini-CEX in different contexts.

Assuntos

Estágio Clínico , Competência Clínica/normas , Comunicação Interdisciplinar , Medicina Interna/educação , Canadá , Humanos

13.

Supervisor-trainee continuity and the quality of work-based assessments.

Cheung, Warren J; Dudek, Nancy L; Wood, Timothy J; Frank, Jason R.

Med Educ ; 51(12): 1260-1268, 2017 Dec.

Artigo em Inglês | MEDLINE | ID: mdl-28971502

RESUMO

CONTEXT: Work-based assessments (WBAs) represent an increasingly important means of reporting expert judgements of trainee competence in clinical practice. However, the quality of WBAs completed by clinical supervisors is of concern. The episodic and fragmented interaction that often occurs between supervisors and trainees has been proposed as a barrier to the completion of high-quality WBAs. OBJECTIVES: The primary purpose of this study was to determine the effect of supervisor-trainee continuity on the quality of assessments documented on daily encounter cards (DECs), a common form of WBA. The relationship between trainee performance and DEC quality was also examined. METHODS: Daily encounter cards representing three differing degrees of supervisor-trainee continuity (low, intermediate, high) were scored by two raters using the Completed Clinical Evaluation Report Rating (CCERR), a previously published nine-item quantitative measure of DEC quality. An analysis of variance (anova) was performed to compare mean CCERR scores among the three groups. Linear regression analysis was conducted to examine the relationship between resident performance and DEC quality. RESULTS: Differences in mean CCERR scores were observed between the three continuity groups (p = 0.02); however, the magnitude of the absolute differences was small (partial eta-squared = 0.03) and not educationally meaningful. Linear regression analysis demonstrated a significant inverse relationship between resident performance and CCERR score (p < 0.001, r2 = 0.18). This inverse relationship was observed in both groups representing on-service residents (p = 0.001, r2 = 0.25; p = 0.04, r2 = 0.19), but not in the Off-service group (p = 0.62, r2 = 0.05). CONCLUSIONS: Supervisor-trainee continuity did not have an educationally meaningful influence on the quality of assessments documented on DECs. However, resident performance was found to affect assessor behaviours in the On-service group, whereas DEC quality remained poor regardless of performance in the Off-service group. The findings suggest that greater attention should be given to determining ways of improving the quality of assessments reported for off-service residents, as well as for those residents demonstrating appropriate clinical competence progression.

Assuntos

Competência Clínica/normas , Avaliação Educacional/métodos , Docentes de Medicina , Internato e Residência , Educação de Pós-Graduação em Medicina/métodos , Medicina de Emergência/educação , Humanos , Reprodutibilidade dos Testes

14.

Comparing alternative and traditional dissemination metrics in medical education.

Amath, Aysah; Ambacher, Kristin; Leddy, John J; Wood, Timothy J; Ramnanan, Christopher J.

Med Educ ; 51(9): 935-941, 2017 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-28719136

RESUMO

CONTEXT: The impact of academic scholarship has traditionally been measured using citation-based metrics. However, citations may not be the only measure of impact. In recent years, other platforms (e.g. Twitter) have provided new tools for promoting scholarship to both academic and non-academic audiences. Alternative metrics (altmetrics) can capture non-traditional dissemination data such as attention generated on social media platforms. OBJECTIVES: The aims of this exploratory study were to characterise the relationships among altmetrics, access counts and citations in an international and pre-eminent medical education journal, and to clarify the roles of these metrics in assessing the impact of medical education academic scholarship. METHODS: A database study was performed (September 2015) for all papers published in Medical Education in 2012 (n = 236) and 2013 (n = 246). Citation, altmetric and access (HTML views and PDF downloads) data were obtained from Scopus, the Altmetric Bookmarklet tool and the journal Medical Education, respectively. Pearson coefficients (r-values) between metrics of interest were then determined. RESULTS: Twitter and Mendeley (an academic bibliography tool) were the only altmetric-tracked platforms frequently (> 50%) utilised in the dissemination of articles. Altmetric scores (composite measures of all online attention) were driven by Twitter mentions. For short and full-length articles in 2012 and 2013, both access counts and citation counts were most strongly correlated with one another, as well as with Mendeley downloads. By comparison, Twitter metrics and altmetric scores demonstrated weak to moderate correlations with both access and citation counts. CONCLUSIONS: Whereas most altmetrics showed limited correlations with readership (access counts) and impact (citations), Mendeley downloads correlated strongly with both readership and impact indices for articles published in the journal Medical Education and may therefore have potential use that is complementary to that of citations in assessment of the impact of medical education scholarship.

Assuntos

Bibliometria , Educação Médica , Fator de Impacto de Revistas , Publicações Periódicas como Assunto , Mídias Sociais/estatística & dados numéricos , Bases de Dados Factuais , Humanos

15.

The influence of first impressions on subsequent ratings within an OSCE station.

Wood, Timothy J; Chan, James; Humphrey-Murto, Susan; Pugh, Debra; Touchie, Claire.

Adv Health Sci Educ Theory Pract ; 22(4): 969-983, 2017 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-27848171

RESUMO

Competency-based assessment is placing increasing emphasis on the direct observation of learners. For this process to produce valid results, it is important that raters provide quality judgments that are accurate. Unfortunately, the quality of these judgments is variable and the roles of factors that influence the accuracy of those judgments are not clearly understood. One such factor is first impressions: that is, judgments about people we do not know, made quickly and based on very little information. This study explores the influence of first impressions in an OSCE. Specifically, the purpose is to begin to examine the accuracy of a first impression and its influence on subsequent ratings. We created six videotapes of history-taking performance. Each video was scripted from a real performance by six examinee residents within a single OSCE station. Each performance was re-enacted with six different actors playing the role of the examinees and one actor playing the role of the patient and videotaped. A total of 23 raters (i.e., physician examiners) reviewed each video and were asked to make a global judgment of the examinee's clinical abilities after 60 s (First Impression GR) by providing a rating on a six-point global rating scale and then to rate their confidence in the accuracy of that judgment by providing a rating on a five-point rating scale (Confidence GR). After making these ratings, raters then watched the remainder of the examinee's performance and made another global rating of performance (Final GR) before moving on to the next video. First impression ratings of ability varied across examinees and were moderately correlated to expert ratings (r = .59, 95% CI [-.13, .90]). There were significant differences in mean ratings for three examinees. Correlations ranged from .05 to .56 but were only significant for three examinees. Rater confidence in their first impression was not related to the likelihood of a rater changing their rating between the first impression and a subsequent rating. The findings suggest that first impressions could play a role in explaining variability in judgments, but their importance was determined by the videotaped performance of the examinees. More work is needed to clarify conditions that support or discourage the use of first impressions.

Assuntos

Educação Médica/métodos , Avaliação Educacional/métodos , Avaliação Educacional/normas , Docentes de Medicina/psicologia , Competência Clínica/normas , Educação Médica/normas , Docentes de Medicina/normas , Humanos , Anamnese/normas , Variações Dependentes do Observador , Reprodutibilidade dos Testes , Gravação de Videoteipe

16.

Using consensus group methods such as Delphi and Nominal Group in medical education research.

Humphrey-Murto, Susan; Varpio, Lara; Gonsalves, Carol; Wood, Timothy J.

Med Teach ; 39(1): 14-19, 2017 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-27841062

RESUMO

Consensus group methods are widely used in research to identify and measure areas where incomplete evidence exists for decision-making. Despite their widespread use, these methods are often inconsistently used and reported. Using examples from the three most commonly used methods, the Delphi, Nominal Group and RAND/UCLA; this paper and associated Guide aim to describe these methods and to highlight common weaknesses in methodology and reporting. The paper outlines a series of recommendations to assist researchers using consensus group methods in providing a comprehensive description and justification of the steps taken in their study.

Assuntos

Consenso , Técnica Delphi , Educação Médica/organização & administração , Projetos de Pesquisa , Processos Grupais , Humanos

17.

Do OSCE progress test scores predict performance in a national high-stakes examination?

Pugh, Debra; Bhanji, Farhan; Cole, Gary; Dupre, Jonathan; Hatala, Rose; Humphrey-Murto, Susan; Touchie, Claire; Wood, Timothy J.

Med Educ ; 50(3): 351-8, 2016 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-26896020

RESUMO

CONTEXT: Progress tests, in which learners are repeatedly assessed on equivalent content at different times in their training and provided with feedback, would seem to lend themselves well to a competency-based framework, which requires more frequent formative assessments. The objective structured clinical examination (OSCE) progress test is a relatively new form of assessment that is used to assess the progression of clinical skills. The purpose of this study was to establish further evidence for the use of an OSCE progress test by demonstrating an association between scores from this assessment method and those from a national high-stakes examination. METHODS: The results of 8 years' of data from an Internal Medicine Residency OSCE (IM-OSCE) progress test were compared with scores on the Royal College of Physicians and Surgeons of Canada Comprehensive Objective Examination in Internal Medicine (RCPSC IM examination), which is comprised of both a written and performance-based component (n = 180). Correlations between scores in the two examinations were calculated. Logistic regression analyses were performed comparing IM-OSCE progress test scores with an 'elevated risk of failure' on either component of the RCPSC IM examination. RESULTS: Correlations between scores from the IM-OSCE (for PGY-1 residents to PGY-4 residents) and those from the RCPSC IM examination ranged from 0.316 (p = 0.001) to 0.554 (<.001) for the performance-based component and 0.305 (p = 0.002) to 0.516 (p < 0.001) for the written component. Logistic regression models demonstrated that PGY-2 and PGY-4 scores from the IM-OSCE were predictive of an 'elevated risk of failure' on both components of the RCPSC IM examination. CONCLUSIONS: This study provides further evidence for the use of OSCE progress testing by demonstrating a correlation between scores from an OSCE progress test and a national high-stakes examination. Furthermore, there is evidence that OSCE progress test scores are predictive of future performance on a national high-stakes examination.

Assuntos

Competência Clínica/normas , Avaliação Educacional/métodos , Internato e Residência/normas , Licenciamento em Medicina , Canadá , Medicina Interna/educação

18.

The OSCE progress test--Measuring clinical skill development over residency training.

Pugh, Debra; Touchie, Claire; Humphrey-Murto, Susan; Wood, Timothy J.

Med Teach ; 38(2): 168-73, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-25909896

RESUMO

PURPOSE: The purpose of this study was to explore the use of an objective structured clinical examination for Internal Medicine residents (IM-OSCE) as a progress test for clinical skills. METHODS: Data from eight administrations of an IM-OSCE were analyzed retrospectively. Data were scaled to a mean of 500 and standard deviation (SD) of 100. A time-based comparison, treating post-graduate year (PGY) as a repeated-measures factor, was used to determine how residents' performance progressed over time. RESULTS: Residents' total IM-OSCE scores (n = 244) increased over training from a mean of 445 (SD = 84) in PGY-1 to 534 (SD = 71) in PGY-3 (p < 0.001). In an analysis of sub-scores, including only those who participated in the IM OSCE for all three years of training (n = 46), mean structured oral scores increased from 464 (SD = 92) to 533 (SD = 83) (p < 0.001), physical examination scores increased from 464 (SD = 82) to 520 (SD = 75) (p < 0.001), and procedural skills increased from 495 (SD = 99) to 555 (SD = 67) (p = 0.033). There was no significant change in communication scores (p = 0.97). CONCLUSIONS: The IM-OSCE can be used to demonstrate progression of clinical skills throughout residency training. Although most of the clinical skills assessed improved as residents progressed through their training, communication skills did not appear to change.

Assuntos

Competência Clínica/normas , Avaliação Educacional/métodos , Medicina Interna/educação , Internato e Residência , Humanos , Ontário , Estudos Retrospectivos

19.

Done or Almost Done? Improving OSCE Checklists to Better Capture Performance in Progress Tests.

Pugh, Debra; Halman, Samantha; Desjardins, Isabelle; Humphrey-Murto, Susan; Wood, Timothy J.

Teach Learn Med ; 28(4): 406-414, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-27700252

RESUMO

Construct: The impact of using nonbinary checklists for scoring residents from different levels of training participating in objective structured clinical examination (OSCE) progress tests was explored. BACKGROUND: OSCE progress tests typically employ similar rating instruments as traditional OSCEs. However, progress tests differ from other assessment modalities because learners from different stages of training participate in the same examination, which can pose challenges when deciding how to assign scores. In an attempt to better capture performance, nonbinary checklists were introduced in two OSCE progress tests. The purposes of this study were (a) to identify differences in the use of checklist options (e.g., done satisfactorily, attempted, or not done) by task type, (b) to analyze the impact of different scoring methods using nonbinary checklists for two OSCE progress tests (nonprocedural and procedural) for Internal Medicine residents, and (c) to determine which scoring method is better suited for a given task. APPROACH: A retrospective analysis examined differences in scores (n = 119) for two OSCE progress tests (procedural and nonprocedural). Scoring methods (hawk, dove, and hybrid) varied in stringency in how they awarded marks for nonbinary checklist items that were rated as done satisfactorily, attempted, or not done. Difficulty, reliability (internal consistency), item-total correlations and pass rates were compared for each OSCE using the three scoring methods. RESULTS: Mean OSCE scores were highest using the dove method and lowest using the hawk method. The hawk method resulted in higher item-total correlations for most stations, but there were differences by task type. Overall score reliability calculated using the three methods did not differ significantly. Pass-fail status differed as a function of scoring methods and exam type, with the hawk and hybrid methods resulting in higher failure rates for the nonprocedural OSCE and the dove method resulting in a higher failure rate for the procedural OSCE. CONCLUSION: The use of different scoring methods for nonbinary OSCE checklists resulted in differences in mean scores and pass-fail status. The results varied with procedural and nonprocedural OSCEs.

Assuntos

Lista de Checagem , Competência Clínica , Avaliação Educacional , Humanos , Reprodutibilidade dos Testes , Estudos Retrospectivos

20.

Continued Validation of the O-SCORE (Ottawa Surgical Competency Operating Room Evaluation): Use in the Simulated Environment.

MacEwan, Matthew J; Dudek, Nancy L; Wood, Timothy J; Gofton, Wade T.

Teach Learn Med ; 28(1): 72-9, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-26787087

RESUMO

UNLABELLED: CONSTRUCT: The Ottawa Surgical Competency Operating Room Evaluation (O-SCORE) is a 9-item surgical evaluation tool designed to assess technical competence in surgical trainees using behavioral anchors. BACKGROUND: The initial development of the O-SCORE produced evidence for valid results. Further work is required to determine if the use of a single surgeon or an unblinded rater introduces bias. In addition, the relationship of the O-SCORE to other currently used technical assessment tools should be explored to provide validity evidence related to the relationship to other measures. We have designed this project to provide continued validity evidence for the O-SCORE related to these two issues. APPROACH: Nineteen residents and 2 staff Orthopedic Surgeons from the University of Ottawa volunteered to participate in a 2-part OSCE style station. Participants completed a written questionnaire followed by a videotaped 10-minute simulated open reduction and internal fixation of a midshaft radius fracture. Videos were rated individually by 2 blinded staff orthopedic surgeons using an Objective Structured Assessment of Technical Skills (OSATS) global rating scale, an OSATS checklist, and the O-SCORE in random order. RESULTS: O-SCORE results appeared sensitive to surgical training level even when raters were blinded. In addition, strong agreement between two independent observers using the O-SCORE suggests that the measure captures a performance easily recognized by surgical observers. Ratings on the O-SCORE also were strongly associated with global ratings on the currently most validated technical evaluation tool (OSATS). Collectively, these results suggest that the O-SCORE generates accurate, reproducible, and meaningful results when used in a randomized and blinded fashion, providing continued validity evidence for using this tool to evaluate surgical trainee competence. CONCLUSIONS: The O-SCORE was able to differentiate surgical trainee level using blinded raters providing further evidence of validity for the O-SCORE. There was strong agreement between two independent observers using the O-SCORE. Ratings on the O-SCORE also demonstrated equivalence to scores on the most validated technical evaluation tool (OSATS). These results suggest that the O-SCORE demonstrates accurate and reproducible results when used in a randomized and blinded fashion providing continued validity evidence for this tool in the evaluation of surgical competence in the trainees.

Assuntos

Lista de Checagem/normas , Competência Clínica/normas , Salas Cirúrgicas , Treinamento por Simulação , Feminino , Humanos , Internato e Residência , Masculino , Ortopedia , Cirurgiões , Inquéritos e Questionários

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA