1.
Can Urol Assoc J; 18(4): 116-119, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38381940

ABSTRACT

INTRODUCTION: The Objective Structured Clinical Examination (OSCE) is an attractive tool for competency assessment in a high-stakes summative exam. An advantage of the OSCE is its ability to assess more realistic context, content, and procedures. Each year, graduating Canadian urology residents attend the Queen's Urology Exam Skills Training (QUEST) to simulate their upcoming board exams. The exam consists of a written component and an OSCE. The aim of this study was to determine the inter-observer consistency of scoring between two examiners of an OSCE for a given candidate. METHODS: Thirty-nine participants in 2020 and 37 participants in 2021 completed four OSCE stations virtually over the Zoom platform. At each station, each candidate was examined and scored independently, in a blinded fashion, by two different faculty urologists. OSCE scoring consisted of a checklist rating scale for each question. An intra-class correlation (ICC) analysis was conducted to determine the inter-rater reliability of the two examiners for each of the four OSCE stations in both the 2020 and 2021 OSCEs. RESULTS: For the 2020 data, the prostate cancer station scores were most strongly correlated (ICC 0.746, 95% confidence interval [CI] 0.556-0.862, p<0.001), followed by the general urology station (ICC 0.688, 95% CI 0.464-0.829, p<0.001), the urinary incontinence station (ICC 0.638, 95% CI 0.403-0.794, p<0.001), and finally the nephrolithiasis station (ICC 0.472, 95% CI 0.183-0.686, p<0.001). For the 2021 data, the renal cancer station had the highest ICC at 0.866 (95% CI 0.754-0.930, p<0.001), followed by the nephrolithiasis station (ICC 0.817, 95% CI 0.673-0.901, p<0.001), the pediatric station (ICC 0.809, 95% CI 0.660-0.897, p<0.001), and finally the andrology station (ICC 0.804, 95% CI 0.649-0.895, p<0.001). Pearson correlation coefficients were calculated for all stations, and all showed a positive correlation with global exam scores. Notably, the stations most predictive of overall performance were not necessarily those with the higher ICC scores. CONCLUSIONS: Given a specific clinical scenario in an OSCE exam, inter-rater reliability of scoring can occasionally be compromised. Care should be taken when high-stakes decisions about promotion are made based on OSCEs with limited standardization.
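For context on the analysis described above, the following is a minimal sketch of how a two-rater intra-class correlation could be computed for a single OSCE station. It uses the ICC(2,1) formulation (two-way random effects, absolute agreement, single rater); the abstract does not specify which ICC model the authors used, and the candidate scores below are made-up placeholders.

```python
# Illustrative only: ICC(2,1) for one OSCE station scored by two examiners.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ratings: (n_candidates, n_raters) matrix of station scores."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-candidate means
    col_means = ratings.mean(axis=0)   # per-rater means

    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between-candidate variance
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between-rater variance
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Two-way random effects, absolute agreement, single-rater ICC.
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical checklist scores for six candidates from examiner A and examiner B.
scores = np.array([[14, 15], [10, 12], [18, 17], [9, 11], [16, 16], [12, 13]], dtype=float)
print(f"ICC(2,1) = {icc_2_1(scores):.3f}")
```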

2.
Can Urol Assoc J; 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38896484

ABSTRACT

INTRODUCTION: Generative artificial intelligence (AI) has proven to be a powerful tool with increasing applications in clinical care and medical education. ChatGPT has performed adequately on many specialty certification and knowledge assessment exams. The objective of this study was to assess the performance of ChatGPT 4 on a multiple-choice exam meant to simulate the Canadian urology board exam. METHODS: Graduating urology residents representing all Canadian training programs gather yearly for a mock exam that simulates their upcoming board-certifying exam. The exam consists of written multiple-choice questions (MCQs) and an oral objective structured clinical examination (OSCE). The 2022 exam was taken by 29 graduating residents and was also administered to ChatGPT 4. RESULTS: ChatGPT 4 scored 46% on the MCQ exam, whereas the mean and median scores of graduating urology residents were 62.6% and 62.7%, respectively. This would place ChatGPT's score 1.8 standard deviations below the median; its percentile rank would be in the sixth percentile. ChatGPT's scores on the different exam topics were as follows: oncology 35%, andrology/benign prostatic hyperplasia 62%, physiology/anatomy 67%, incontinence/female urology 23%, infections 71%, urolithiasis 57%, and trauma/reconstruction 17%, with its oncology performance being significantly below that of postgraduate year 5 residents. CONCLUSIONS: ChatGPT 4 underperforms on an MCQ exam meant to simulate the Canadian board exam. Ongoing assessment of the capabilities of generative AI is needed as these models evolve and are trained on additional urology content.
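The score comparison above lends itself to a short worked example. The sketch below backs an implied standard deviation out of the reported 1.8-SD gap and converts it to a normal-approximation percentile; the sixth-percentile figure quoted in the abstract may have been derived differently (for example, empirically against the 29-resident cohort), so treat this purely as illustrative arithmetic using the summary numbers from the abstract.

```python
# Illustrative arithmetic only, based on the figures quoted in the abstract.
from scipy.stats import norm

chatgpt_score = 46.0     # % on the MCQ exam
resident_median = 62.7   # % median of graduating residents
reported_sd_gap = 1.8    # SDs below the median, per the abstract

implied_sd = (resident_median - chatgpt_score) / reported_sd_gap   # ~9.3 percentage points
z = (chatgpt_score - resident_median) / implied_sd                 # -1.8 by construction

print(f"implied SD ~ {implied_sd:.1f} points, z = {z:.2f}")
print(f"normal-approximation percentile ~ {norm.cdf(z) * 100:.1f}%")
```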
