|

1.

The Ottawa resident observation form for nurses (O-RON): evaluation of an assessment tool's psychometric properties in different specialties.

Chiu, Hedva; Wood, Timothy J; Garber, Adam; Halman, Samantha; Rekman, Janelle; Gofton, Wade; Dudek, Nancy.

BMC Med Educ ; 24(1): 487, 2024 May 02.

Article En | MEDLINE | ID: mdl-38698352

BACKGROUND: Workplace-based assessment (WBA) used in post-graduate medical education relies on physician supervisors' feedback. However, in a training environment where supervisors are unavailable to assess certain aspects of a resident's performance, nurses are well-positioned to do so. The Ottawa Resident Observation Form for Nurses (O-RON) was developed to capture nurses' assessment of trainee performance and results have demonstrated strong evidence for validity in Orthopedic Surgery. However, different clinical settings may impact a tool's performance. This project studied the use of the O-RON in three different specialties at the University of Ottawa. METHODS: O-RON forms were distributed on Internal Medicine, General Surgery, and Obstetrical wards at the University of Ottawa over nine months. Validity evidence related to quantitative data was collected. Exit interviews with nurse managers were performed and content was thematically analyzed. RESULTS: 179 O-RONs were completed on 30 residents. With four forms per resident, the ORON's reliability was 0.82. Global judgement response and frequency of concerns was correlated (r = 0.627, P < 0.001). CONCLUSIONS: Consistent with the original study, the findings demonstrated strong evidence for validity. However, the number of forms collected was less than expected. Exit interviews identified factors impacting form completion, which included clinical workloads and interprofessional dynamics.

Clinical Competence , Internship and Residency , Psychometrics , Humans , Reproducibility of Results , Female , Male , Educational Measurement/methods , Ontario , Internal Medicine/education

2.

The Performance of ChatGPT-4V in Interpreting Images and Tables in the Japanese Medical Licensing Exam.

Takagi, Soshi; Koda, Masahide; Watari, Takashi.

JMIR Med Educ ; 10: e54283, 2024 May 23.

Article En | MEDLINE | ID: mdl-38787024

Licensure, Medical , Humans , Japan , Educational Measurement/methods , Educational Measurement/standards , Clinical Competence/standards , East Asian People

3.

Role of artificial intelligence (Google bard) in morphological, histopathological, and radiological image identifications: Objective Structured Practical Examination (OSPE) type-based performance.

Meo, Sultan A; AbuKhalaf, Abdulelah A; Meo, Muhammad Zain S; Meo, Muhammad Omair S; Ayub, Rashid; ElToukhy, Riham A; Usmani, Adnan M; Hajjar, Waseem M.

Saudi Med J ; 45(5): 531-536, 2024 May.

Article En | MEDLINE | ID: mdl-38734438

OBJECTIVES: To evaluate the role of artificial intelligence (Google Bard) in figures, scans, and image identifications and interpretations in medical education and healthcare sciences through an Objective Structured Practical Examination (OSPE) type of performance. METHODS: The OSPE type of question bank was created with a pool of medical sciences figures, scans, and images. For assessment, 60 figures, scans and images were selected and entered into the given area of the Google Bard to evaluate the knowledge level. RESULTS: The marks obtained by Google Bard in brain structures, morphological and radiological images 7/10 (70%); bone structures, radiological images 9/10 (90%); liver structure and morphological, pathological images 4/10 (40%); kidneys structure and morphological images 2/7 (28.57%); neuro-radiological images 4/7 (57.14%); and endocrine glands including the thyroid, pancreas, breast morphological and radiological images 8/16 (50%). The overall total marks obtained by Google Bard in various OSPE figures, scans, and image identification questions were 34/60 (56.7%). CONCLUSION: Google Bard scored satisfactorily in morphological, histopathological, and radiological image identifications and their interpretations. Google Bard may assist medical students, faculty in medical education and physicians in healthcare settings.

Artificial Intelligence , Humans , Education, Medical/methods , Educational Measurement/methods , Radiography/methods

4.

Determining the psychometric properties of a written test to assess safe dental practice.

Zaidi, Syed Akbar Abbas; Ilyas, Farnaz; Hakeem, Saman; Feroze, Asher; Sarfaraz, Shaur; Ali, Syeda Kausar.

BMJ Open Qual ; 13(Suppl 2)2024 May 07.

Article En | MEDLINE | ID: mdl-38719519

INTRODUCTION: Safe practice in medicine and dentistry has been a global priority area in which large knowledge gaps are present.Patient safety strategies aim at preventing unintended damage to patients that can be caused by healthcare practitioners. One of the components of patient safety is safe clinical practice. Patient safety efforts will help in ensuring safe dental practice for early detection and limiting non-preventable errors.A valid and reliable instrument is required to assess the knowledge of dental students regarding patient safety. OBJECTIVE: To determine the psychometric properties of a written test to assess safe dental practice in undergraduate dental students. MATERIAL AND METHODS: A test comprising 42 multiple-choice questions of one-best type was administered to final year students (52) of a private dental college. Items were developed according to National Board of Medical Examiners item writing guidelines. The content of the test was determined in consultation with dental experts (either professor or associate professor). These experts had to assess each item on the test for language clarity as A: clear, B: ambiguous and relevance as 1: essential, 2: useful, not necessary, 3: not essential. Ethical approval was taken from the concerned dental college. Statistical analysis was done in SPSS V.25 in which descriptive analysis, item analysis and Cronbach's alpha were measured. RESULT: The test scores had a reliability (calculated by Cronbach's alpha) of 0.722 before and 0.855 after removing 15 items. CONCLUSION: A reliable and valid test was developed which will help to assess the knowledge of dental students regarding safe dental practice. This can guide medical educationist to develop or improve patient safety curriculum to ensure safe dental practice.

Educational Measurement , Patient Safety , Psychometrics , Humans , Psychometrics/instrumentation , Psychometrics/methods , Patient Safety/standards , Patient Safety/statistics & numerical data , Surveys and Questionnaires , Educational Measurement/methods , Educational Measurement/statistics & numerical data , Educational Measurement/standards , Reproducibility of Results , Students, Dental/statistics & numerical data , Students, Dental/psychology , Education, Dental/methods , Education, Dental/standards , Male , Female , Clinical Competence/statistics & numerical data , Clinical Competence/standards

5.

Academic performance among pharmacy students using virtual vs. face-to-face team-based learning.

Shoair, Osama A.

Ann Med ; 56(1): 2349205, 2024 Dec.

Article En | MEDLINE | ID: mdl-38738408

INTRODUCTION: This study compares pharmacy students' performance using face-to-face (FTF) team-based learning (TBL) vs. virtual TBL across multiple courses and different academic levels while accounting for student demographic and academic factors. METHODS: The study included pharmacy students from different academic levels (P1-P3) who were enrolled in three didactic courses taught using FTF TBL and virtual TBL. Multiple generalized linear models (GLMs) were performed to compare students' performance on individual readiness assurance tests (iRATs), team readiness assurance tests (tRATs), team application exercises (tAPPs), summative exams, and total course scores using FTF TBL vs. virtual TBL, adjusting for students' age, sex, race, and cumulative grade point average (cGPA). RESULTS: The study involved a total of 356 pharmacy students distributed across different academic levels and learning modalities: P1 students [FTF TBL (n = 26), virtual TBL (n = 42)], P2 students [FTF TBL (n = 77), virtual TBL (n = 71)], and P3 students [FTF TBL (n = 65), virtual TBL (n = 75)]. In the P1 cohort, the virtual group had higher iRAT and tRAT scores but lower tAPP scores than the FTF TBL group, with no significant differences in summative exams or total course scores. For P2 students, the virtual TBL group had higher iRAT and tRAT scores but lower summative exam scores and total course scores than the FTF TBL group, with no significant differences in tAPP scores. In the P3 student group, the virtual TBL group had higher iRAT, tRAT, tAPP, summative exam, and total course scores than the FTF TBL group. CONCLUSIONS: Students' performance in virtual TBL vs. FTF TBL in the pharmacy didactic curriculum varies depending on the course content, academic year, and type of assessment.

Academic Performance , Education, Pharmacy , Educational Measurement , Students, Pharmacy , Humans , Students, Pharmacy/statistics & numerical data , Students, Pharmacy/psychology , Male , Female , Education, Pharmacy/methods , Academic Performance/statistics & numerical data , Educational Measurement/methods , Young Adult , Adult , Problem-Based Learning/methods , Curriculum

6.

The effects of language proficiency and awareness of time limit in animated vs. text-based situational judgment tests.

Rabe, Mathis; Gröne, Oana R; von Bernstorff, Charlotte; Knorr, Mirjana.

BMC Med Educ ; 24(1): 540, 2024 May 15.

Article En | MEDLINE | ID: mdl-38750433

BACKGROUND: Situational Judgment Tests (SJTs) are commonly used in medical school admissions. However, it has been consistently found that native speakers tend to score higher on SJTs than non-native speakers, which can be particularly problematic in the admission context due to the potential risk of limited fairness. Besides type of SJT, awareness of time limit may play a role in subgroup differences in the context of cognitive load theory. This study examined the influence of SJT type and awareness of time limit against the background of language proficiency in a quasi high-stakes setting. METHODS: Participants (N = 875), applicants and students in healthcare-related study programs, completed an online study that involved two SJTs: one with a text-based stimulus and response format (HAM-SJT) and another with a video-animated stimulus and media-supported response format (Social Shapes Test, SST). They were randomly assigned to a test condition in which they were either informed about a time limit or not. In a multilevel model analysis, we examined the main effects and interactions of the predictors (test type, language proficiency and awareness of time limit) on test performance (overall, response percentage). RESULTS: There were significant main effects on overall test performance for language proficiency in favor of native speakers and for awareness of time limit in favor of being aware of the time limit. Furthermore, an interaction between language proficiency and test type was found, indicating that subgroup differences are smaller for the animated SJT than for the text-based SJT. No interaction effects on overall test performance were found that included awareness of time limit. CONCLUSION: A SJT with video-animated stimuli and a media-supported response format can reduce subgroup differences in overall test performance between native and non-native speakers in a quasi high-stakes setting. Awareness of time limit is equally important for high and low performance, regardless of language proficiency or test type.

Judgment , Humans , Female , Male , Young Adult , Adult , Awareness , School Admission Criteria , Educational Measurement/methods , Language , Students, Medical/psychology , Schools, Medical

7.

Assessment by mini clinical evaluation exercise (Mini-CEX) in the intensive care unit.

Ben Ghezala, Hassen; Benzarti, Aida; Ben Jazia, Amira; Brahmi, Nozha.

Tunis Med ; 102(5): 272-277, 2024 May 05.

Article Fr | MEDLINE | ID: mdl-38801284

INTRODUCTION: Mini Clinical Evaluation Exercise (mini-CEX) is one of the assessment tools in medical education. It includes three steps: overview of clinical situation, observation and feedback. AIM: To evaluate the feasibility of mini-CEX as a formative assessment tool for medical trainees in 5th year of medicine in a teaching intensive care unit (ICU). METHODS: Single-center qualitative research conducted in ICU during the 2nd semester of the academic year 2022-2023. Seven core clinical skill assessments were done, and the performance was rated on a 9-point scale. An assessment of the method was conducted with both trainees and clinical educators. RESULTS: We conducted six mini-CEX recorded sessions. All medical students had marks under the average of 4.5. In the first period, the highest mark was obtained for counselling skills (4.5). The best score was obtained for clinical judgement (4) in the second period and for management plan (4) in the third period. Most of medical trainees (11 sur 12) were satisfied with the method and feedback was according to them the most useful step. Ten students agreed fully to introduce this assessment tool in medical educational programs. Two medical educators out of three did not practice this method before. They agreed to include mini-CEX in the program of medical education of the faculty of medicine of Tunis. However, they did not agree to use it as a summative assessment tool. CONCLUSION: Our study demonstrates that we can use the mini-CEX in medical teaching. Both trainees and educators were satisfied with the method.

Clinical Competence , Educational Measurement , Intensive Care Units , Students, Medical , Humans , Intensive Care Units/organization & administration , Educational Measurement/methods , Clinical Competence/standards , Students, Medical/statistics & numerical data , Education, Medical/methods , Education, Medical/organization & administration , Feasibility Studies , Qualitative Research , Tunisia

8.

The impact of repeated item development training on the prediction of medical faculty members' item difficulty index.

Lee, Hye Yoon; Yune, So Jung; Lee, Sang Yeoup; Im, Sunju; Kam, Bee Sung.

BMC Med Educ ; 24(1): 599, 2024 May 30.

Article En | MEDLINE | ID: mdl-38816855

BACKGROUND: Item difficulty plays a crucial role in assessing students' understanding of the concept being tested. The difficulty of each item needs to be carefully adjusted to ensure the achievement of the evaluation's objectives. Therefore, this study aimed to investigate whether repeated item development training for medical school faculty improves the accuracy of predicting item difficulty in multiple-choice questions. METHODS: A faculty development program was implemented to enhance the prediction of each item's difficulty index, ensure the absence of item defects, and maintain the general principles of item development. The interrater reliability between the predicted, actual, and corrected item difficulty was assessed before and after the training, using either the kappa index or the correlation coefficient, depending on the characteristics of the data. A total of 62 faculty members participated in the training. Their predictions of item difficulty were compared with the analysis results of 260 items taken by 119 fourth-year medical students in 2016 and 316 items taken by 125 fourth-year medical students in 2018. RESULTS: Before the training, significant agreement between the predicted and actual item difficulty indices was observed for only one medical subject, Cardiology (K = 0.106, P = 0.021). However, after the training, significant agreement was noted for four subjects: Internal Medicine (K = 0.092, P = 0.015), Cardiology (K = 0.318, P = 0.021), Neurology (K = 0.400, P = 0.043), and Preventive Medicine (r = 0.577, P = 0.039). Furthermore, a significant agreement was observed between the predicted and actual difficulty indices across all subjects when analyzing the average difficulty of all items (r = 0.144, P = 0.043). Regarding the actual difficulty index by subject, neurology exceeded the desired difficulty range of 0.45-0.75 in 2016. By 2018, however, all subjects fell within this range. CONCLUSION: Repeated item development training, which includes predicting each item's difficulty index, can enhance faculty members' ability to predict and adjust item difficulty accurately. To ensure that the difficulty of the examination aligns with its intended purpose, item development training can be beneficial. Further studies on faculty development are necessary to explore these benefits more comprehensively.

Educational Measurement , Faculty, Medical , Humans , Educational Measurement/methods , Reproducibility of Results , Students, Medical , Education, Medical, Undergraduate , Male , Female

9.

Performance of generative pre-trained transformers (GPTs) in Certification Examination of the College of Family Physicians of Canada.

Mousavi, Mehdi; Shafiee, Shabnam; Harley, Jason M; Cheung, Jackie Chi Kit; Abbasgholizadeh Rahimi, Samira.

Fam Med Community Health ; 12(Suppl 1)2024 May 28.

Article En | MEDLINE | ID: mdl-38806403

INTRODUCTION: The application of large language models such as generative pre-trained transformers (GPTs) has been promising in medical education, and its performance has been tested for different medical exams. This study aims to assess the performance of GPTs in responding to a set of sample questions of short-answer management problems (SAMPs) from the certification exam of the College of Family Physicians of Canada (CFPC). METHOD: Between August 8th and 25th, 2023, we used GPT-3.5 and GPT-4 in five rounds to answer a sample of 77 SAMPs questions from the CFPC website. Two independent certified family physician reviewers scored AI-generated responses twice: first, according to the CFPC answer key (ie, CFPC score), and second, based on their knowledge and other references (ie, Reviews' score). An ordinal logistic generalised estimating equations (GEE) model was applied to analyse repeated measures across the five rounds. RESULT: According to the CFPC answer key, 607 (73.6%) lines of answers by GPT-3.5 and 691 (81%) by GPT-4 were deemed accurate. Reviewer's scoring suggested that about 84% of the lines of answers provided by GPT-3.5 and 93% of GPT-4 were correct. The GEE analysis confirmed that over five rounds, the likelihood of achieving a higher CFPC Score Percentage for GPT-4 was 2.31 times more than GPT-3.5 (OR: 2.31; 95% CI: 1.53 to 3.47; p<0.001). Similarly, the Reviewers' Score percentage for responses provided by GPT-4 over 5 rounds were 2.23 times more likely to exceed those of GPT-3.5 (OR: 2.23; 95% CI: 1.22 to 4.06; p=0.009). Running the GPTs after a one week interval, regeneration of the prompt or using or not using the prompt did not significantly change the CFPC score percentage. CONCLUSION: In our study, we used GPT-3.5 and GPT-4 to answer complex, open-ended sample questions of the CFPC exam and showed that more than 70% of the answers were accurate, and GPT-4 outperformed GPT-3.5 in responding to the questions. Large language models such as GPTs seem promising for assisting candidates of the CFPC exam by providing potential answers. However, their use for family medicine education and exam preparation needs further studies.

Certification , Canada , Humans , Educational Measurement/methods , Physicians, Family/education , Clinical Competence , Family Practice/education

10.

Effects of virtual reality OSCE on nursing students' education: a study protocol for systematic review and meta-analysis.

Liu, Ping; Dong, Xuan; Liu, Fei; Fu, Haixia.

BMJ Open ; 14(5): e082847, 2024 May 28.

Article En | MEDLINE | ID: mdl-38806420

INTRODUCTION: Virtual objective structured clinical examination (OSCE) has been shown to influence the performance of nursing students. However, its specific effects, particularly students' competence, stress, anxiety, confidence, satisfaction with virtual reality OSCE and examiners' satisfaction, remain unclear. METHOD AND ANALYSIS: This study aims to assess the effects of virtual reality OSCE on nursing students' education. The study follows the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocol guidelines. A literature search is performed on electronic databases, namely, PubMed, Web of Science, CINAHL, EBSCO, EMBASE and the Cochrane Library. The inclusion criteria adhere to the PICOS principle, encompassing nursing students, including those studying in school and those engaged in hospital internship. This review includes studies on the use of virtual reality OSCE as an assessment tool, compared with traditional clinical examinations, such as in-person OSCE. The outcome assessments encompass (1) competence, (2) stress, (3) anxiety, (4) confidence, (5) student satisfaction with virtual reality OSCE and (6) examiners' satisfaction. These studies are designed as randomised controlled trials (RCTs) or quasi-experimental research. The search time is from the inception of each database to 30 June 2023, without language restriction. Studies for inclusion are screened by two reviewers for data extraction dependently. Any dispute is resolved through discussion. Unresolved disputes are decided by consulting a third author. For the risk of bias (ROB) assessment, the Cochrane ROB tool for RCTs and the risk of bias in non-randomised studies of intervention tool are used. Moreover, RevMan V.5.3 is used for meta-analysis. ETHICS AND DISSEMINATION: This study protocol does not include any clinical research and thus does not require ethical approval. Research findings are published in a peer-reviewed journal. PROSPERO REGISTRATION NUMBER: CRD42023437685.

Clinical Competence , Educational Measurement , Meta-Analysis as Topic , Students, Nursing , Systematic Reviews as Topic , Virtual Reality , Humans , Educational Measurement/methods , Research Design , Anxiety , Education, Nursing/methods

11.

Gender differences in learning and study strategies impact medical students' preclinical and USMLE step 1 examination performance.

Saxena, Sparsha; Wright, William S; Khalil, Mohammed K.

BMC Med Educ ; 24(1): 504, 2024 May 07.

Article En | MEDLINE | ID: mdl-38714975

BACKGROUND: Evaluation of students' learning strategies can enhance academic support. Few studies have investigated differences in learning strategies between male and female students as well as their impact on United States Medical Licensing Examination® (USMLE) Step 1 and preclinical performance. METHODS: The Learning and Study Strategies Inventory (LASSI) was administered to the classes of 2019-2024 (female (n = 350) and male (n = 262)). Students' performance on preclinical first-year (M1) courses, preclinical second-year (M2) courses, and USMLE Step 1 was recorded. An independent t-test evaluated differences between females and males on each LASSI scale. A Pearson product moment correlation determined which LASSI scales correlated with preclinical performance and USMLE Step 1 examinations. RESULTS: Of the 10 LASSI scales, Anxiety, Attention, Information Processing, Selecting Main Idea, Test Strategies and Using Academic Resources showed significant differences between genders. Females reported higher levels of Anxiety (p < 0.001), which significantly influenced their performance. While males and females scored similarly in Concentration, Motivation, and Time Management, these scales were significant predictors of performance variation in females. Test Strategies was the largest contributor to performance variation for all students, regardless of gender. CONCLUSION: Gender differences in learning influence performance on STEP1. Consideration of this study's results will allow for targeted interventions for academic success.

Education, Medical, Undergraduate , Educational Measurement , Licensure, Medical , Students, Medical , Humans , Female , Male , Educational Measurement/methods , Education, Medical, Undergraduate/standards , Sex Factors , Licensure, Medical/standards , Learning , United States , Academic Performance , Young Adult

12.

The New American Board of Radiology Certifying Oral Examination: How Should Diagnostic Radiology Graduate Medical Education Evolve?

Mokkarala, Mahati; Bentley, Helena; Gomez, Christian; Jiao, Albert; Zaki-Metias, Kaitlin M.

Radiographics ; 44(6): e240016, 2024 Jun.

Article En | MEDLINE | ID: mdl-38722783

Certification , Education, Medical, Graduate , Radiology , Specialty Boards , United States , Radiology/education , Humans , Educational Measurement/methods , Clinical Competence

13.

Moving Beyond Hemangioma: Interactive, Multidisciplinary, Case-Based Teaching in Vascular Anomalies for Pediatric Residents.

Mojica, Ann Marie; Wolfe, Adam D.

MedEdPORTAL ; 20: 11401, 2024.

Article En | MEDLINE | ID: mdl-38716162

Introduction: Vascular anomalies are a spectrum of disorders, including vascular tumors and malformations, that often require multispecialty care. The rarity and variety of these lesions make diagnosis, treatment, and management challenging. Despite the recognition of the medical complexity and morbidity associated with vascular anomalies, there is a general lack of education on the subject for pediatric primary care and subspecialty providers. A needs assessment and the lack of an available standardized teaching tool presented an opportunity to create an educational workshop for pediatric trainees using the POGIL (process-oriented guided inquiry learning) framework. Methods: We developed a 2-hour workshop consisting of an introductory didactic followed by small- and large-group collaboration and case-based discussion. The resource included customizable content for learning assessment and evaluation. Residents completed pre- and posttest assessments of content and provided written evaluations of the teaching session. Results: Thirty-four learners in pediatrics participated in the workshop. Session evaluations were positive, with Likert responses of 4.6-4.8 out of 5 on all items. Pre- and posttest comparisons of four content questions showed no overall statistically significant changes in correct response rates. Learners indicated plans to use the clinical content in their practice and particularly appreciated the interactive teaching forum and the comprehensive overview of vascular anomalies. Discussion: Vascular anomalies are complex, potentially morbid, and often lifelong conditions; multispecialty collaboration is key to providing comprehensive care for affected patients. This customizable resource offers a framework for trainees in pediatrics to appropriately recognize, evaluate, and refer patients with vascular anomalies.

Hemangioma , Internship and Residency , Pediatrics , Vascular Malformations , Humans , Pediatrics/education , Pediatrics/methods , Internship and Residency/methods , Vascular Malformations/diagnosis , Hemangioma/diagnosis , Teaching , Problem-Based Learning/methods , Educational Measurement/methods , Education, Medical, Graduate/methods , Curriculum

14.

Assessment Checklist: Assuring Quality in Student Learning Outcome Assessment.

Herrmann, Tracy.

Radiol Technol ; 95(5): 382-387, 2024 May.

Article En | MEDLINE | ID: mdl-38719556

Checklist , Educational Measurement , Technology, Radiologic , Humans , Educational Measurement/methods , Technology, Radiologic/education

15.

Task Oriented Assessment of Clinical Skills (TOACS): A Modified Form of OSCE.

Shaikh, Sirajul Haque; Memon, Shabnam Iqbal; Abbasi, Sahar Zubair.

J Coll Physicians Surg Pak ; 34(5): 614-616, 2024 May.

Article En | MEDLINE | ID: mdl-38720226

College of Physicians and Surgeons, Pakistan (CPSP) is a premier postgraduate medical institution of the country. It introduced Objective Structured Clinical Examination (OSCE) in the 1990s, and later came up with its modified form known as Task Oriented Assessment of Clinical Skills (TOACS). This modified assessment has been incorporated in clinical examinations of its majority fellowship programmes. Despite the use of TOACS for so many years at CPSP, it is surprising to note that this form of assessment does not appear in the literature. The objective of this viewpoint is to describe the rationale for the development of TOACS and to compare its structure and functions with OSCE. Key Words: Medical education, Assessment, Objective Structured Clinical Examination, Interactive, Task Oriented Assessment of Clinical Skills.

Clinical Competence , Educational Measurement , Humans , Educational Measurement/methods , Pakistan , Education, Medical, Graduate/methods

16.

The Revival of Essay-Type Questions in Medical Education: Harnessing Artificial Intelligence and Machine Learning.

Shamim, Muhammad Shahid; Zaidi, Syed Jaffar Abbas; Rehman, Abdur.

J Coll Physicians Surg Pak ; 34(5): 595-599, 2024 May.

Article En | MEDLINE | ID: mdl-38720222

OBJECTIVE: To analyse and compare the assessment and grading of human-written and machine-written formative essays. STUDY DESIGN: Quasi-experimental, qualitative cross-sectional study. Place and Duration of the Study: Department of Science of Dental Materials, Hamdard College of Medicine & Dentistry, Hamdard University, Karachi, from February to April 2023. METHODOLOGY: Ten short formative essays of final-year dental students were manually assessed and graded. These essays were then graded using ChatGPT version 3.5. The chatbot responses and prompts were recorded and matched with manually graded essays. Qualitative analysis of the chatbot responses was then performed. RESULTS: Four different prompts were given to the artificial intelligence (AI) driven platform of ChatGPT to grade the summative essays. These were the chatbot's initial responses without grading, the chatbot's response to grading against criteria, the chatbot's response to criteria-wise grading, and the chatbot's response to questions for the difference in grading. Based on the results, four innovative ways of using AI and machine learning (ML) have been proposed for medical educators: Automated grading, content analysis, plagiarism detection, and formative assessment. ChatGPT provided a comprehensive report with feedback on writing skills, as opposed to manual grading of essays. CONCLUSION: The chatbot's responses were fascinating and thought-provoking. AI and ML technologies can potentially supplement human grading in the assessment of essays. Medical educators need to embrace AI and ML technology to enhance the standards and quality of medical education, particularly when assessing long and short essay-type questions. Further empirical research and evaluation are needed to confirm their effectiveness. KEY WORDS: Machine learning, Artificial intelligence, Essays, ChatGPT, Formative assessment.

Artificial Intelligence , Educational Measurement , Machine Learning , Humans , Cross-Sectional Studies , Educational Measurement/methods , Pakistan , Education, Medical/methods , Students, Dental/psychology , Writing , Qualitative Research , Education, Dental/methods

17.

Evidence of learning in workplace-based assessments in a Family Medicine Training Programme.

Erumeda, Neetha J; George, Ann Z; Jenkins, Louis S.

S Afr Fam Pract (2004) ; 66(1): e1-e15, 2024 Apr 26.

Article En | MEDLINE | ID: mdl-38708750

BACKGROUND: Learning portfolios (LPs) provide evidence of workplace-based assessments (WPBAs) in clinical settings. The educational impact of LPs has been explored in high-income countries, but the use of portfolios and the types of assessments used for and of learning have not been adequately researched in sub-Saharan Africa. This study investigated the evidence of learning in registrars' LPs and the influence of the training district and year of training on assessments. METHODS: A cross-sectional study evaluated 18 Family Medicine registrars' portfolios from study years 1-3 across five decentralised training sites affiliated with the University of the Witwatersrand. Descriptive statistics were calculated for the portfolio and quarterly assessment (QA) scores and self-reported clinical skills competence levels. The competence levels obtained from the portfolios and university records served as proxy measures for registrars' knowledge and skills. RESULTS: The total LP median scores ranged from 59.9 to 81.0, and QAs median scores from 61.4 to 67.3 across training years. The total LP median scores ranged from 62.1 to 83.5 and 62.0 to 67.5, respectively in QAs across training districts. Registrars' competence levels across skill sets did not meet the required standards. Higher skills competence levels were reported in the women's health, child health, emergency care, clinical administration and teaching and learning domains. CONCLUSION: The training district and training year influence workplace-based assessment (WPBA) effectiveness. Ongoing faculty development and registrar support are essential for WPBA.Contribution: This study contributes to the ongoing discussion of how to utilise WPBA in resource-constrained sub-Saharan settings.

Clinical Competence , Educational Measurement , Family Practice , Workplace , Humans , Cross-Sectional Studies , Family Practice/education , Educational Measurement/methods , Female , Male , South Africa , Learning , Adult

18.

Machine Learning to Identify Clusters in Family Medicine Diplomate Motivations and Their Relationship to Continuing Certification Exam Outcomes: Findings and Potential Future Implications.

Price, David W; Wingrove, Peter; Bazemore, Andrew.

J Am Board Fam Med ; 37(2): 279-289, 2024.

Article En | MEDLINE | ID: mdl-38740475

BACKGROUND: The potential for machine learning (ML) to enhance the efficiency of medical specialty boards has not been explored. We applied unsupervised ML to identify archetypes among American Board of Family Medicine (ABFM) Diplomates regarding their practice characteristics and motivations for participating in continuing certification, then examined associations between motivation patterns and key recertification outcomes. METHODS: Diplomates responding to the 2017 to 2021 ABFM Family Medicine continuing certification examination surveys selected motivations for choosing to continue certification. We used Chi-squared tests to examine difference proportions of Diplomates failing their first recertification examination attempt who endorsed different motivations for maintaining certification. Unsupervised ML techniques were applied to generate clusters of physicians with similar practice characteristics and motivations for recertifying. Controlling for physician demographic variables, we used logistic regression to examine the effect of motivation clusters on recertification examination success and validated the ML clusters by comparison with a previously created classification schema developed by experts. RESULTS: ML clusters largely recapitulated the intrinsic/extrinsic framework devised by experts previously. However, the identified clusters achieved a more equal partitioning of Diplomates into homogenous groups. In both ML and human clusters, physicians with mainly extrinsic or mixed motivations had lower rates of examination failure than those who were intrinsically motivated. DISCUSSION: This study demonstrates the feasibility of using ML to supplement and enhance human interpretation of board certification data. We discuss implications of this demonstration study for the interaction between specialty boards and physician Diplomates.

Certification , Family Practice , Machine Learning , Motivation , Specialty Boards , Humans , Family Practice/education , Male , Female , United States , Adult , Education, Medical, Continuing , Middle Aged , Surveys and Questionnaires , Educational Measurement/methods , Educational Measurement/statistics & numerical data , Clinical Competence

19.

The reliability of the College of Intensive Care Medicine of Australia and New Zealand "Hot Case" examination.

Hoffman, Kenneth R; Swanson, David; Lane, Stuart; Nickson, Chris; Brand, Paul; Ryan, Anna T.

BMC Med Educ ; 24(1): 527, 2024 May 11.

Article En | MEDLINE | ID: mdl-38734603

BACKGROUND: High stakes examinations used to credential trainees for independent specialist practice should be evaluated periodically to ensure defensible decisions are made. This study aims to quantify the College of Intensive Care Medicine of Australia and New Zealand (CICM) Hot Case reliability coefficient and evaluate contributions to variance from candidates, cases and examiners. METHODS: This retrospective, de-identified analysis of CICM examination data used descriptive statistics and generalisability theory to evaluate the reliability of the Hot Case examination component. Decision studies were used to project generalisability coefficients for alternate examination designs. RESULTS: Examination results from 2019 to 2022 included 592 Hot Cases, totalling 1184 individual examiner scores. The mean examiner Hot Case score was 5.17 (standard deviation 1.65). The correlation between candidates' two Hot Case scores was low (0.30). The overall reliability coefficient for the Hot Case component consisting of two cases observed by two separate pairs of examiners was 0.42. Sources of variance included candidate proficiency (25%), case difficulty and case specificity (63.4%), examiner stringency (3.5%) and other error (8.2%). To achieve a reliability coefficient of > 0.8 a candidate would need to perform 11 Hot Cases observed by two examiners. CONCLUSION: The reliability coefficient for the Hot Case component of the CICM second part examination is below the generally accepted value for a high stakes examination. Modifications to case selection and introduction of a clear scoring rubric to mitigate the effects of variation in case difficulty may be helpful. Increasing the number of cases and overall assessment time appears to be the best way to increase the overall reliability. Further research is required to assess the combined reliability of the Hot Case and viva components.

Clinical Competence , Critical Care , Educational Measurement , Humans , New Zealand , Australia , Reproducibility of Results , Retrospective Studies , Critical Care/standards , Educational Measurement/methods , Education, Medical, Graduate/standards

20.

Designing an evaluation tool for evaluating training programs of medical students in clinical skill training center from consumers' perspective.

Azad, Rezvan; Shakour, Mahsa; Moharami, Narjes.

BMC Med Educ ; 24(1): 502, 2024 May 09.

Article En | MEDLINE | ID: mdl-38724925

INTRODUCTION: The Clinical Skill Training Center (CSTC) is the first environment where third year medical students learn clinical skills after passing basic science. Consumer- based evaluation is one of the ways to improve this center with the consumer. This study was conducted with the aim of preparing a consumer-oriented evaluation tool for CSTC among medical students. METHOD: The study was mixed method. The first phase was qualitative and for providing an evaluation tool. The second phase was for evaluating the tool. At the first phase, after literature review in the Divergent phase, a complete list of problems in the field of CSTC in medicine schools was prepared. In the convergent step, the prepared list was compared with the standards of clinical education and values of scriven. In the second phase it was evaluated by the scientific and authority committee. Validity has been measured by determining CVR and CVI: Index. The face and content validity of the tool was obtained through the approval of a group of specialists. RESULTS: The findings of the research were in the form of 4 questionnaires: clinical instructors, pre-clinical medical students, and interns. All items were designed as a 5-point Likert. The main areas of evaluation included the objectives and content of training courses, implementation of operations, facilities and equipment, and the environment and indoor space. In order to examine the long-term effects, a special evaluation form was designed for intern. CONCLUSION: The tool for consumer evaluation was designed with good reliability and trustworthiness and suitable for use in the CSTC, and its use can improve the effectiveness of clinical education activities.

Clinical Competence , Program Evaluation , Students, Medical , Humans , Clinical Competence/standards , Education, Medical, Undergraduate/standards , Surveys and Questionnaires , Educational Measurement/methods