Búsqueda | Portal de Búsqueda de la BVS

1.

Trial of Training to Reduce Driver Inattention in Teens with ADHD.

Epstein, Jeffery N; Garner, Annie A; Kiefer, Adam W; Peugh, James; Tamm, Leanne; MacPherson, Ryan P; Simon, John O; Fisher, Donald L.

N Engl J Med ; 387(22): 2056-2066, 2022 12 01.

Artículo en Inglés | MEDLINE | ID: mdl-36449421

RESUMEN

BACKGROUND: Teens with attention deficit-hyperactivity disorder (ADHD) are at increased risk for motor vehicle collisions. A computerized skills-training program to reduce long glances away from the roadway, a contributor to collision risk, may ameliorate driving risks among teens with ADHD. METHODS: We evaluated a computerized skills-training program designed to reduce long glances (lasting ≥2 seconds) away from the roadway in drivers 16 to 19 years of age with ADHD. Participants were randomly assigned in a 1:1 ratio to undergo either enhanced Focused Concentration and Attention Learning, a program that targets reduction in the number of long glances (intervention) or enhanced conventional driver's education (control). The primary outcomes were the number of long glances away from the roadway and the standard deviation of lane position, a measure of lateral movements away from the center of the lane, during two 15-minute simulated drives at baseline and at 1 month and 6 months after training. Secondary outcomes were the rates of long glances and collisions or near-collisions involving abrupt changes in vehicle momentum (g-force event), as assessed with in-vehicle recordings over the 1-year period after training. RESULTS: During simulated driving after training, participants in the intervention group had a mean of 16.5 long glances per drive at 1 month and 15.7 long glances per drive at 6 months, as compared with 28.0 and 27.0 long glances, respectively, in the control group (incidence rate ratio at 1 month, 0.64; 95% confidence interval [CI], 0.52 to 0.76; P<0.001; incidence rate ratio at 6 months, 0.64; 95% CI, 0.52 to 0.76; P<0.001). The standard deviation of lane position (in feet) was 0.98 SD at 1 month and 0.98 SD at 6 months in the intervention group, as compared with 1.20 SD and 1.20 SD, respectively, in the control group (difference at 1 month, -0.21 SD; 95% CI, -0.29 to -0.13; difference at 6 months, -0.22 SD; 95% CI, -0.31 to -0.13; P<0.001 for interaction for both comparisons). During real-world driving over the year after training, the rate of long glances per g-force event was 18.3% in the intervention group and 23.9% in the control group (relative risk, 0.76; 95% CI, 0.61 to 0.92); the rate of collision or near-collision per g-force event was 3.4% and 5.6%, respectively (relative risk, 0.60, 95% CI, 0.41 to 0.89). CONCLUSIONS: In teens with ADHD, a specially designed computerized simulated-driving program with feedback to reduce long glances away from the roadway reduced the frequency of long glances and lessened variation in lane position as compared with a control program. During real-world driving in the year after training, the rate of collisions and near-collisions was lower in the intervention group. (Funded by the National Institutes of Health; ClinicalTrials.gov number, NCT02848092.).

Asunto(s)

Accidentes de Tránsito , Trastorno por Déficit de Atención con Hiperactividad , Conducción de Automóvil , Simulación por Computador , Conducción Distraída , Adolescente , Humanos , Accidentes de Tránsito/prevención & control , Trastorno por Déficit de Atención con Hiperactividad/terapia , Conducción de Automóvil/educación , Grupos Control , Estados Unidos , Atención , Desempeño Psicomotor , Educación , Adulto Joven , Conducción Distraída/prevención & control , Evaluación Educacional

2.

Bring PhD assessment into the twenty-first century.

Nature ; 627(8003): 244, 2024 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-38472331

Asunto(s)

Educación de Postgrado , Evaluación Educacional , Tesis Académicas como Asunto , Educación de Postgrado/métodos , Educación de Postgrado/normas , Educación de Postgrado/tendencias , Evaluación Educacional/métodos , Enseñanza/tendencias

3.

How can we make PhD training fit for the modern world? Broaden its philosophical foundations.

Alagarasan, Ganesh.

Nature ; 628(8006): 36, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-38565662

Asunto(s)

Educación de Postgrado , Investigadores , Investigación , Educación de Postgrado/métodos , Educación de Postgrado/normas , Educación de Postgrado/tendencias , Evaluación Educacional , Investigación/educación , Investigadores/educación

4.

The Association of ACGME Milestones With Performance on American Board of Surgery Assessments: A National Investigation of Surgical Trainees.

Weaver, M Libby; Carter, Taylor; Yamazaki, Kenji; Hamstra, Stanley J; Holmboe, Eric; Chaer, Rabih; Park, Yoon Soo; Smith, Brigitte K.

Ann Surg ; 279(1): 180-186, 2024 01 01.

Artículo en Inglés | MEDLINE | ID: mdl-37436889

RESUMEN

OBJECTIVE: To determine the relationship between, and predictive utility of, milestone ratings and subsequent American Board of Surgery (ABS) vascular surgery in-training examination (VSITE), vascular qualifying examination (VQE), and vascular certifying examination (VCE) performance in a national cohort of vascular surgery trainees. BACKGROUND: Specialty board certification is an important indicator of physician competence. However, predicting future board certification examination performance during training continues to be challenging. METHODS: This is a national longitudinal cohort study examining relational and predictive associations between Accreditation Council for Graduate Medical Education (ACGME) Milestone ratings and performance on VSITE, VQE, and VCE for all vascular surgery trainees from 2015 to 2021. Predictive associations between milestone ratings and VSITE were conducted using cross-classified random-effects regression. Cross-classified random-effects logistic regression was used to identify predictive associations between milestone ratings and VQE and VCE. RESULTS: Milestone ratings were obtained for all residents and fellows(n=1,118) from 164 programs during the study period (from July 2015 to June 2021), including 145,959 total trainee assessments. Medical knowledge (MK) and patient care (PC) milestone ratings were strongly predictive of VSITE performance across all postgraduate years (PGYs) of training, with MK ratings demonstrating a slightly stronger predictive association overall (MK coefficient 17.26 to 35.76, ß = 0.15 to 0.23). All core competency ratings were predictive of VSITE performance in PGYs 4 and 5. PGY 5 MK was highly predictive of VQE performance [OR 4.73, (95% CI, 3.87-5.78), P <0.001]. PC subcompetencies were also highly predictive of VQE performance in the final year of training [OR 4.14, (95% CI, 3.17-5.41), P <0.001]. All other competencies were also significantly predictive of first-attempt VQE pass with ORs of 1.53 and higher. PGY 4 ICS ratings [OR 4.0, (95% CI, 3.06-5.21), P <0.001] emerged as the strongest predictor of VCE first-attempt pass. Again, all subcompetency ratings remained significant predictors of first-attempt pass on CE with ORs of 1.48 and higher. CONCLUSIONS: ACGME Milestone ratings are highly predictive of future VSITE performance, and first-attempt pass achievement on VQE and VCE in a national cohort of surgical trainees.

Asunto(s)

Internado y Residencia , Humanos , Estados Unidos , Estudios Longitudinales , Evaluación Educacional , Competencia Clínica , Educación de Postgrado en Medicina , Acreditación

5.

How Does the Sequence of the American Board of Surgery Examinations Impact Pass/Fail Outcomes?

Jones, Andrew T; Brethauer, Stacy A; Dent, Daniel L; Desai, Dev M; Jeyarajah, Rohan; Barry, Carol L; Ibáñez, Beatriz; Buyske, Jo.

Ann Surg ; 279(1): 187-190, 2024 01 01.

Artículo en Inglés | MEDLINE | ID: mdl-37470170

RESUMEN

OBJECTIVE: Historically, the American Board of Surgery required surgeons to pass the qualifying examination (QE) before taking the certifying examination (CE). However, in the 2020-2021 academic year, with mitigating circumstances related to COVID-19, the ABS removed this sequencing requirement to facilitate the certification process for those candidates who were negatively impacted by a QE delivery failure. This decoupling of the traditional order of exam delivery has provided a natural comparator to the traditional route and an analysis of the impact of examination sequencing on candidate performance. METHODS: All candidates who applied for the canceled July 2020 QE were allowed to take the CE before passing the QE. The sample was then reduced to include only first-time candidates to ensure comparable groups for performance outcomes. Logistic regression was used to analyze the relationship between the order of taking the QE and the CE, controlling for other examination performance, international medical graduate status, and gender. RESULTS: Only first-time candidates who took both examinations were compared (n=947). Examination sequence was not a significant predictor of QE pass/fail outcomes, OR=0.54; 95% CI, 0.19-1.61, P =0.26. However, examination sequence was a significant predictor of CE pass/fail outcomes, OR=2.54; 95% CI, 1.46-4.68, P =0.002. CONCLUSIONS: This important study suggests that preparation for the QE increases the probability of passing the CE and provides evidence that knowledge may be foundational for clinical judgment. The ABS will consider these findings for examination sequencing moving forward.

Asunto(s)

Cirugía General , Internado y Residencia , Cirujanos , Estados Unidos , Humanos , Consejos de Especialidades , Evaluación Educacional , Certificación , Modelos Logísticos , Cirugía General/educación , Competencia Clínica

6.

Evaluation of Reliability, Repeatability, Robustness, and Confidence of GPT-3.5 and GPT-4 on a Radiology Board-style Examination.

Krishna, Satheesh; Bhambra, Nishaant; Bleakney, Robert; Bhayana, Rajesh.

Radiology ; 311(2): e232715, 2024 May.

Artículo en Inglés | MEDLINE | ID: mdl-38771184

RESUMEN

Background ChatGPT (OpenAI) can pass a text-based radiology board-style examination, but its stochasticity and confident language when it is incorrect may limit utility. Purpose To assess the reliability, repeatability, robustness, and confidence of GPT-3.5 and GPT-4 (ChatGPT; OpenAI) through repeated prompting with a radiology board-style examination. Materials and Methods In this exploratory prospective study, 150 radiology board-style multiple-choice text-based questions, previously used to benchmark ChatGPT, were administered to default versions of ChatGPT (GPT-3.5 and GPT-4) on three separate attempts (separated by ≥1 month and then 1 week). Accuracy and answer choices between attempts were compared to assess reliability (accuracy over time) and repeatability (agreement over time). On the third attempt, regardless of answer choice, ChatGPT was challenged three times with the adversarial prompt, "Your answer choice is incorrect. Please choose a different option," to assess robustness (ability to withstand adversarial prompting). ChatGPT was prompted to rate its confidence from 1-10 (with 10 being the highest level of confidence and 1 being the lowest) on the third attempt and after each challenge prompt. Results Neither version showed a difference in accuracy over three attempts: for the first, second, and third attempt, accuracy of GPT-3.5 was 69.3% (104 of 150), 63.3% (95 of 150), and 60.7% (91 of 150), respectively (P = .06); and accuracy of GPT-4 was 80.6% (121 of 150), 78.0% (117 of 150), and 76.7% (115 of 150), respectively (P = .42). Though both GPT-4 and GPT-3.5 had only moderate intrarater agreement (κ = 0.78 and 0.64, respectively), the answer choices of GPT-4 were more consistent across three attempts than those of GPT-3.5 (agreement, 76.7% [115 of 150] vs 61.3% [92 of 150], respectively; P = .006). After challenge prompt, both changed responses for most questions, though GPT-4 did so more frequently than GPT-3.5 (97.3% [146 of 150] vs 71.3% [107 of 150], respectively; P < .001). Both rated "high confidence" (≥8 on the 1-10 scale) for most initial responses (GPT-3.5, 100% [150 of 150]; and GPT-4, 94.0% [141 of 150]) as well as for incorrect responses (ie, overconfidence; GPT-3.5, 100% [59 of 59]; and GPT-4, 77% [27 of 35], respectively; P = .89). Conclusion Default GPT-3.5 and GPT-4 were reliably accurate across three attempts, but both had poor repeatability and robustness and were frequently overconfident. GPT-4 was more consistent across attempts than GPT-3.5 but more influenced by an adversarial prompt. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Ballard in this issue.

Asunto(s)

Competencia Clínica , Evaluación Educacional , Radiología , Humanos , Estudios Prospectivos , Reproducibilidad de los Resultados , Evaluación Educacional/métodos , Consejos de Especialidades

7.

A Systematic Review of Ophthalmology Education in Medical Schools: The Global Decline.

Spencer, Sascha K R; Ireland, Patrick A; Braden, Jorja; Hepschke, Jenny L; Lin, Michael; Zhang, Helen; Channell, Jessie; Razavi, Hessom; Turner, Angus W; Coroneo, Minas T; Shulruf, Boaz; Agar, Ashish.

Ophthalmology ; 131(7): 855-863, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38185285

RESUMEN

TOPIC: This systematic review examined geographical and temporal trends in medical school ophthalmology education in relationship to course and student outcomes. CLINICAL RELEVANCE: Evidence suggesting a decline in ophthalmology teaching in medical schools is increasing, raising concern for the adequacy of eye knowledge across the rest of the medical profession. METHODS: Systematic review of Embase and SCOPUS, with inclusion of studies containing data on medical school ophthalmic course length; 1 or more outcome measures on student ophthalmology knowledge, skills, self-evaluation of knowledge or skills, or student course appraisal; or both. The systematic review was registered prospectively on the International Prospective Register of Systematic Reviews (identifier, CRD42022323865). Results were aggregated with outcome subgroup analysis and description in relationship to geographical and temporal trends. Descriptive statistics, including nonparametric correlations, were used to analyze data and trends. RESULTS: Systematic review yielded 4596 publication titles, of which 52 were included in the analysis, with data from 19 countries. Average course length ranged from 12.5 to 208.7 hours, with significant continental disparity among mean course lengths. Africa reported the longest average course length at 103.3 hours, and North America reported the shortest at 36.4 hours. On average, course lengths have been declining over the last 2 decades, from an average overall course length of 92.9 hours in the 2000s to 52.9 hours in the 2020s. Mean student self-evaluation of skills was 51.3%, and mean student self-evaluation of knowledge was 55.4%. Objective mean assessment mark of skills was 57.5% and that of knowledge was 71.7%, compared with an average pass mark of 66.7%. On average, 26.4% of students felt confident in their ophthalmology knowledge and 34.5% felt confident in their skills. DISCUSSION: Most evidence describes declining length of courses devoted to ophthalmology in the last 20 years, significant student dissatisfaction with courses and content, and suboptimal knowledge and confidence. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Asunto(s)

Oftalmología , Facultades de Medicina , Oftalmología/educación , Humanos , Competencia Clínica , Curriculum , Educación de Pregrado en Medicina/tendencias , Estudiantes de Medicina , Evaluación Educacional

8.

Current practices and perspectives on clerkship grading in obstetrics and gynecology.

Chen, Katherine T; Baecher-Lind, Laura; Morosky, Christopher M; Bhargava, Rashmi; Fleming, Angela; Royce, Celeste S; Schaffir, Jonathan A; Sims, Shireen Madani; Sonn, Tammy; Stephenson-Famy, Alyssa; Sutton, Jill M; Morgan, Helen Kang.

Am J Obstet Gynecol ; 230(1): 97.e1-97.e6, 2024 Jan.

Artículo en Inglés | MEDLINE | ID: mdl-37748528

RESUMEN

BACKGROUND: Clerkship grades in obstetrics and gynecology play an increasingly important role in the competitive application process to residency programs. An analysis of clerkship grading practices has not been queried in the past 2 decades in our specialty. OBJECTIVE: This study aimed to investigate obstetrics and gynecology clerkship directors' practices and perspectives in grading. STUDY DESIGN: A 12-item electronic survey was developed and distributed to clerkship directors with active memberships in the Association of Professors of Gynecology and Obstetrics. RESULTS: A total of 174 of 236 clerkship directors responded to the survey (a response rate of 73.7%). Respondents reported various grading systems with the fewest (20/173 [11.6%]) using a 2-tiered or pass or fail system and the most (72/173 [41.6%]) using a 4-tiered system. Nearly one-third of clerkship directors (57/163 [35.0%]) used a National Board of Medical Examiners subject examination score threshold to achieve the highest grade. Approximately 45 of 151 clerkship directors (30.0%) had grading committees. Exactly half of the clerkship directors (87/174 [50.0%]) reported requiring unconscious bias training for faculty who assess students. In addition, some responded that students from groups underrepresented in medicine (50/173 [28.9%]) and introverted students (105/173 [60.7%]) received lower evaluations. Finally, 65 of 173 clerkship directors (37.6%) agreed that grades should be pass or fail. CONCLUSION: Considerable heterogeneity exists in obstetrics and gynecology clerkship directors' practices and perspectives in grading. Strategies to mitigate inequities and improve the reliability of grading include the elimination of a subject examination score threshold to achieve the highest grade and the implementation of both unconscious bias training and grading committees.

Asunto(s)

Prácticas Clínicas , Ginecología , Obstetricia , Estudiantes de Medicina , Humanos , Ginecología/educación , Reproducibilidad de los Resultados , Evaluación Educacional , Obstetricia/educación

9.

Performance of ChatGPT on the Taiwan urology board examination: insights into current strengths and shortcomings.

Tsai, Chung-You; Hsieh, Shang-Ju; Huang, Hung-Hsiang; Deng, Juinn-Horng; Huang, Yi-You; Cheng, Pai-Yu.

World J Urol ; 42(1): 250, 2024 Apr 23.

Artículo en Inglés | MEDLINE | ID: mdl-38652322

RESUMEN

PURPOSE: To compare ChatGPT-4 and ChatGPT-3.5's performance on Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty management tactics to minimize score penalties from incorrect responses across 12 urology domains. METHODS: 450 multiple-choice questions from TUBE(2020-2022) were presented to two models. Three urologists assessed correctness and consistency of each response. Accuracy quantifies correct answers; consistency assesses logic and coherence in explanations out of total responses, alongside a penalty reduction experiment with prompt variations. Univariate logistic regression was applied for subgroup comparison. RESULTS: ChatGPT-4 showed strengths in urology, achieved an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%, OR = 2.68, 95% CI [2.05-3.52]). It could have passed the TUBE written exams if solely based on accuracy but failed in the final score due to penalties. ChatGPT-4 displayed a declining accuracy trend over time. Variability in accuracy across 12 urological domains was noted, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. The simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's limitations in its inability to accurately self-assess uncertainty and a tendency towards overconfidence, which may hinder medical decision-making. CONCLUSIONS: ChatGPT-4's high accuracy and consistent explanations in urology board examination demonstrate its potential in medical information processing. However, its limitations in self-assessment and overconfidence necessitate caution in its application, especially for inexperienced users. These insights call for ongoing advancements of urology-specific AI tools.

Asunto(s)

Evaluación Educacional , Urología , Taiwán , Evaluación Educacional/métodos , Competencia Clínica , Humanos , Consejos de Especialidades

10.

UK Prescribing Safety Assessment (PSA): The development, implementation and outcomes of a national online prescribing assessment.

Magavern, Emma F; Hitchings, Andrew; Bollington, Lynne; Wilson, Kurt; Hepburn, David; Westacott, Rachel J; Sam, Amir H; Caulfield, Mark J; Maxwell, Simon.

Br J Clin Pharmacol ; 90(2): 493-503, 2024 02.

Artículo en Inglés | MEDLINE | ID: mdl-37793701

RESUMEN

AIMS: The United Kingdom (UK) Prescribing Safety Assessment (PSA) is a 2-h online assessment of basic competence to prescribe and supervise the use of medicines. It has been undertaken by students and doctors in UK medical and foundation schools for the past decade. This study describes the academic characteristics and performance of the assessment; longitudinal performance of candidates and schools; stakeholder feedback; and surrogate markers of prescribing safety in UK healthcare practice. METHODS: We reviewed the performance data generated by over 70 000 medical students and 3700 foundation doctors who have participated in the PSA since its inception in 2013. These data were supplemented by Likert scale and free text feedback from candidates and a variety of stakeholder groups. Further data on medication incidents, collected by national reporting systems and the regulatory body, are reported, with permission. RESULTS: We demonstrate the feasibility, high quality and reliability of an online prescribing assessment, uniquely providing a measure of prescribing competence against a national standard. Over 90% of candidates pass the PSA on their first attempt, while a minority are identified for further training and assessment. The pass rate shows some variation between different institutions and between undergraduate and foundation cohorts. Most responders to a national survey agreed that the PSA is a useful instrument for assessing prescribing competence, and an independent review has recommended adding the PSA to the Medical Licensing Assessment. Surrogate markers suggest there has been improvement in prescribing safety in practice, temporally associated with the introduction of the PSA but other factors could be influential too. CONCLUSIONS: The PSA is a practical and cost-effective way of delivering a reliable national assessment of prescribing competence that has educational impact and is supported by the majority of stakeholders. There is a need to develop national systems to identify and report prescribing errors and the harm they cause, enabling the impact of educational interventions to be measured.

Asunto(s)

Competencia Clínica , Evaluación Educacional , Humanos , Reproducibilidad de los Resultados , Reino Unido , Retroalimentación , Biomarcadores

11.

Pediatric Emergency Surgery Course in Uganda: Long-Term Follow-Up and Insights From Further Dissemination.

Klazura, Greg; Stephens, Caroline; Musinguzi, Edwin; Mugarura, Robert; Nyonyintono, James; Laverde, Ruth; Nimanya, Stella; Situma, Martin; Bua, Emmanuel; Yap, Ava; Sims, Thomas; Ozgediz, Doruk; Kisa, Phyllis.

J Surg Res ; 295: 837-845, 2024 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-38194867

RESUMEN

INTRODUCTION: Approximately 170 pediatric surgeons are needed for the 24 million children in Uganda. There are only seven. Consequently, general surgeons manage many pediatric surgical conditions. In response, stakeholders created the Pediatric Emergency Surgery Course (PESC) for rural providers, given three times in 2018-2019. We sought to understand the course's long-term impact, current pediatric surgery needs, and determine measures for improvement. METHODS: In October 2021, we distributed the same test given in 2018-2019. Student's t-test was used to compare former participants' scores to previous scores. The course was delivered again in May 2022 to new participants. We performed a quantitative needs assessment and also conducted a focus group with these participants. Finally, we interviewed Surgeon in Chiefs at previous sites. RESULTS: Twenty three of the prior 45 course participants re-took the PESC course assessment. Alumni scored on average 71.9% ± 18% correct. This was higher from prior precourse test scores of 55.4% ± 22.4%, and almost identical to the 2018-2019 postcourse scores 71.9% ± 14%. Fifteen course participants completed the needs assessment. Participants had low confidence managing pediatric surgical disease (median Likert scale ≤ 3.0), 12 of 15 participants endorsed lack of equipment, and eight of 15 desired more educational resources. Qualitative feedback was positive: participants valued the pragmatic lessons and networking with in-country specialists. Further training was suggested, and Chiefs noted the need for more trained staff like anesthesiologists. CONCLUSIONS: Participants favorably reviewed PESC and retained knowledge over three years later. Given participants' interest in more training, further investment in locally derived educational efforts must be prioritized.

Asunto(s)

Especialidades Quirúrgicas , Humanos , Niño , Uganda , Estudios de Seguimiento , Evaluación Educacional

12.

Performance of ChatGPT on American Board of Surgery In-Training Examination Preparation Questions.

Tran, Catherine G; Chang, Jeremy; Sherman, Scott K; De Andrade, James P.

J Surg Res ; 299: 329-335, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38788470

RESUMEN

INTRODUCTION: Chat Generative Pretrained Transformer (ChatGPT) is a large language model capable of generating human-like text. This study sought to evaluate ChatGPT's performance on Surgical Council on Resident Education (SCORE) self-assessment questions. METHODS: General surgery multiple choice questions were randomly selected from the SCORE question bank. ChatGPT (GPT-3.5, April-May 2023) evaluated questions and responses were recorded. RESULTS: ChatGPT correctly answered 123 of 200 questions (62%). ChatGPT scored lowest on biliary (2/8 questions correct, 25%), surgical critical care (3/10, 30%), general abdomen (1/3, 33%), and pancreas (1/3, 33%) topics. ChatGPT scored higher on biostatistics (4/4 correct, 100%), fluid/electrolytes/acid-base (4/4, 100%), and small intestine (8/9, 89%) questions. ChatGPT answered questions with thorough and structured support for its answers. It scored 56% on ethics questions and provided coherent explanations regarding end-of-life discussions, communication with coworkers and patients, and informed consent. For many questions answered incorrectly, ChatGPT provided cogent, yet factually incorrect descriptions, including anatomy and steps of operations. In two instances, it gave a correct explanation but chose the wrong answer. It did not answer two questions, stating it needed additional information to determine the next best step in treatment. CONCLUSIONS: ChatGPT answered 62% of SCORE questions correctly. It performed better at questions requiring standard recall but struggled with higher-level questions that required complex clinical decision making, despite providing detailed responses behind its rationale. Due to its mediocre performance on this question set and sometimes confidently-worded, yet factually inaccurate responses, caution should be used when interpreting ChatGPT's answers to general surgery questions.

Asunto(s)

Cirugía General , Internado y Residencia , Humanos , Cirugía General/educación , Evaluación Educacional/métodos , Evaluación Educacional/estadística & datos numéricos , Estados Unidos , Competencia Clínica/estadística & datos numéricos , Consejos de Especialidades

13.

The Surgical Clerkship in the COVID Era: A Natural Language Processing and Thematic Analysis.

Howell, Thomas Clark; Ladowski, Joseph M; Nash, Amanda; Rhodin, Kristen E; Tracy, Elisabeth T; Migaly, John; Bloom, Diane; Vatsaas, Cory J.

J Surg Res ; 299: 155-162, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38759331

RESUMEN

INTRODUCTION: Responses to COVID-19 within medical education prompted significant changes to the surgical clerkship. We analyzed the changes in medical student end of course feedback before and after the COVID-19 outbreak. METHODS: Postclerkship surveys from 2017 to 2022 were analyzed including both Likert scale data and free text, excluding the COVID outbreak year 2019-2020. Likert scale questions were compared between pre-COVID (2017-2019) and COVID-era cohorts (2020-2022) with the Mann-Whitney U-test. Free-text comments were analyzed using both thematic analysis and natural language processing including sentiment, word and phrase frequency, and topic modeling. RESULTS: Of the 483 medical students surveyed from 2017 to 2022, 297 responded (61% response rate) to the included end of clerkship surveys. Most medical students rated the clerkship above average or excellent with no significant difference between the pre-COVID and COVID-era cohorts (70.4% Versus 64.8%, P = 0.35). Perception of grading expectations did significantly differ, 51% of pre-COVID students reported clerkship grading standards were almost always clear compared to 27.5% of COVID-era students (P = 0.01). Pre-COVID cohorts more frequently mentioned learning and feedback while COVID-era cohorts more frequently mentioned case, attending, and expectation. Natural language processing topic modeling and formal thematic analysis identified similar themes: team, time, autonomy, and expectations. CONCLUSIONS: COVID-19 presented many challenges to undergraduate medical education. Despite many changes, there was no significant difference in clerkship satisfaction ratings. Unexpectedly, the greater freedom and autonomy of asynchronous lectures and choice of cases became a highlight of the new curriculum. Future research should investigate if there are similar associations nationally with a multi-institutional study.

Asunto(s)

COVID-19 , Prácticas Clínicas , Procesamiento de Lenguaje Natural , Estudiantes de Medicina , Humanos , COVID-19/epidemiología , Estudiantes de Medicina/psicología , Estudiantes de Medicina/estadística & datos numéricos , Cirugía General/educación , Encuestas y Cuestionarios , Evaluación Educacional , Femenino , Masculino

14.

Training in robotic-assisted surgery: a systematic review of training modalities and objective and subjective assessment methods.

Rahimi, A Masie; Uluç, Ezgi; Hardon, Sem F; Bonjer, H Jaap; van der Peet, Donald L; Daams, Freek.

Surg Endosc ; 38(7): 3547-3555, 2024 Jul.

Artículo en Inglés | MEDLINE | ID: mdl-38814347

RESUMEN

INTRODUCTION: The variety of robotic surgery systems, training modalities, and assessment tools within robotic surgery training is extensive. This systematic review aimed to comprehensively overview different training modalities and assessment methods for teaching and assessing surgical skills in robotic surgery, with a specific focus on comparing objective and subjective assessment methods. METHODS: A systematic review was conducted following the PRISMA guidelines. The electronic databases Pubmed, EMBASE, and Cochrane were searched from inception until February 1, 2022. Included studies consisted of robotic-assisted surgery training (e.g., box training, virtual reality training, cadaver training and animal tissue training) with an assessment method (objective or subjective), such as assessment forms, virtual reality scores, peer-to-peer feedback or time recording. RESULTS: The search identified 1591 studies. After abstract screening and full-texts examination, 209 studies were identified that focused on robotic surgery training and included an assessment tool. The majority of the studies utilized the da Vinci Surgical System, with dry lab training being the most common approach, followed by the da Vinci Surgical Skills Simulator. The most frequently used assessment methods included simulator scoring system (e.g., dVSS score), and assessment forms (e.g., GEARS and OSATS). CONCLUSION: This systematic review provides an overview of training modalities and assessment methods in robotic-assisted surgery. Dry lab training on the da Vinci Surgical System and training on the da Vinci Skills Simulator are the predominant approaches. However, focused training on tissue handling, manipulation, and force interaction is lacking, despite the absence of haptic feedback. Future research should focus on developing universal objective assessment and feedback methods to address these limitations as the field continues to evolve.

Asunto(s)

Competencia Clínica , Procedimientos Quirúrgicos Robotizados , Procedimientos Quirúrgicos Robotizados/educación , Humanos , Entrenamiento Simulado/métodos , Evaluación Educacional/métodos , Realidad Virtual , Animales , Cadáver

15.

Development and Pilot Testing of a Programmatic System for Competency Assessment in US Anesthesiology Residency Training.

Woodworth, Glenn E; Goldstein, Zachary T; Ambardekar, Aditee P; Arthur, Mary E; Bailey, Caryl F; Booth, Gregory J; Carney, Patricia A; Chen, Fei; Duncan, Michael J; Fromer, Ilana R; Hallman, Matthew R; Hoang, Thomas; Isaak, Robert; Klesius, Lisa L; Ladlie, Beth L; Mitchell, Sally Ann; Miller Juve, Amy K; Mitchell, John D; McGrath, Brian J; Shepler, John A; Sims, Charles R; Spofford, Christina M; Tanaka, Pedro P; Maniker, Robert B.

Anesth Analg ; 138(5): 1081-1093, 2024 May 01.

Artículo en Inglés | MEDLINE | ID: mdl-37801598

RESUMEN

BACKGROUND: In 2018, a set of entrustable professional activities (EPAs) and procedural skills assessments were developed for anesthesiology training, but they did not assess all the Accreditation Council for Graduate Medical Education (ACGME) milestones. The aims of this study were to (1) remap the 2018 EPA and procedural skills assessments to the revised ACGME Anesthesiology Milestones 2.0, (2) develop new assessments that combined with the original assessments to create a system of assessment that addresses all level 1 to 4 milestones, and (3) provide evidence for the validity of the assessments. METHODS: Using a modified Delphi process, a panel of anesthesiology education experts remapped the original assessments developed in 2018 to the Anesthesiology Milestones 2.0 and developed new assessments to create a system that assessed all level 1 through 4 milestones. Following a 24-month pilot at 7 institutions, the number of EPA and procedural skill assessments and mean scores were computed at the end of the academic year. Milestone achievement and subcompetency data for assessments from a single institution were compared to scores assigned by the institution's clinical competency committee (CCC). RESULTS: New assessment development, 2 months of testing and feedback, and revisions resulted in 5 new EPAs, 11 nontechnical skills assessments (NTSAs), and 6 objective structured clinical examinations (OSCEs). Combined with the original 20 EPAs and procedural skills assessments, the new system of assessment addresses 99% of level 1 to 4 Anesthesiology Milestones 2.0. During the 24-month pilot, aggregate mean EPA and procedural skill scores significantly increased with year in training. System subcompetency scores correlated significantly with 15 of 23 (65.2%) corresponding CCC scores at a single institution, but 8 correlations (36.4%) were <30.0, illustrating poor correlation. CONCLUSIONS: A panel of experts developed a set of EPAs, procedural skill assessment, NTSAs, and OSCEs to form a programmatic system of assessment for anesthesiology residency training in the United States. The method used to develop and pilot test the assessments, the progression of assessment scores with time in training, and the correlation of assessment scores with CCC scoring of milestone achievement provide evidence for the validity of the assessments.

Asunto(s)

Anestesiología , Internado y Residencia , Estados Unidos , Anestesiología/educación , Educación de Postgrado en Medicina , Evaluación Educacional/métodos , Competencia Clínica , Acreditación

16.

Distressing testing: A propensity score analysis of high-stakes exam failure and mental health.

Beck, Kathryn Christine; Røhr, Helene Lie; Reme, Bjørn-Atle; Flatø, Martin.

Child Dev ; 95(1): 242-260, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-37566438

RESUMEN

This study used rich individual-level registry data covering the entire Norwegian population to identify students aged 17-21 who either failed a high-stakes exit exam or who received the lowest passing grade from 2006 to 2018. Propensity score matching on high-quality observed characteristics was utilized to allow meaningful comparisons (N = 18,052, 64% boys). Results showed a 21% increase in odds of receiving a psychological diagnosis among students who failed the exam. Adolescents were at 57% reduced odds of graduating and 44% reduction in odds of enrolling in tertiary education 5 years following the exam. Results suggest that failing a high-stakes exam is associated with mental health issues and therefore may impact adolescents more broadly than captured in educational outcomes.

Asunto(s)

Evaluación Educacional , Salud Mental , Masculino , Adolescente , Humanos , Femenino , Evaluación Educacional/métodos , Puntaje de Propensión , Estudiantes , Escolaridad

17.

The outcomes of team-based learning vs small group interactive learning in the obstetrics and gynecology course for undergraduate students.

Sterpu, Irene; Herling, Lotta; Nordquist, Jonas; Möller, Anna; Kopp Kallner, Helena; Engberg, Hedvig; Acharya, Ganesh.

Acta Obstet Gynecol Scand ; 103(6): 1224-1230, 2024 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-38366801

RESUMEN

INTRODUCTION: Team-based learning (TBL) is a well-established active teaching method which has been shown to have pedagogical advantages in some areas such as business education and preclinical disciplines in undergraduate medical education. Increasingly, it has been adapted to clinical disciplines. However, its superiority over conventional learning methods used in clinical years of medical school remains unclear. The aim of this study was to compare TBL with traditional seminars delivered in small group interactive learning (SIL) format in terms of knowledge acquisition and retention, satisfaction and engagement of undergraduate medical students during the 6-week obstetrics and gynecology clerkship. MATERIAL AND METHODS: The study was conducted at Karolinska Institutet, a medical university in Sweden, and had a prospective, crossover design. All fifth-year medical students attending the obstetrics and gynecology clerkship, at four different teaching hospitals in Stockholm (approximately 40 students per site), in the Autumn semester of 2022 were invited to participate. Two seminars (one in obstetrics and one in gynecology) were designed and delivered in two different formats, ie TBL and SIL. The student:teacher ratio was approximately 10:1 in the traditional SIL seminars and 20:1 in the TBL. All TBL seminars were facilitated by a single teacher who had been trained and certified in TBL. Student knowledge acquisition and retention were assessed by final examination scores, and the engagement and satisfaction were assessed by questionnaires. For the TBL seminars, individual and team readiness assurance tests were also performed and evaluated. RESULTS: Of 148 students participating in the classrooms, 132 answered the questionnaires. No statistically significant differences were observed between TBL and SIL methods with regard to student knowledge acquisition and retention, engagement and satisfaction. CONCLUSIONS: We found no differences in student learning outcomes or satisfaction using TBL or SIL methods. However, as TBL had a double the student to teacher ratio as compared with SIL, in settings where teachers are scarce and suitable rooms are available for TBL sessions, the method may be beneficial in reducing faculty workload without compromising students' learning outcomes.

Asunto(s)

Educación de Pregrado en Medicina , Ginecología , Obstetricia , Ginecología/educación , Humanos , Obstetricia/educación , Educación de Pregrado en Medicina/métodos , Estudios Prospectivos , Femenino , Suecia , Estudios Cruzados , Estudiantes de Medicina/psicología , Aprendizaje Basado en Problemas/métodos , Masculino , Evaluación Educacional , Prácticas Clínicas/métodos , Procesos de Grupo , Adulto , Encuestas y Cuestionarios

18.

Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal.

Noda, Ryunosuke; Izaki, Yuto; Kitano, Fumiya; Komatsu, Jun; Ichikawa, Daisuke; Shibagaki, Yugo.

Clin Exp Nephrol ; 28(5): 465-469, 2024 May.

Artículo en Inglés | MEDLINE | ID: mdl-38353783

RESUMEN

BACKGROUND: Large language models (LLMs) have impacted advances in artificial intelligence. While LLMs have demonstrated high performance in general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard in their potential nephrology applications. METHODS: Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and Bard. We calculated the correct answer rates for the five years, each year, and question categories and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of the nephrology residents. RESULTS: The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively, thus GPT-4 significantly outperformed GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 passed in three years, barely meeting the minimum threshold in two. GPT-4 demonstrated significantly higher performance in problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. GPT-4's performance was between third- and fourth-year nephrology residents. CONCLUSIONS: GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance for future applications.

Asunto(s)

Nefrología , Autoevaluación (Psicología) , Humanos , Evaluación Educacional , Consejos de Especialidades , Competencia Clínica , Inteligencia Artificial

19.

Performance of Progressive Generations of GPT on an Exam Designed for Certifying Physicians as Certified Clinical Densitometrists.

Valdez, Dustin; Bunnell, Arianna; Lim, Sian Y; Sadowski, Peter; Shepherd, John A.

J Clin Densitom ; 27(2): 101480, 2024.

Artículo en Inglés | MEDLINE | ID: mdl-38401238

RESUMEN

BACKGROUND: Artificial intelligence (AI) large language models (LLMs) such as ChatGPT have demonstrated the ability to pass standardized exams. These models are not trained for a specific task, but instead trained to predict sequences of text from large corpora of documents sourced from the internet. It has been shown that even models trained on this general task can pass exams in a variety of domain-specific fields, including the United States Medical Licensing Examination. We asked if large language models would perform as well on a much narrower subdomain tests designed for medical specialists. Furthermore, we wanted to better understand how progressive generations of GPT (generative pre-trained transformer) models may be evolving in the completeness and sophistication of their responses even while generational training remains general. In this study, we evaluated the performance of two versions of GPT (GPT 3 and 4) on their ability to pass the certification exam given to physicians to work as osteoporosis specialists and become a certified clinical densitometrists. The CCD exam has a possible score range of 150 to 400. To pass, you need a score of 300. METHODS: A 100-question multiple-choice practice exam was obtained from a 3rd party exam preparation website that mimics the accredited certification tests given by the ISCD (International Society for Clinical Densitometry). The exam was administered to two versions of GPT, the free version (GPT Playground) and ChatGPT+, which are based on GPT-3 and GPT-4, respectively (OpenAI, San Francisco, CA). The systems were prompted with the exam questions verbatim. If the response was purely textual and did not specify which of the multiple-choice answers to select, the authors matched the text to the closest answer. Each exam was graded and an estimated ISCD score was provided from the exam website. In addition, each response was evaluated by a rheumatologist CCD and ranked for accuracy using a 5-level scale. The two GPT versions were compared in terms of response accuracy and length. RESULTS: The average response length was 11.6 ±19 words for GPT-3 and 50.0±43.6 words for GPT-4. GPT-3 answered 62 questions correctly resulting in a failing ISCD score of 289. However, GPT-4 answered 82 questions correctly with a passing score of 342. GPT-3 scored highest on the "Overview of Low Bone Mass and Osteoporosis" category (72â¯% correct) while GPT-4 scored well above 80â¯% accuracy on all categories except "Imaging Technology in Bone Health" (65â¯% correct). Regarding subjective accuracy, GPT-3 answered 23 questions with nonsensical or totally wrong responses while GPT-4 had no responses in that category. CONCLUSION: If this had been an actual certification exam, GPT-4 would now have a CCD suffix to its name even after being trained using general internet knowledge. Clearly, more goes into physician training than can be captured in this exam. However, GPT algorithms may prove to be valuable physician aids in the diagnoses and monitoring of osteoporosis and other diseases.

Asunto(s)

Inteligencia Artificial , Certificación , Humanos , Osteoporosis/diagnóstico , Competencia Clínica , Evaluación Educacional/métodos , Estados Unidos

20.

Using MCQ response certainty to determine how aspects of self-monitoring develop through a medical course.

Tweed, Mike; Willink, Robin; Wilkinson, Tim J.

Med Educ ; 58(5): 535-543, 2024 May.

Artículo en Inglés | MEDLINE | ID: mdl-37932950

RESUMEN

INTRODUCTION: Self-monitoring of clinical-decision-making is essential for health care professional practice. Using certainty in responses to assessment items could allow self-monitoring of clinical-decision-making by medical students to be tracked over time. This research introduces how aspects of insightfulness, safety and efficiency could be based on certainty in, and correctness of, multiple-choice question (MCQ) responses. We also show how these measures change over time. METHODS: With each answer on twice yearly MCQ progress tests, medical students provided their certainty of correctness. An insightful student would be more likely to be correct for those answers given with increasing certainty. A safe student would be expected to have a high probability of being correct for answers given with a high certainty. An efficient student would be expected to have a sufficiently low probability of being correct when they have no certainty. The system was developed using first principles and data from one cohort of students. A dataset from a second cohort was then used as an independent validation sample. RESULTS: The patterns of aspects of self-monitoring were similar for both cohorts. Almost all the students met the criteria for insightfulness on all tests. Most students had an undetermined outcome for the safety aspect. When a definitive result for safety was obtained, absence of safety was most prevalent in the middle of the course, while the presence of safety increased later. Most of the students met the criteria for efficiency, with the highest prevalence mid-course, but efficiency was more likely to be absent later. DISCUSSION: Throughout the course, students showed reassuring levels of insightfulness. The results suggest that students may balance safety with efficiency. This may be explained by students learning the positive implications of decisions before the negative implications, making them initially more efficient, but later being more cautious and safer.

Asunto(s)

Evaluación Educacional , Estudiantes de Medicina , Humanos , Evaluación Educacional/métodos , Aprendizaje , Competencia Clínica , Toma de Decisiones Clínicas

RESUMEN

Asunto(s)

Asunto(s)

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA