Results 1 - 20 of 4,079
1.
Article in English | MEDLINE | ID: mdl-38977032

ABSTRACT

PURPOSE: This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) on standardized urology multiple-choice items in the United States. METHODS: In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized by topic and question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024. RESULTS: GPT-4 answered 44.4% of items correctly compared to 30.9% for GPT-3.5 (P<0.0001). GPT-4 (vs. GPT-3.5) had higher accuracy on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items did not show significant differences in performance across versions. GPT-4 also outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items; the difference for higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56) was not significant. CONCLUSIONS: ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. Accuracy was below the proposed minimum passing standard (60%) for the American Board of Urology's Continuing Urologic Certification knowledge reinforcement activity. As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate on board examination items. For now, its responses should be scrutinized.
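The headline comparison in this abstract (44.4% vs. 30.9% correct out of 700 items) reduces to a two-proportion test. The following is a minimal, hedged Python sketch of such a comparison using a chi-squared test on a 2 × 2 table; the counts are reconstructed from the reported percentages, so they are approximate and purely illustrative.

from scipy.stats import chi2_contingency

n_items = 700
gpt4_correct = round(0.444 * n_items)   # ~311 of 700
gpt35_correct = round(0.309 * n_items)  # ~216 of 700

# 2 x 2 contingency table: rows = model version, columns = correct / incorrect
table = [
    [gpt4_correct, n_items - gpt4_correct],
    [gpt35_correct, n_items - gpt35_correct],
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")  # p falls well below 0.0001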


Subject(s)
Clinical Competence; Educational Measurement; Urology; Humans; United States; Educational Measurement/methods; Urology/education; Clinical Competence/standards; Specialty Boards
2.
Article in English | MEDLINE | ID: mdl-38996226

ABSTRACT

INTRODUCTION: This study aimed to evaluate the influence of training background on the frequency and indications of elbow arthroplasty performed by early-career surgeons. METHODS: A review of the American Board of Orthopaedic Surgery Part II Oral Examination Case List database from 2010 to 2021 was completed. The number of cases performed by surgeons from each training background was calculated and compared with the total number of surgeons who completed each fellowship during the study period. RESULTS: Hand surgeons performed the most elbow arthroplasty cases (132, 44%), but a higher percentage of shoulder/elbow surgeons performed elbow arthroplasty in comparison (15% vs. 7%). The mean number of total elbow arthroplasty (TEA) cases performed by shoulder/elbow surgeons was significantly higher than in other subspecialties (P < 0.01). However, when comparing only surgeons who performed elbow arthroplasty during the board collection period, there was no significant difference between training backgrounds (P = 0.20). DISCUSSION: While hand surgeons performed the most elbow arthroplasty cases, a higher percentage of shoulder/elbow surgeons performed elbow arthroplasty during the study period. The high prevalence of distal humerus fracture as an indication for arthroplasty reflected a shift in indications and was not related to training background.


Subject(s)
Arthroplasty, Replacement, Elbow; Databases, Factual; Orthopedics; Humans; United States; Orthopedics/education; Orthopedic Surgeons/education; Specialty Boards; Elbow Joint/surgery
3.
BMC Med Educ; 24(1): 749, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38992662

ABSTRACT

In response to the COVID-19 pandemic, the American Board of Anesthesiology transitioned from in-person to virtual administration of its APPLIED Examination, remotely assessing more than 3000 candidates for certification in 2021. Four hundred examiners were involved in delivering and scoring Standardized Oral Examinations (SOEs) and Objective Structured Clinical Examinations (OSCEs). More than 80% of candidates started their exams on time and stayed connected throughout without any problems. Only 74 (2.5%) SOE and 45 (1.5%) OSCE candidates required rescheduling due to technical difficulties. Of those who experienced "significant issues", concerns with OSCE technical stations (interpretation of monitors and interpretation of echocardiograms) were reported most frequently (6% of candidates). In contrast, 23% of examiners "sometimes" lost connectivity during their multiple exam sessions, on a continuum from minor inconvenience to inability to continue. Overall, 84% of SOE candidates and 89% of OSCE candidates described "smooth" interactions with examiners and standardized patients/standardized clinicians, respectively. However, only 71% of SOE candidates and 75% of OSCE candidates considered themselves able to demonstrate their knowledge and skills without obstacles. Compared with their in-person experiences, approximately 40% of SOE examiners considered virtual evaluation to be more difficult than in-person evaluation and believed the remote format negatively affected their development as examiners. The virtual format was considered less secure by 56% and 40% of SOE and OSCE examiners, respectively. The retirement of exam materials used virtually, due to concern for compromise, had implications for subsequent exam development. The return to in-person exams in 2022 was prompted by multiple factors, especially concerns regarding standardization and security. The technology is not yet perfect, especially for testing in-person communication skills and displaying dynamic exam materials. Nevertheless, the American Board of Anesthesiology's experience demonstrated the feasibility of conducting large-scale, high-stakes oral and performance exams in a virtual format and highlighted the adaptability and dedication of candidates, examiners, and administering board staff.


Subject(s)
Anesthesiology; COVID-19; Educational Measurement; Specialty Boards; Humans; Anesthesiology/education; United States; Educational Measurement/methods; Clinical Competence/standards; Certification/standards; SARS-CoV-2; Pandemics
4.
J Surg Res; 300: 191-197, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38824849

ABSTRACT

INTRODUCTION: There is no consensus regarding optimal curricula to teach the cognitive elements of general surgery. The American Board of Surgery In-Training Exam (ABSITE) aims to measure trainees' progress in attaining this knowledge. Resources such as question banks (QBs), the Surgical Council on Resident Education (SCORE) curriculum, and didactic conferences have mixed findings related to ABSITE performance and are often evaluated in isolation. This study characterized relationships between multiple learning methods and ABSITE performance to elucidate the relative educational value of learning strategies. METHODS: QB use and scores, SCORE use, didactic conference attendance, and ABSITE percentile scores were collected at an academic general surgery residency program from 2017 to 2022. QB data were available for 2017-2018 and 2021-2022, when the institution subscribed to the same QB platform. Given differences in the risk of qualifying exam failure, groups scoring ≤30th and >30th percentile were analyzed. Linear quantile mixed regressions and generalized linear mixed models were used to determine factors associated with ABSITE performance. RESULTS: Linear quantile mixed regressions revealed a relationship between ABSITE performance and QB questions completed (1.5 percentile points per 100 questions, P < 0.001) and QB score (1.2 percentile points per 1% score, P < 0.001), but not SCORE use or didactic attendance. Performers >30th percentile had a significantly higher QB score. CONCLUSIONS: QB use and score had a significant relationship with ABSITE performance, while SCORE use and didactic attendance did not. Performers >30th percentile completed a median of 1094 QB questions annually with a score of 65%. These results emphasize the effectiveness of QB use as an active learning strategy, while passive learning methods warrant further evaluation.
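As a rough illustration of the modeling described above, here is a hedged Python sketch of a median (50th percentile) regression of ABSITE percentile on question-bank use with statsmodels. It simplifies the study's linear quantile mixed regressions by omitting random effects for repeat examinees, and the file and column names (absite_data.csv, absite_pct, qb_questions, qb_score) are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical extract: one row per resident-year with QB use and ABSITE percentile
df = pd.read_csv("absite_data.csv")

# Median regression of ABSITE percentile on QB questions completed and QB score
model = smf.quantreg("absite_pct ~ qb_questions + qb_score", df)
result = model.fit(q=0.5)
print(result.summary())
# For comparison, the abstract reports ~1.5 percentile points per 100 QB questions
# completed and ~1.2 points per 1% increase in QB score.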


Asunto(s)
Evaluación Educacional , Cirugía General , Internado y Residencia , Humanos , Evaluación Educacional/métodos , Evaluación Educacional/estadística & datos numéricos , Cirugía General/educación , Internado y Residencia/métodos , Estados Unidos , Competencia Clínica/estadística & datos numéricos , Curriculum , Consejos de Especialidades , Aprendizaje , Educación de Postgrado en Medicina/métodos
6.
J Am Board Fam Med; 37(2): 279-289, 2024.
Article in English | MEDLINE | ID: mdl-38740475

ABSTRACT

BACKGROUND: The potential for machine learning (ML) to enhance the efficiency of medical specialty boards has not been explored. We applied unsupervised ML to identify archetypes among American Board of Family Medicine (ABFM) Diplomates regarding their practice characteristics and motivations for participating in continuing certification, then examined associations between motivation patterns and key recertification outcomes. METHODS: Diplomates responding to the 2017 to 2021 ABFM Family Medicine continuing certification examination surveys selected motivations for choosing to continue certification. We used chi-squared tests to compare the proportions of Diplomates failing their first recertification examination attempt across the motivations endorsed for maintaining certification. Unsupervised ML techniques were applied to generate clusters of physicians with similar practice characteristics and motivations for recertifying. Controlling for physician demographic variables, we used logistic regression to examine the effect of motivation clusters on recertification examination success and validated the ML clusters by comparison with a classification schema previously developed by experts. RESULTS: ML clusters largely recapitulated the intrinsic/extrinsic framework previously devised by experts. However, the identified clusters achieved a more equal partitioning of Diplomates into homogeneous groups. In both ML and human clusters, physicians with mainly extrinsic or mixed motivations had lower rates of examination failure than those who were intrinsically motivated. DISCUSSION: This study demonstrates the feasibility of using ML to supplement and enhance human interpretation of board certification data. We discuss implications of this demonstration study for the interaction between specialty boards and physician Diplomates.
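The two-stage approach described above (unsupervised clustering of Diplomates, then regression of examination outcomes on cluster membership) could look roughly like the following Python sketch. The choice of k-means, the number of clusters, and the column names (abfm_survey.csv, failed_first_attempt, and the feature list) are illustrative assumptions, not the authors' specification.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import statsmodels.formula.api as smf

df = pd.read_csv("abfm_survey.csv")  # hypothetical survey extract

# Cluster physicians on practice characteristics and endorsed motivations
features = ["practice_size", "years_certified", "motiv_intrinsic", "motiv_extrinsic"]
X = StandardScaler().fit_transform(df[features])
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# First-attempt examination failure by motivation cluster, controlling for demographics
logit = smf.logit("failed_first_attempt ~ C(cluster) + age + C(gender)", data=df).fit()
print(logit.summary())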


Asunto(s)
Certificación , Medicina Familiar y Comunitaria , Aprendizaje Automático , Motivación , Consejos de Especialidades , Humanos , Medicina Familiar y Comunitaria/educación , Masculino , Femenino , Estados Unidos , Adulto , Educación Médica Continua , Persona de Mediana Edad , Encuestas y Cuestionarios , Evaluación Educacional/métodos , Evaluación Educacional/estadística & datos numéricos , Competencia Clínica
8.
JAMA Netw Open; 7(5): e2410127, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38713464

ABSTRACT

Importance: Board certification can have broad implications for candidates' career trajectories, and prior research has found sociodemographic disparities in pass rates. Barriers in the format and administration of the oral board examinations may disproportionately affect certain candidates. Objective: To characterize oral certifying examination policies and practices of the 16 Accreditation Council for Graduate Medical Education (ACGME)-accredited specialties that require oral examinations. Design, Setting, and Participants: This cross-sectional study was conducted from March 1 to April 15, 2023, using data on oral examination practices and policies (examination format, dates, and setting; lactation accommodations; and accommodations for military deployment, family emergency, or medical leave) as well as the gender composition of the specialties' boards of directors, obtained from websites, telephone calls, and email correspondence with certifying specialists. The percentages of female residents and residents of racial and ethnic backgrounds who are historically underrepresented in medicine (URM) in each specialty as of December 31, 2021, were obtained from the Graduate Medical Education 2021 to 2022 report. Main Outcomes and Measures: For each specialty, accommodation scores were measured by a modified objective scoring system (score range: 1-13, with higher scores indicating more accommodations). Poisson regression was used to assess the association between accommodation score and the diversity of residents in that specialty, as measured by the percentages of female and URM residents. Linear regression was used to assess whether gender diversity of a specialty's board of directors was associated with accommodation scores. Results: Included in the analysis were 16 specialties with a total of 46 027 residents (26 533 males [57.6%]) and 233 members of boards of directors (152 males [65.2%]). The mean (SD) total accommodation score was 8.28 (3.79), and the median (IQR) score was 9.25 (5.00-12.00). No association was found between test accommodation score and the percentage of female or URM residents. However, for each 1-point increase in the test accommodation score, the relative risk that a resident was female was 1.05 (95% CI, 0.96-1.16), and the relative risk that an individual was a URM resident was 1.04 (95% CI, 1.00-1.07). An association was found between the percentage of female board members and the accommodation score: for each 10% increase in the percentage of board members who were female, the accommodation score increased by 1.20 points (95% CI, 0.23-2.16 points; P = .03). Conclusions and Relevance: This cross-sectional study found considerable variability in oral board examination accommodations among ACGME-accredited specialties, highlighting opportunities for improvement and standardization. Promoting diversity in leadership bodies may lead to greater accommodations for examinees in extenuating circumstances.
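The Poisson model mentioned above could be set up along the following lines in Python: the count of female residents in each specialty is modeled with the total number of residents as an exposure offset, so that the exponentiated coefficient is interpretable as a relative risk per 1-point increase in accommodation score. Variable and file names are hypothetical, and the authors' exact specification (for example, for the URM outcome or robust standard errors) may differ.

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("specialty_scores.csv")  # hypothetical: one row per specialty

X = sm.add_constant(df["accommodation_score"])
model = sm.GLM(
    df["n_female_residents"],                # count of female residents per specialty
    X,
    family=sm.families.Poisson(),
    offset=np.log(df["n_total_residents"]),  # exposure: total residents
)
result = model.fit()
# exp(coefficient) approximates the relative risk per 1-point score increase
print(np.exp(result.params["accommodation_score"]))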


Asunto(s)
Certificación , Humanos , Estudios Transversales , Femenino , Masculino , Certificación/estadística & datos numéricos , Estados Unidos , Consejos de Especialidades/estadística & datos numéricos , Evaluación Educacional/estadística & datos numéricos , Evaluación Educacional/métodos , Educación de Postgrado en Medicina/estadística & datos numéricos , Medicina/estadística & datos numéricos , Adulto
14.
Radiology; 311(2): e232715, 2024 May.
Article in English | MEDLINE | ID: mdl-38771184

ABSTRACT

Background: ChatGPT (OpenAI) can pass a text-based radiology board-style examination, but its stochasticity and confident language when it is incorrect may limit utility. Purpose: To assess the reliability, repeatability, robustness, and confidence of GPT-3.5 and GPT-4 (ChatGPT; OpenAI) through repeated prompting with a radiology board-style examination. Materials and Methods: In this exploratory prospective study, 150 radiology board-style multiple-choice text-based questions, previously used to benchmark ChatGPT, were administered to default versions of ChatGPT (GPT-3.5 and GPT-4) on three separate attempts (separated by ≥1 month and then 1 week). Accuracy and answer choices between attempts were compared to assess reliability (accuracy over time) and repeatability (agreement over time). On the third attempt, regardless of answer choice, ChatGPT was challenged three times with the adversarial prompt, "Your answer choice is incorrect. Please choose a different option," to assess robustness (ability to withstand adversarial prompting). ChatGPT was prompted to rate its confidence from 1-10 (with 10 being the highest level of confidence and 1 being the lowest) on the third attempt and after each challenge prompt. Results: Neither version showed a difference in accuracy over three attempts: for the first, second, and third attempt, accuracy of GPT-3.5 was 69.3% (104 of 150), 63.3% (95 of 150), and 60.7% (91 of 150), respectively (P = .06); and accuracy of GPT-4 was 80.6% (121 of 150), 78.0% (117 of 150), and 76.7% (115 of 150), respectively (P = .42). Though both GPT-4 and GPT-3.5 had only moderate intrarater agreement (κ = 0.78 and 0.64, respectively), the answer choices of GPT-4 were more consistent across three attempts than those of GPT-3.5 (agreement, 76.7% [115 of 150] vs 61.3% [92 of 150], respectively; P = .006). After the challenge prompt, both changed responses for most questions, though GPT-4 did so more frequently than GPT-3.5 (97.3% [146 of 150] vs 71.3% [107 of 150], respectively; P < .001). Both rated "high confidence" (≥8 on the 1-10 scale) for most initial responses (GPT-3.5, 100% [150 of 150]; and GPT-4, 94.0% [141 of 150]) as well as for incorrect responses (ie, overconfidence; GPT-3.5, 100% [59 of 59]; and GPT-4, 77% [27 of 35], respectively; P = .89). Conclusion: Default GPT-3.5 and GPT-4 were reliably accurate across three attempts, but both had poor repeatability and robustness and were frequently overconfident. GPT-4 was more consistent across attempts than GPT-3.5 but more influenced by an adversarial prompt. © RSNA, 2024. Supplemental material is available for this article. See also the editorial by Ballard in this issue.
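Repeatability of answer choices, as assessed here, amounts to per-question agreement between attempts. Below is a small, hedged Python sketch with placeholder answer lists standing in for the 150 recorded responses; the study's kappa statistics were computed over three attempts, whereas this shows a single pairwise comparison.

from sklearn.metrics import cohen_kappa_score

# Placeholder answer choices; in practice each list would hold 150 entries
attempt_1 = ["A", "C", "B", "D", "A"]
attempt_2 = ["A", "C", "D", "D", "A"]

# Percent agreement and Cohen's kappa between two attempts
agreement = sum(a == b for a, b in zip(attempt_1, attempt_2)) / len(attempt_1)
kappa = cohen_kappa_score(attempt_1, attempt_2)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.2f}")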


Asunto(s)
Competencia Clínica , Evaluación Educacional , Radiología , Humanos , Estudios Prospectivos , Reproducibilidad de los Resultados , Evaluación Educacional/métodos , Consejos de Especialidades
15.
Br J Oral Maxillofac Surg; 62(5): 477-482, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38692979

ABSTRACT

When the Postgraduate Medical Education and Training Board's (PMETB) Review of Oral and Maxillofacial Surgery (OMFS) Training was published in 2008, it contained five recommendations about OMFS training. As yet, none of these recommendations has been delivered. An online survey was designed to assess awareness of the PMETB review and the current views of OMFS trainees and consultants about its recommendations. Replies were invited using email and social media (WhatsApp, Twitter, and Facebook). Because social media was used, no denominator was available to calculate a response rate. A total of 304 responses were received, eight of which were anonymous. There was strong support for all the OMFS-specific recommendations: 1: the OMFS specialty should remain a dual medical and dental degree specialty (255, 84%); 2: OMFS training should be shortened (283, 93%); 3: OMFS training should start at the beginning of the second degree (203, 67%); 4: there should be a single medical regulator (General Medical Council) for OMFS (258, 85%); and 6: the need for a second Foundation Year should be removed (260, 86%). Other suggestions for improving OMFS training were also made by participants in the survey. There remains strong support within the specialty for the recommendations of the review. This support is present across consultants, specialty trainees, and those aiming for OMFS specialty training. Some of the original legislative obstructions to delivery of the recommendations have been removed by Brexit, creating a unique opportunity for them to be delivered.


Asunto(s)
Cirugía Bucal , Humanos , Reino Unido , Cirugía Bucal/educación , Actitud del Personal de Salud , Consultores , Educación de Postgrado en Medicina , Encuestas y Cuestionarios , Consejos de Especialidades
16.
Ophthalmologie; 121(7): 554-564, 2024 Jul.
Article in German | MEDLINE | ID: mdl-38801461

ABSTRACT

PURPOSE: In recent years, artificial intelligence (AI), as a new segment of computer science, has also become increasingly important in medicine. The aim of this project was to investigate whether the current version of ChatGPT (ChatGPT 4.0) is able to answer open questions that could be asked in the context of a German board examination in ophthalmology. METHODS: After excluding image-based questions, 10 questions from each of 15 different chapters/topics were selected from the textbook 1000 questions in ophthalmology (1000 Fragen Augenheilkunde, 2nd edition, 2014). ChatGPT was instructed by means of a prompt to assume the role of a board-certified ophthalmologist and to concentrate on the essentials when answering. A human expert with considerable expertise in the respective topic evaluated the answers regarding their correctness, relevance, and internal coherence. Additionally, the overall performance was rated with school grades, and it was assessed whether the answers would have been sufficient to pass the ophthalmology board examination. RESULTS: ChatGPT would have passed the board examination in 12 out of 15 topics. The overall performance, however, was limited, with only 53.3% completely correct answers. While the correctness of the results across topics was highly variable (uveitis and lens/cataract 100%; optics and refraction 20%), the answers generally had a high thematic fit (70%) and internal coherence (71%). CONCLUSION: The fact that ChatGPT 4.0 would have passed the specialist examination in 12 out of 15 topics is remarkable considering that this AI was not specifically trained for medical questions; however, there is considerable variability in performance between topics, with some serious shortcomings that currently rule out its safe use in clinical practice.
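The prompting setup described here (instructing the model to answer as a board-certified ophthalmologist and to concentrate on the essentials) could be reproduced roughly as follows. This is only a sketch: it assumes the OpenAI Python SDK (v1+) chat interface and a generic model identifier, and the authors' exact prompt wording and model version are not reproduced.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str) -> str:
    # Role prompt: board-certified ophthalmologist, concise, focused on essentials
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model identifier
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a board-certified ophthalmologist. "
                    "Answer the examination question concisely, "
                    "concentrating on the essentials."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("Name the typical findings of acute angle-closure glaucoma."))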


Asunto(s)
Evaluación Educacional , Oftalmología , Consejos de Especialidades , Oftalmología/educación , Evaluación Educacional/métodos , Evaluación Educacional/normas , Alemania , Humanos , Competencia Clínica/normas , Certificación , Inteligencia Artificial
17.
J Surg Res; 299: 329-335, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38788470

ABSTRACT

INTRODUCTION: Chat Generative Pretrained Transformer (ChatGPT) is a large language model capable of generating human-like text. This study sought to evaluate ChatGPT's performance on Surgical Council on Resident Education (SCORE) self-assessment questions. METHODS: General surgery multiple-choice questions were randomly selected from the SCORE question bank and submitted to ChatGPT (GPT-3.5, April-May 2023); its responses were recorded. RESULTS: ChatGPT correctly answered 123 of 200 questions (62%). ChatGPT scored lowest on biliary (2/8 questions correct, 25%), surgical critical care (3/10, 30%), general abdomen (1/3, 33%), and pancreas (1/3, 33%) topics. ChatGPT scored higher on biostatistics (4/4 correct, 100%), fluid/electrolytes/acid-base (4/4, 100%), and small intestine (8/9, 89%) questions. ChatGPT answered questions with thorough and structured support for its answers. It scored 56% on ethics questions and provided coherent explanations regarding end-of-life discussions, communication with coworkers and patients, and informed consent. For many questions answered incorrectly, ChatGPT provided cogent yet factually incorrect descriptions, including of anatomy and operative steps. In two instances, it gave a correct explanation but chose the wrong answer. It did not answer two questions, stating that it needed additional information to determine the next best step in treatment. CONCLUSIONS: ChatGPT answered 62% of SCORE questions correctly. It performed better on questions requiring standard recall but struggled with higher-level questions that required complex clinical decision making, despite providing detailed explanations of its rationale. Because of its mediocre performance on this question set and its sometimes confidently worded yet factually inaccurate responses, caution should be used when interpreting ChatGPT's answers to general surgery questions.
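Tallying ChatGPT's recorded answers against the key by topic, as reported above, is a straightforward aggregation. A minimal Python sketch follows, assuming a hypothetical CSV with columns topic, chatgpt_answer, and correct_answer.

import pandas as pd

df = pd.read_csv("score_responses.csv")  # hypothetical recorded responses
df["correct"] = df["chatgpt_answer"] == df["correct_answer"]

overall = df["correct"].mean()
by_topic = df.groupby("topic")["correct"].agg(["mean", "count"]).sort_values("mean")

print(f"overall accuracy: {overall:.0%}")  # abstract reports 123/200 (62%)
print(by_topic)  # e.g., biliary and critical care lowest; biostatistics highest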


Subject(s)
General Surgery; Internship and Residency; Humans; General Surgery/education; Educational Measurement/methods; Educational Measurement/statistics & numerical data; United States; Clinical Competence/statistics & numerical data; Specialty Boards