Results 1 - 20 of 4,204
1.
BMC Med Educ ; 24(1): 749, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38992662

ABSTRACT

In response to the COVID-19 pandemic, the American Board of Anesthesiology transitioned from in-person to virtual administration of its APPLIED Examination, assessing more than 3000 candidates for certification purposes remotely in 2021. Four hundred examiners were involved in delivering and scoring Standardized Oral Examinations (SOEs) and Objective Structured Clinical Examinations (OSCEs). More than 80% of candidates started their exams on time and stayed connected throughout the exam without any problems. Only 74 (2.5%) SOE and 45 (1.5%) OSCE candidates required rescheduling due to technical difficulties. Of those who experienced "significant issues", concerns with OSCE technical stations (interpretation of monitors and interpretation of echocardiograms) were reported most frequently (6% of candidates). In contrast, 23% of examiners "sometimes" lost connectivity during their multiple exam sessions, on a continuum from minor inconvenience to inability to continue. Overall, 84% of SOE candidates and 89% of OSCE candidates described "smooth" interactions with examiners and standardized patients/standardized clinicians, respectively. However, only 71% of SOE candidates and 75% of OSCE candidates considered themselves able to demonstrate their knowledge and skills without obstacles. Approximately 40% of SOE examiners considered virtual evaluation to be more difficult than their in-person experience and believed the remote format negatively affected their development as an examiner. The virtual format was considered less secure by 56% and 40% of SOE and OSCE examiners, respectively. The retirement of exam materials used virtually, owing to concern for compromise, had implications for subsequent exam development. The return to in-person exams in 2022 was prompted by multiple factors, especially concerns regarding standardization and security. The technology is not yet perfect, especially for testing in-person communication skills and displaying dynamic exam materials. Nevertheless, the American Board of Anesthesiology's experience demonstrated the feasibility of conducting large-scale, high-stakes oral and performance exams in a virtual format and highlighted the adaptability and dedication of candidates, examiners, and administering board staff.


Subjects
Anesthesiology, COVID-19, Educational Measurement, Specialty Boards, Humans, Anesthesiology/education, United States, Educational Measurement/methods, Clinical Competence/standards, Certification/standards, SARS-CoV-2, Pandemics
2.
Article in English | MEDLINE | ID: mdl-38977032

ABSTRACT

PURPOSE: This study aimed to evaluate the performance of Chat Generative Pre-Trained Transformer (ChatGPT) on standardized urology multiple-choice items in the United States. METHODS: In total, 700 multiple-choice urology board exam-style items were submitted to GPT-3.5 and GPT-4, and responses were recorded. Items were categorized by topic and by question complexity (recall, interpretation, and problem-solving). The accuracy of GPT-3.5 and GPT-4 was compared across item types in February 2024. RESULTS: GPT-4 answered 44.4% of items correctly compared with 30.9% for GPT-3.5 (P<0.0001). GPT-4 (vs. GPT-3.5) had higher accuracy on urologic oncology (43.8% vs. 33.9%, P=0.03), sexual medicine (44.3% vs. 27.8%, P=0.046), and pediatric urology (47.1% vs. 27.1%, P=0.012) items. Endourology (38.0% vs. 25.7%, P=0.15), reconstruction and trauma (29.0% vs. 21.0%, P=0.41), and neurourology (49.0% vs. 33.3%, P=0.11) items showed no significant differences in performance between versions. GPT-4 also outperformed GPT-3.5 on recall (45.9% vs. 27.4%, P<0.00001) and interpretation (45.6% vs. 31.5%, P=0.0005) items, but its advantage on the higher-complexity problem-solving items (41.8% vs. 34.5%, P=0.56) was not significant. CONCLUSIONS: ChatGPT performs relatively poorly on standardized multiple-choice urology board exam-style items, with GPT-4 outperforming GPT-3.5. Accuracy was below the proposed minimum passing standard (60%) for the American Board of Urology's Continuing Urologic Certification knowledge reinforcement activity. As artificial intelligence progresses in complexity, ChatGPT may become more capable and accurate on board examination items. For now, its responses should be scrutinized.
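
The benchmarking workflow this abstract describes — submitting each multiple-choice item to both model versions and comparing the proportions answered correctly — can be sketched roughly as below. The items.json file, its field names, the prompt wording, and the model identifiers are illustrative assumptions, not details taken from the study.

```python
# Minimal sketch, assuming a hypothetical items.json question bank with
# "stem", "options" (letter -> text), "answer", and "topic" fields.
import json
from openai import OpenAI          # pip install openai
from scipy.stats import chi2_contingency

client = OpenAI()
MODELS = ["gpt-3.5-turbo", "gpt-4"]   # assumed model identifiers

def ask(model: str, item: dict) -> str:
    """Send one multiple-choice item and return the model's letter choice."""
    prompt = (
        item["stem"] + "\n"
        + "\n".join(f"{k}. {v}" for k, v in item["options"].items())
        + "\nAnswer with a single letter."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()[0].upper()

items = json.load(open("items.json"))
n = len(items)
correct = {m: sum(ask(m, it) == it["answer"] for it in items) for m in MODELS}

# Overall 2x2 chi-squared test of correct/incorrect counts between versions
table = [[correct["gpt-4"], n - correct["gpt-4"]],
         [correct["gpt-3.5-turbo"], n - correct["gpt-3.5-turbo"]]]
chi2, p, _, _ = chi2_contingency(table)
print(f"GPT-4 {correct['gpt-4']}/{n} vs GPT-3.5 {correct['gpt-3.5-turbo']}/{n}, p={p:.4g}")
```

Per-topic and per-complexity comparisons like those reported above would repeat the same test within each subgroup's counts.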


Subjects
Clinical Competence, Educational Measurement, Urology, Humans, United States, Educational Measurement/methods, Urology/education, Clinical Competence/standards, Specialty Boards
3.
J Surg Res ; 299: 329-335, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38788470

ABSTRACT

INTRODUCTION: Chat Generative Pretrained Transformer (ChatGPT) is a large language model capable of generating human-like text. This study sought to evaluate ChatGPT's performance on Surgical Council on Resident Education (SCORE) self-assessment questions. METHODS: General surgery multiple-choice questions were randomly selected from the SCORE question bank, submitted to ChatGPT (GPT-3.5, April-May 2023), and its responses were recorded. RESULTS: ChatGPT correctly answered 123 of 200 questions (62%). It scored lowest on biliary (2/8 questions correct, 25%), surgical critical care (3/10, 30%), general abdomen (1/3, 33%), and pancreas (1/3, 33%) topics, and higher on biostatistics (4/4 correct, 100%), fluid/electrolytes/acid-base (4/4, 100%), and small intestine (8/9, 89%) questions. ChatGPT provided thorough and structured support for its answers. It scored 56% on ethics questions and gave coherent explanations regarding end-of-life discussions, communication with coworkers and patients, and informed consent. For many questions answered incorrectly, ChatGPT provided cogent yet factually incorrect descriptions, including of anatomy and operative steps. In two instances, it gave a correct explanation but chose the wrong answer. It did not answer two questions, stating that it needed additional information to determine the next best step in treatment. CONCLUSIONS: ChatGPT answered 62% of SCORE questions correctly. It performed better on questions requiring standard recall but struggled with higher-level questions that required complex clinical decision making, despite providing detailed explanations of its rationale. Given its mediocre performance on this question set and its sometimes confidently worded yet factually inaccurate responses, caution should be used when interpreting ChatGPT's answers to general surgery questions.
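
A rough illustration of how free-text ChatGPT responses might be graded against an answer key by topic, including the "declined to answer" case the abstract notes; the record structure, the regex, and the toy examples are assumptions, not the study's actual pipeline.

```python
# Rough grading sketch: parse a letter choice out of each free-text response,
# treat responses with no extractable letter as "declined to answer", and
# tally accuracy per topic.
import re
from collections import Counter

def extract_choice(response: str) -> str | None:
    """Return the first standalone answer letter A-E, or None if absent."""
    m = re.search(r"\b([A-E])\b", response)
    return m.group(1) if m else None

def grade(records: list[dict]) -> dict[str, tuple[int, int]]:
    """records: [{'topic': ..., 'response': ..., 'answer': ...}] -> {topic: (correct, total)}"""
    correct, total = Counter(), Counter()
    for r in records:
        total[r["topic"]] += 1
        if extract_choice(r["response"]) == r["answer"]:
            correct[r["topic"]] += 1
    return {t: (correct[t], total[t]) for t in total}

# Toy records, not actual SCORE content:
records = [
    {"topic": "biostatistics", "response": "The answer is B because ...", "answer": "B"},
    {"topic": "biliary", "response": "I would need more clinical detail to decide.", "answer": "C"},
]
for topic, (c, n) in grade(records).items():
    print(f"{topic}: {c}/{n} correct")
```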


Subjects
General Surgery, Internship and Residency, Humans, General Surgery/education, Educational Measurement/methods, Educational Measurement/statistics & numerical data, United States, Clinical Competence/statistics & numerical data, Specialty Boards
5.
Br J Oral Maxillofac Surg ; 62(5): 477-482, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38692979

ABSTRACT

When the Postgraduate Medical Education and Training Board's (PMETB) Review of Oral and Maxillofacial Surgery (OMFS) Training was published in 2008, it contained five recommendations about OMFS training. As yet, none of these recommendations has been delivered. An online survey was designed to assess awareness of the PMETB review and the current views of OMFS trainees and consultants about its recommendations. Replies were invited via email and social media (WhatsApp, Twitter, and Facebook); because social media was used, no denominator was available and a response rate could not be calculated. A total of 304 responses were received, eight of which were anonymous. There was strong support for all the OMFS-specific recommendations: 1: the OMFS specialty should remain a dual medical and dental degree specialty (255, 84%); 2: OMFS training should be shortened (283, 93%); 3: OMFS training should start at the beginning of the second degree (203, 67%); 4: there should be a single medical regulator (General Medical Council) for OMFS (258, 85%); and 6: the need for a second Foundation Year should be removed (260, 86%). Participants in the survey also made other suggestions for improving OMFS training. There remains strong support within the specialty for the recommendations of the review, present across consultants, specialty trainees, and those aiming for OMFS specialty training. Some of the original legislative obstructions to delivery of the recommendations have been removed by Brexit, creating a unique opportunity for them to be delivered.


Subjects
Oral Surgery, Humans, United Kingdom, Oral Surgery/education, Attitude of Health Personnel, Consultants, Graduate Medical Education, Surveys and Questionnaires, Specialty Boards
6.
Ophthalmologie ; 121(7): 554-564, 2024 Jul.
Article in German | MEDLINE | ID: mdl-38801461

ABSTRACT

PURPOSE: In recent years, artificial intelligence (AI), as a new segment of computer science, has become increasingly important in medicine. The aim of this project was to investigate whether the current version of ChatGPT (ChatGPT 4.0) is able to answer open questions that could be asked in the context of a German board examination in ophthalmology. METHODS: After excluding image-based questions, 10 questions from each of 15 chapters/topics were selected from the textbook 1000 Fragen Augenheilkunde ("1000 Questions in Ophthalmology", 2nd edition, 2014). ChatGPT was instructed via a prompt to assume the role of a board-certified ophthalmologist and to concentrate on the essentials when answering. A human expert with considerable expertise in the respective topic evaluated the answers for correctness, relevance, and internal coherence. Additionally, the overall performance was rated using school grades, and the experts assessed whether the answers would have been sufficient to pass the ophthalmology board examination. RESULTS: ChatGPT would have passed the board examination in 12 out of 15 topics. The overall performance, however, was limited, with only 53.3% completely correct answers. While the correctness of the answers varied considerably between topics (uveitis and lens/cataract 100%; optics and refraction 20%), the answers consistently showed a high thematic fit (70%) and internal coherence (71%). CONCLUSION: The fact that ChatGPT 4.0 would have passed the specialist examination in 12 out of 15 topics is remarkable, given that this AI was not specifically trained for medical questions; however, performance varies considerably between topics, with some serious shortcomings that currently rule out its safe use in clinical practice.
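
The role-style prompt described here can be approximated as follows. This is a hedged sketch using the OpenAI API rather than the ChatGPT interface; the exact (German) wording used in the study is not reproduced, and the model name and example question are assumptions.

```python
# Hedged sketch of a role ("system") prompt plus an open examination question;
# the model name, prompt wording, and example question are illustrative only.
from openai import OpenAI  # pip install openai

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a board-certified ophthalmologist. Answer the following "
    "examination question concisely, concentrating on the essentials."
)

def answer_open_question(question: str, model: str = "gpt-4") -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

# Illustrative open question, not one of the textbook items:
print(answer_open_question("Name the typical findings of acute angle-closure glaucoma."))
```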


Subjects
Educational Measurement, Ophthalmology, Specialty Boards, Ophthalmology/education, Educational Measurement/methods, Educational Measurement/standards, Germany, Humans, Clinical Competence/standards, Certification, Artificial Intelligence
8.
J Am Board Fam Med ; 37(2): 279-289, 2024.
Article in English | MEDLINE | ID: mdl-38740475

ABSTRACT

BACKGROUND: The potential for machine learning (ML) to enhance the efficiency of medical specialty boards has not been explored. We applied unsupervised ML to identify archetypes among American Board of Family Medicine (ABFM) Diplomates regarding their practice characteristics and motivations for participating in continuing certification, then examined associations between motivation patterns and key recertification outcomes. METHODS: Diplomates responding to the 2017 to 2021 ABFM Family Medicine continuing certification examination surveys selected their motivations for choosing to continue certification. We used chi-squared tests to examine differences in the proportions of Diplomates failing their first recertification examination attempt among those who endorsed different motivations for maintaining certification. Unsupervised ML techniques were applied to generate clusters of physicians with similar practice characteristics and motivations for recertifying. Controlling for physician demographic variables, we used logistic regression to examine the effect of motivation clusters on recertification examination success, and we validated the ML clusters by comparing them with a classification schema previously developed by experts. RESULTS: The ML clusters largely recapitulated the intrinsic/extrinsic framework previously devised by experts; however, the identified clusters achieved a more equal partitioning of Diplomates into homogeneous groups. In both the ML and human clusters, physicians with mainly extrinsic or mixed motivations had lower rates of examination failure than those who were intrinsically motivated. DISCUSSION: This study demonstrates the feasibility of using ML to supplement and enhance human interpretation of board certification data. We discuss the implications of this demonstration study for the interaction between specialty boards and physician Diplomates.
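
A very simplified sketch of the analysis pattern described above: cluster respondents on motivation features, then relate cluster membership to first-attempt examination failure with logistic regression while controlling for demographics. The column names, the choice of k-means, and k=4 are assumptions; the abstract does not specify which unsupervised algorithm or how many clusters were used.

```python
# Simplified sketch: k-means on standardized motivation indicators, then
# logistic regression of first-attempt failure on cluster membership with
# demographic controls. Column names ("motive_*", "age", "sex",
# "failed_first_attempt") and the file are hypothetical.
import pandas as pd
import statsmodels.api as sm
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("diplomate_survey.csv")        # hypothetical survey extract

motivation_cols = [c for c in df.columns if c.startswith("motive_")]
X = StandardScaler().fit_transform(df[motivation_cols])
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Design matrix: cluster dummies plus demographic controls
design = pd.get_dummies(
    df[["cluster", "age", "sex"]].astype({"cluster": "category"}),
    drop_first=True,
).astype(float)
design = sm.add_constant(design)

model = sm.Logit(df["failed_first_attempt"].astype(float), design).fit()
print(model.summary())
```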


Subjects
Certification, Family Medicine, Machine Learning, Motivation, Specialty Boards, Humans, Family Medicine/education, Male, Female, United States, Adult, Continuing Medical Education, Middle Aged, Surveys and Questionnaires, Educational Measurement/methods, Educational Measurement/statistics & numerical data, Clinical Competence
11.
Radiology ; 311(2): e232715, 2024 May.
Article in English | MEDLINE | ID: mdl-38771184

ABSTRACT

Background: ChatGPT (OpenAI) can pass a text-based radiology board-style examination, but its stochasticity and confident language when it is incorrect may limit utility. Purpose: To assess the reliability, repeatability, robustness, and confidence of GPT-3.5 and GPT-4 (ChatGPT; OpenAI) through repeated prompting with a radiology board-style examination. Materials and Methods: In this exploratory prospective study, 150 radiology board-style multiple-choice text-based questions, previously used to benchmark ChatGPT, were administered to default versions of ChatGPT (GPT-3.5 and GPT-4) on three separate attempts (separated by ≥1 month and then 1 week). Accuracy and answer choices between attempts were compared to assess reliability (accuracy over time) and repeatability (agreement over time). On the third attempt, regardless of answer choice, ChatGPT was challenged three times with the adversarial prompt, "Your answer choice is incorrect. Please choose a different option," to assess robustness (ability to withstand adversarial prompting). ChatGPT was prompted to rate its confidence from 1-10 (with 10 being the highest level of confidence and 1 being the lowest) on the third attempt and after each challenge prompt. Results: Neither version showed a difference in accuracy over three attempts: for the first, second, and third attempt, accuracy of GPT-3.5 was 69.3% (104 of 150), 63.3% (95 of 150), and 60.7% (91 of 150), respectively (P = .06); and accuracy of GPT-4 was 80.6% (121 of 150), 78.0% (117 of 150), and 76.7% (115 of 150), respectively (P = .42). Though both GPT-4 and GPT-3.5 had only moderate intrarater agreement (κ = 0.78 and 0.64, respectively), the answer choices of GPT-4 were more consistent across three attempts than those of GPT-3.5 (agreement, 76.7% [115 of 150] vs 61.3% [92 of 150], respectively; P = .006). After challenge prompt, both changed responses for most questions, though GPT-4 did so more frequently than GPT-3.5 (97.3% [146 of 150] vs 71.3% [107 of 150], respectively; P < .001). Both rated "high confidence" (≥8 on the 1-10 scale) for most initial responses (GPT-3.5, 100% [150 of 150]; and GPT-4, 94.0% [141 of 150]) as well as for incorrect responses (ie, overconfidence; GPT-3.5, 100% [59 of 59]; and GPT-4, 77% [27 of 35], respectively; P = .89). Conclusion: Default GPT-3.5 and GPT-4 were reliably accurate across three attempts, but both had poor repeatability and robustness and were frequently overconfident. GPT-4 was more consistent across attempts than GPT-3.5 but more influenced by an adversarial prompt. © RSNA, 2024. Supplemental material is available for this article. See also the editorial by Ballard in this issue.
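
The repeat-and-challenge protocol described above — re-ask the same items on separate attempts, measure answer agreement, then issue the adversarial challenge prompt and record answer switches — might look roughly like this. The sketch uses the OpenAI API rather than the default ChatGPT web interface used in the study, collapses the three challenge rounds into one, and the placeholder items and model name are assumptions.

```python
# Sketch of the repeat-and-challenge protocol; placeholders stand in for the
# 150 board-style items. In practice the letter choice would be parsed from
# each free-text response before computing agreement.
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

client = OpenAI()
CHALLENGE = "Your answer choice is incorrect. Please choose a different option."

def ask(messages, model="gpt-4"):
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content.strip()

def attempt(question, model="gpt-4"):
    return ask([{"role": "user", "content": question}], model)

def challenge(question, prior_answer, model="gpt-4"):
    return ask([
        {"role": "user", "content": question},
        {"role": "assistant", "content": prior_answer},
        {"role": "user", "content": CHALLENGE},
    ], model)

questions = ["<board-style item 1>", "<board-style item 2>"]   # placeholders

attempt1 = [attempt(q) for q in questions]
attempt2 = [attempt(q) for q in questions]
print("agreement (Cohen's kappa) between attempts:",
      cohen_kappa_score(attempt1, attempt2))

switched = sum(challenge(q, a) != a for q, a in zip(questions, attempt2))
print(f"changed answer after challenge: {switched}/{len(questions)}")
```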


Subjects
Clinical Competence, Educational Measurement, Radiology, Humans, Prospective Studies, Reproducibility of Results, Educational Measurement/methods, Specialty Boards
12.
JAMA Netw Open ; 7(5): e2410127, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38713464

ABSTRACT

Importance: Board certification can have broad implications for candidates' career trajectories, and prior research has found sociodemographic disparities in pass rates. Barriers in the format and administration of the oral board examinations may disproportionately affect certain candidates. Objective: To characterize oral certifying examination policies and practices of the 16 Accreditation Council for Graduate Medical Education (ACGME)-accredited specialties that require oral examinations. Design, Setting, and Participants: This cross-sectional study was conducted from March 1 to April 15, 2023, using data on oral examination practices and policies (examination format, dates, and setting; lactation accommodations; and accommodations for military deployment, family emergency, or medical leave) as well as the gender composition of the specialties' boards of directors, obtained from websites, telephone calls, and email correspondence with certifying specialists. The percentages of female residents and residents of racial and ethnic backgrounds who are historically underrepresented in medicine (URM) in each specialty as of December 31, 2021, were obtained from the Graduate Medical Education 2021 to 2022 report. Main Outcomes and Measures: For each specialty, accommodation scores were measured by a modified objective scoring system (score range: 1-13, with higher scores indicating more accommodations). Poisson regression was used to assess the association between accommodation score and the diversity of residents in that specialty, as measured by the percentages of female and URM residents. Linear regression was used to assess whether gender diversity of a specialty's board of directors was associated with accommodation scores. Results: Included in the analysis were 16 specialties with a total of 46 027 residents (26 533 males [57.6%]) and 233 members of boards of directors (152 males [65.2%]). The mean (SD) total accommodation score was 8.28 (3.79), and the median (IQR) score was 9.25 (5.00-12.00). No association was found between test accommodation score and the percentage of female or URM residents. However, for each 1-point increase in the test accommodation score, the relative risk that a resident was female was 1.05 (95% CI, 0.96-1.16), and the relative risk that an individual was a URM resident was 1.04 (95% CI, 1.00-1.07). An association was found between the percentage of female board members and the accommodation score: for each 10% increase in the percentage of board members who were female, the accommodation score increased by 1.20 points (95% CI, 0.23-2.16 points; P = .03). Conclusions and Relevance: This cross-sectional study found considerable variability in oral board examination accommodations among ACGME-accredited specialties, highlighting opportunities for improvement and standardization. Promoting diversity in leadership bodies may lead to greater accommodations for examinees in extenuating circumstances.
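
One plausible way to set up the two regressions described above, assuming a hypothetical specialties.csv with one row per specialty and columns accommodation_score, n_female_residents, n_residents, and pct_female_board; this is an illustration of the stated model types, not the authors' code.

```python
# Sketch of the two stated regressions on a hypothetical per-specialty table.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("specialties.csv")            # 16 rows, one per specialty

# Poisson regression of the count of female residents on accommodation score,
# with total residents as exposure -- one way to express the "relative risk
# per 1-point increase" reported above.
X = sm.add_constant(df[["accommodation_score"]])
poisson = sm.GLM(df["n_female_residents"], X,
                 family=sm.families.Poisson(),
                 exposure=df["n_residents"]).fit()
print(poisson.summary())

# Linear regression of accommodation score on % female board members
X2 = sm.add_constant(df[["pct_female_board"]])
ols = sm.OLS(df["accommodation_score"], X2).fit()
print(ols.summary())
```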


Subjects
Certification, Humans, Cross-Sectional Studies, Female, Male, Certification/statistics & numerical data, United States, Specialty Boards/statistics & numerical data, Educational Measurement/statistics & numerical data, Educational Measurement/methods, Graduate Medical Education/statistics & numerical data, Medicine/statistics & numerical data, Adult
19.
JAMA Neurol ; 81(6): 660-661, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38587850

ABSTRACT

This diagnostic study examines whether large language models are able to pass practice licensing examinations for epilepsy.


Subjects
Epilepsy, Humans, Epilepsy/diagnosis, Language, Educational Measurement/standards, Educational Measurement/methods, Specialty Boards/standards, Clinical Competence/standards