2.
J Am Board Fam Med ; 37(2): 279-289, 2024.
Article En | MEDLINE | ID: mdl-38740475

BACKGROUND: The potential for machine learning (ML) to enhance the efficiency of medical specialty boards has not been explored. We applied unsupervised ML to identify archetypes among American Board of Family Medicine (ABFM) Diplomates regarding their practice characteristics and motivations for participating in continuing certification, then examined associations between motivation patterns and key recertification outcomes. METHODS: Diplomates responding to the 2017 to 2021 ABFM Family Medicine continuing certification examination surveys selected motivations for choosing to continue certification. We used Chi-squared tests to examine differences in the proportions of Diplomates failing their first recertification examination attempt who endorsed different motivations for maintaining certification. Unsupervised ML techniques were applied to generate clusters of physicians with similar practice characteristics and motivations for recertifying. Controlling for physician demographic variables, we used logistic regression to examine the effect of motivation clusters on recertification examination success, and we validated the ML clusters by comparing them with a classification schema previously developed by experts. RESULTS: The ML clusters largely recapitulated the intrinsic/extrinsic framework previously devised by experts. However, the identified clusters achieved a more equal partitioning of Diplomates into homogeneous groups. In both the ML and human clusters, physicians with mainly extrinsic or mixed motivations had lower rates of examination failure than those who were intrinsically motivated. DISCUSSION: This study demonstrates the feasibility of using ML to supplement and enhance human interpretation of board certification data. We discuss the implications of this demonstration study for the interaction between specialty boards and physician Diplomates.
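As an illustration of the kind of pipeline this abstract describes, the sketch below clusters survey respondents and then regresses first-attempt examination failure on cluster membership while controlling for demographics. The clustering algorithm (k-means here), the input file, and all column names are assumptions for illustration, not details taken from the ABFM study.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import statsmodels.formula.api as smf

# Hypothetical survey file: one row per Diplomate.
df = pd.read_csv("diplomate_survey.csv")

# Cluster Diplomates on practice characteristics and stated motivations.
features = ["intrinsic_motivation", "extrinsic_motivation", "practice_size", "years_certified"]
X = StandardScaler().fit_transform(df[features])
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Logistic regression: first-attempt exam failure vs. cluster membership,
# controlling for physician demographic variables.
model = smf.logit(
    "failed_first_attempt ~ C(cluster) + age + C(sex) + C(degree_type)", data=df
).fit()
print(model.summary())
```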


Certification , Family Practice , Machine Learning , Motivation , Specialty Boards , Humans , Family Practice/education , Male , Female , United States , Adult , Education, Medical, Continuing , Middle Aged , Surveys and Questionnaires , Educational Measurement/methods , Educational Measurement/statistics & numerical data , Clinical Competence
3.
JAMA Netw Open ; 7(5): e2410127, 2024 May 01.
Article En | MEDLINE | ID: mdl-38713464

Importance: Board certification can have broad implications for candidates' career trajectories, and prior research has found sociodemographic disparities in pass rates. Barriers in the format and administration of the oral board examinations may disproportionately affect certain candidates. Objective: To characterize the oral certifying examination policies and practices of the 16 Accreditation Council for Graduate Medical Education (ACGME)-accredited specialties that require oral examinations. Design, Setting, and Participants: This cross-sectional study was conducted from March 1 to April 15, 2023, using data on oral examination practices and policies (examination format, dates, and setting; lactation accommodations; and accommodations for military deployment, family emergency, or medical leave) as well as the gender composition of the specialties' boards of directors, obtained from websites, telephone calls, and email correspondence with certifying specialists. The percentages of female residents and of residents from racial and ethnic backgrounds historically underrepresented in medicine (URM) in each specialty as of December 31, 2021, were obtained from the Graduate Medical Education 2021 to 2022 report. Main Outcomes and Measures: For each specialty, accommodation scores were measured by a modified objective scoring system (score range: 1-13, with higher scores indicating more accommodations). Poisson regression was used to assess the association between accommodation score and the diversity of residents in that specialty, as measured by the percentages of female and URM residents. Linear regression was used to assess whether the gender diversity of a specialty's board of directors was associated with accommodation scores. Results: Included in the analysis were 16 specialties with a total of 46 027 residents (26 533 males [57.6%]) and 233 members of boards of directors (152 males [65.2%]). The mean (SD) total accommodation score was 8.28 (3.79), and the median (IQR) score was 9.25 (5.00-12.00). No association was found between test accommodation score and the percentage of female or URM residents. However, for each 1-point increase in the test accommodation score, the relative risk that a resident was female was 1.05 (95% CI, 0.96-1.16), and the relative risk that a resident was URM was 1.04 (95% CI, 1.00-1.07). An association was found between the percentage of female board members and the accommodation score: for each 10% increase in the percentage of board members who were female, the accommodation score increased by 1.20 points (95% CI, 0.23-2.16 points; P = .03). Conclusions and Relevance: This cross-sectional study found considerable variability in oral board examination accommodations among ACGME-accredited specialties, highlighting opportunities for improvement and standardization. Promoting diversity in leadership bodies may lead to greater accommodations for examinees in extenuating circumstances.
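For readers unfamiliar with the modeling choices, the sketch below shows one plausible way to fit the two regressions described, assuming a specialty-level table with hypothetical column names; it is not the authors' code or data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical table: one row per specialty (n = 16).
df = pd.read_csv("specialty_accommodations.csv")

# Poisson regression: count of female residents as the outcome, with
# log(total residents) as an offset; exp(coefficient) approximates the
# relative risk per 1-point increase in the accommodation score.
poisson = smf.glm(
    "n_female_residents ~ accommodation_score",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["n_total_residents"]),
).fit()
print(np.exp(poisson.params["accommodation_score"]))  # relative risk per point

# Linear regression: accommodation score vs. percentage of female board members.
linear = smf.ols("accommodation_score ~ pct_female_board", data=df).fit()
print(linear.params["pct_female_board"] * 10)  # change per 10-percentage-point increase
```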


Certification , Humans , Cross-Sectional Studies , Female , Male , Certification/statistics & numerical data , United States , Specialty Boards/statistics & numerical data , Educational Measurement/statistics & numerical data , Educational Measurement/methods , Education, Medical, Graduate/statistics & numerical data , Medicine/statistics & numerical data , Adult
6.
Radiology ; 311(2): e232715, 2024 May.
Article En | MEDLINE | ID: mdl-38771184

Background ChatGPT (OpenAI) can pass a text-based radiology board-style examination, but its stochasticity and confident language when it is incorrect may limit utility. Purpose To assess the reliability, repeatability, robustness, and confidence of GPT-3.5 and GPT-4 (ChatGPT; OpenAI) through repeated prompting with a radiology board-style examination. Materials and Methods In this exploratory prospective study, 150 radiology board-style multiple-choice text-based questions, previously used to benchmark ChatGPT, were administered to default versions of ChatGPT (GPT-3.5 and GPT-4) on three separate attempts (separated by ≥1 month and then 1 week). Accuracy and answer choices between attempts were compared to assess reliability (accuracy over time) and repeatability (agreement over time). On the third attempt, regardless of answer choice, ChatGPT was challenged three times with the adversarial prompt, "Your answer choice is incorrect. Please choose a different option," to assess robustness (ability to withstand adversarial prompting). ChatGPT was prompted to rate its confidence from 1 to 10 (with 10 being the highest level of confidence and 1 being the lowest) on the third attempt and after each challenge prompt. Results Neither version showed a difference in accuracy over the three attempts: for the first, second, and third attempts, accuracy of GPT-3.5 was 69.3% (104 of 150), 63.3% (95 of 150), and 60.7% (91 of 150), respectively (P = .06), and accuracy of GPT-4 was 80.6% (121 of 150), 78.0% (117 of 150), and 76.7% (115 of 150), respectively (P = .42). Though both GPT-4 and GPT-3.5 had only moderate intrarater agreement (κ = 0.78 and 0.64, respectively), the answer choices of GPT-4 were more consistent across the three attempts than those of GPT-3.5 (agreement, 76.7% [115 of 150] vs 61.3% [92 of 150], respectively; P = .006). After the challenge prompt, both models changed responses for most questions, though GPT-4 did so more frequently than GPT-3.5 (97.3% [146 of 150] vs 71.3% [107 of 150], respectively; P < .001). Both models rated "high confidence" (≥8 on the 1-10 scale) for most initial responses (GPT-3.5, 100% [150 of 150]; GPT-4, 94.0% [141 of 150]) as well as for incorrect responses (ie, overconfidence; GPT-3.5, 100% [59 of 59]; GPT-4, 77% [27 of 35]; P = .89). Conclusion Default GPT-3.5 and GPT-4 were reliably accurate across three attempts, but both had poor repeatability and robustness and were frequently overconfident. GPT-4 was more consistent across attempts than GPT-3.5 but more influenced by an adversarial prompt. © RSNA, 2024. Supplemental material is available for this article. See also the editorial by Ballard in this issue.
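The reliability and repeatability metrics described here can be computed from a simple log of answer choices per attempt. The sketch below assumes such a log with hypothetical column names; it is not the study's actual code or data.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical log: one row per question, the answer choice given on each
# attempt, and the correct answer.
log = pd.read_csv("gpt4_attempts.csv")  # columns: question_id, attempt1, attempt2, attempt3, correct

# Reliability: accuracy on each attempt.
for attempt in ["attempt1", "attempt2", "attempt3"]:
    accuracy = (log[attempt] == log["correct"]).mean()
    print(f"{attempt}: accuracy = {accuracy:.1%}")

# Repeatability: proportion of questions answered identically on all three
# attempts, plus Cohen's kappa between the first and third attempts.
consistent = ((log["attempt1"] == log["attempt2"]) & (log["attempt2"] == log["attempt3"])).mean()
kappa = cohen_kappa_score(log["attempt1"], log["attempt3"])
print(f"consistent across attempts = {consistent:.1%}, kappa (1 vs 3) = {kappa:.2f}")
```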


Clinical Competence , Educational Measurement , Radiology , Humans , Prospective Studies , Reproducibility of Results , Educational Measurement/methods , Specialty Boards
7.
World J Urol ; 42(1): 250, 2024 Apr 23.
Article En | MEDLINE | ID: mdl-38652322

PURPOSE: To compare ChatGPT-4 and ChatGPT-3.5's performance on the Taiwan urology board examination (TUBE), focusing on answer accuracy, explanation consistency, and uncertainty-management tactics to minimize score penalties from incorrect responses across 12 urology domains. METHODS: 450 multiple-choice questions from the TUBE (2020-2022) were presented to the two models. Three urologists assessed the correctness and consistency of each response. Accuracy quantifies the proportion of correct answers; consistency assesses the logic and coherence of explanations across all responses. A penalty-reduction experiment with prompt variations was also performed. Univariate logistic regression was applied for subgroup comparisons. RESULTS: ChatGPT-4 showed strengths in urology, achieving an overall accuracy of 57.8%, with annual accuracies of 64.7% (2020), 58.0% (2021), and 50.7% (2022), significantly surpassing ChatGPT-3.5 (33.8%, OR = 2.68, 95% CI [2.05-3.52]). It could have passed the TUBE written exams on accuracy alone but failed on the final score because of penalties. ChatGPT-4 displayed a declining accuracy trend over time. Variability in accuracy across the 12 urological domains was noted, with more frequently updated knowledge domains showing lower accuracy (53.2% vs. 62.2%, OR = 0.69, p = 0.05). A high consistency rate of 91.6% in explanations across all domains indicates reliable delivery of coherent and logical information. The simple prompt outperformed strategy-based prompts in accuracy (60% vs. 40%, p = 0.016), highlighting ChatGPT's inability to accurately self-assess uncertainty and its tendency toward overconfidence, which may hinder medical decision-making. CONCLUSIONS: ChatGPT-4's high accuracy and consistent explanations on the urology board examination demonstrate its potential in medical information processing. However, its limitations in self-assessment and its overconfidence necessitate caution in its application, especially for inexperienced users. These insights call for ongoing advancement of urology-specific AI tools.
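The reported odds ratio can be sanity-checked directly from the accuracies given in the abstract (57.8% vs. 33.8% of the same 450 questions). The snippet below does only that arithmetic; the counts are rounded from the percentages, so the result is approximate.

```python
# Back-of-the-envelope check of the reported OR from the abstract's accuracies.
n = 450
gpt4_correct = round(0.578 * n)    # ~260 correct answers
gpt35_correct = round(0.338 * n)   # ~152 correct answers

# Sample odds ratio: odds of a correct answer for GPT-4 vs. GPT-3.5.
odds_ratio = (gpt4_correct / (n - gpt4_correct)) / (gpt35_correct / (n - gpt35_correct))
print(f"odds ratio ≈ {odds_ratio:.2f}")  # ≈ 2.68, matching the reported OR of 2.68
```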


Educational Measurement , Urology , Taiwan , Educational Measurement/methods , Clinical Competence , Humans , Specialty Boards
8.
Am Surg ; 90(6): 1491-1496, 2024 Jun.
Article En | MEDLINE | ID: mdl-38557331

INTRODUCTION: The American Board of Surgery awards board certification after successful completion of both the Qualifying Exam and the Certifying Exam. Although multiple studies have evaluated board performance at the resident level, fewer studies have evaluated board performance at the program level. METHODS: Program pass rates for 2019-2021, available through the American Board of Surgery, were compared to program information from the American Medical Association Fellowship and Residency Electronic Interactive Database Access (FREIDA). RESULTS: Certifying Exam performance was significantly positively correlated with residency length, resident class size, and the total number of physician faculty within the program. Greater average hours of didactics per week had a significant positive correlation with improved Qualifying Exam performance but not Certifying Exam performance. Programs with higher percentages of residents graduating from a United States MD program, compared to international or DO schools, were associated with improved performance. More established programs also appeared to perform better than younger programs (<20 years old). Programs in the West and Midwest performed significantly better on the Qualifying Exam than programs in the South and Northeast. CONCLUSION: Board certification serves as the capstone for surgeons after completing general surgery residency. Multiple program factors demonstrate a significant correlation with board performance.
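A program-level correlation analysis like the one summarized above could be sketched as follows; the data file, column names, and the choice of Spearman correlation are assumptions for illustration rather than the authors' method.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical table: one row per residency program.
programs = pd.read_csv("program_pass_rates.csv")

factors = [
    "resident_class_size",
    "physician_faculty_count",
    "didactic_hours_per_week",
    "program_age_years",
]
for factor in factors:
    rho, p = spearmanr(programs["certifying_exam_pass_rate"], programs[factor])
    print(f"{factor}: rho = {rho:.2f}, p = {p:.3f}")
```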


Certification , General Surgery , Internship and Residency , Specialty Boards , United States , General Surgery/education , Humans , Educational Measurement , Clinical Competence
9.
J Surg Educ ; 81(6): 786-793, 2024 Jun.
Article En | MEDLINE | ID: mdl-38658312

OBJECTIVE: Didactic education in General Surgery (GS) residency typically follows a nationally standardized curriculum; however, the instructional format varies by institution. In recent years, GS residents at our institution expressed discontent with weekly didactics and were not meeting their goals on the American Board of Surgery In-Training Examination (ABSITE). We sought to improve our didactic curriculum to increase resident satisfaction and the ABSITE scores of GS junior residents (Jrs). DESIGN: In a quality improvement project, we changed the weekly didactic curriculum format from hour-long lectures in the 2018 to 2019 academic year (AY) to a partially flipped classroom in the 2019 to 2020 AY, involving a 30-minute faculty-led presentation followed by 30 minutes of resident-led practice questions. The outcomes measured were ABSITE scores taken in 2019 and 2020 and resident opinions gathered via an anonymous survey. SETTING: This study was conducted at the University of Minnesota (Minneapolis, MN). PARTICIPANTS: The cohort for this study included all GS Jrs in our GS residency program, including postgraduate year (PGY) 1 nondesignated preliminary residents, PGY 1 to 3 categorical GS residents, and residents in their lab time. Senior residents attended a separate didactics session. RESULTS: After the curriculum changes, the ABSITE percentile scores for GS Jrs rose from 52% ± 5% to 66% ± 4% (p = 0.03). No categorical GS Jr scored <30% in 2020, compared to 20% (6/30) of categorical GS residents in 2019. All residents preferred the new format overall and reported greater engagement in and preparation for didactics. CONCLUSIONS: After changing didactic education from hour-long lectures in the 2018 to 2019 AY to a flipped classroom model in the 2019 to 2020 AY, consisting of 30 minutes of faculty-led lecture followed by 30 minutes of resident-led practice questions, ABSITE scores and resident satisfaction in the University of Minnesota General Surgery Program improved.
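A minimal sketch of the before/after score comparison is shown below, assuming per-resident ABSITE percentile scores and a two-sample t-test; the abstract does not specify the test actually used, so this is illustrative only.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical file of per-resident ABSITE percentile scores by academic year.
scores = pd.read_csv("absite_scores.csv")  # columns: academic_year, percentile

before = scores.loc[scores["academic_year"] == "2018-2019", "percentile"]
after = scores.loc[scores["academic_year"] == "2019-2020", "percentile"]

t_stat, p_value = ttest_ind(after, before)
print(f"mean before = {before.mean():.0f}, mean after = {after.mean():.0f}, p = {p_value:.2f}")
```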


Curriculum , Educational Measurement , General Surgery , Internship and Residency , General Surgery/education , United States , Humans , Education, Medical, Graduate/methods , Specialty Boards , Quality Improvement , Male , Female , Clinical Competence , Minnesota
10.
J Surg Educ ; 81(6): 866-871, 2024 Jun.
Article En | MEDLINE | ID: mdl-38658310

OBJECTIVE: Despite its ubiquity in the certification process among surgical specialties, there is little data regarding oral board delivery across procedural fields. In this study we sought to determine the specifics of oral board exam administration across surgical disciplines, with the goal of highlighting common practices, differences, and areas of innovation. This comparative analysis might further serve to identify unifying principles that undergird the oral board examination process across specialties. DESIGN: A standardized questionnaire was developed that included the domains of exam structure/administration, content development, exam prerequisites, information about examiners, scoring, pass/failure rates, and emerging technologies. Between December 2022 and February 2023, structured interviews were conducted to discuss the specifics of the various oral board exams. Interview answers were compared across specialties to extract themes and to highlight innovative or emerging techniques among individual boards. SETTING: Interviews were conducted virtually. PARTICIPANTS: Executive members of 9 procedural medical boards, including anesthesiology, neurosurgery, obstetrics and gynecology, ophthalmology, orthopaedic surgery, otolaryngology-head and neck surgery, plastic surgery, general surgery, and urology. RESULTS: Common themes included assessment of pre-, intra-, and postoperative care; all exams involved candidate examination by multiple examiners, and psychometricians were used by all organizations. Important differences included virtual versus in-person administration (3 of 9), inclusion and discussion of candidates' case logs as part of the exam (4 of 9), formal assessment of professionalism (4 of 9), and inclusion of an objective structured clinical examination (2 of 9). CONCLUSIONS: While there are common themes and practices in the oral board delivery process across the various surgical fields, important differences continue to exist. Ongoing efforts to standardize exam administration and determine best practices are needed to ensure that oral board exams continue to effectively establish that candidates meet the qualifications required for board certification.


Specialties, Surgical , Specialty Boards , Specialties, Surgical/education , Humans , Educational Measurement/methods , Surveys and Questionnaires , Clinical Competence , Certification , United States
13.
Clin Obstet Gynecol ; 67(2): 326-334, 2024 Jun 01.
Article En | MEDLINE | ID: mdl-38587005

As I reflect on my 30 years in academic medicine, my professional journey is uniquely intertwined with the growth and development of the field of urogynecology and the ultimate subspecialty recognition by the American Board of Obstetrics and Gynecology (ABOG), the Association of American Medical Colleges (AAMC), and the American Board of Medical Specialties (ABMS). In this article, I will retrace that journey from personal memories and notes, conversations with the leaders in the room, and documents and minutes generously provided by ABOG and the American Urogynecologic Society (AUGS). There were many leadership lessons learned, and I hope sharing them will enable the readers to do this type of transformational work in their own institution and broadly as advocates of women's health.


Gynecology , Obstetrics , Humans , United States , Specialty Boards , Leadership , Female , Societies, Medical , History, 21st Century , History, 20th Century
18.
Clin Exp Nephrol ; 28(5): 465-469, 2024 May.
Article En | MEDLINE | ID: mdl-38353783

BACKGROUND: Large language models (LLMs) have driven advances in artificial intelligence. While LLMs have demonstrated high performance on general medical examinations, their performance in specialized areas such as nephrology is unclear. This study aimed to evaluate ChatGPT and Bard for their potential nephrology applications. METHODS: Ninety-nine questions from the Self-Assessment Questions for Nephrology Board Renewal from 2018 to 2022 were presented to two versions of ChatGPT (GPT-3.5 and GPT-4) and to Bard. We calculated the correct answer rates for the five years combined, for each year, and by question category, and checked whether they exceeded the pass criterion. The correct answer rates were compared with those of nephrology residents. RESULTS: The overall correct answer rates for GPT-3.5, GPT-4, and Bard were 31.3% (31/99), 54.5% (54/99), and 32.3% (32/99), respectively; thus, GPT-4 significantly outperformed GPT-3.5 (p < 0.01) and Bard (p < 0.01). GPT-4 met the passing criterion in three of the five years, barely exceeding the minimum threshold in two of them. GPT-4 demonstrated significantly higher performance on problem-solving, clinical, and non-image questions than GPT-3.5 and Bard. GPT-4's performance was between that of third- and fourth-year nephrology residents. CONCLUSIONS: GPT-4 outperformed GPT-3.5 and Bard and met the Nephrology Board renewal standards in specific years, albeit marginally. These results highlight LLMs' potential and limitations in nephrology. As LLMs advance, nephrologists should understand their performance for future applications.
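The reported GPT-4 versus GPT-3.5 difference can be checked against the counts given in the abstract (54/99 vs. 31/99 correct). The sketch below uses a chi-square test of the 2x2 table; the study's exact statistical method may differ.

```python
from scipy.stats import chi2_contingency

table = [
    [54, 99 - 54],  # GPT-4: correct, incorrect
    [31, 99 - 31],  # GPT-3.5: correct, incorrect
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # p < 0.01, consistent with the abstract
```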


Nephrology , Self-Assessment , Humans , Educational Measurement , Specialty Boards , Clinical Competence , Artificial Intelligence
19.
Am J Orthod Dentofacial Orthop ; 165(4): 383-384, 2024 Apr.
Article En | MEDLINE | ID: mdl-38402482

As a specialty board, the American Board of Orthodontics (ABO) serves to protect the public and the orthodontic specialty by certifying orthodontists. The demonstration of commitment to lifelong learning and self-improvement is critical to achieving the highest level of patient care. The ABO completed a practice analysis study in 2023 to ensure all examinations represent current assessments of proficiency in orthodontics at a level of quality that satisfies professional expectations. The practice analysis is essential to providing a demonstrable relationship between the examination content and orthodontic practice and provides a critical foundation for ABO's examination programs.


Orthodontics , Humans , United States , Specialty Boards , Orthodontists , Dental Care
20.
JAMA Intern Med ; 184(4): 349-350, 2024 Apr 01.
Article En | MEDLINE | ID: mdl-38345810

This essay shines a light on structural bias inherent to the board certification examination process, sharing the author's experience preparing and sitting for the examination while contending with co-occurring challenging life events.


Certification , Specialty Boards , Humans , United States , Physical Examination , Educational Measurement , Clinical Competence
...