Search | BVS Research Portal

ChatGPT, Bard, and Bing Chat are large language processing models that answered OITE questions with a similar accuracy to first-year orthopaedic surgery residents.

Guerra, Gage A; Hofmann, Hayden L; Le, Jonathan L; Wong, Alexander M; Fathi, Amir; Mayfield, Cory K; Petrigliano, Frank A; Liu, Joseph N.

Arthroscopy ; 2024 Aug 27.

Article in English | MEDLINE | ID: mdl-39209078

ABSTRACT

PURPOSE: To assess the ability of ChatGPT, Bard, and Bing Chat to generate accurate orthopaedic diagnoses or corresponding treatments by comparing their performance on the Orthopaedic In-Training Examination (OITE) with that of orthopaedic trainees.

METHODS: OITE question sets from 2021 and 2022 were compiled into a single set of 420 questions. ChatGPT (GPT-3.5), Bard, and Bing Chat were instructed to select one of the provided responses to each question. Accuracy on the composite question set was recorded and compared with that of human cohorts, including medical students and orthopaedic residents stratified by postgraduate year (PGY).

RESULTS: On the OITE, ChatGPT correctly answered 46.3% of composite questions, Bing Chat 52.4%, and Bard 51.4%. After image-associated questions were excluded, the overall accuracies of ChatGPT, Bing Chat, and Bard improved to 49.1%, 53.5%, and 56.8%, respectively. Medical students and orthopaedic residents in PGY1 through PGY5 correctly answered 30.8%, 53.1%, 60.4%, 66.6%, 70.0%, and 71.9% of questions, respectively.

CONCLUSION: ChatGPT, Bard, and Bing Chat are AI models that answered OITE questions with an accuracy similar to that of first-year orthopaedic surgery residents. They achieved this result without using the images or other supplementary media that are provided to human test takers.
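The conclusion that all three models performed similarly to first-year residents can be checked against the composite accuracies reported above with a simple nearest-cohort comparison. The abstract does not define "similar"; the closest-mean rule and the `nearest_cohort` helper below are hypothetical illustrations, not the authors' analysis.

```python
# Composite-question accuracies (%) as reported in the abstract above.
cohorts = {
    "Medical student": 30.8,
    "PGY1": 53.1,
    "PGY2": 60.4,
    "PGY3": 66.6,
    "PGY4": 70.0,
    "PGY5": 71.9,
}
models = {"ChatGPT (GPT-3.5)": 46.3, "Bing Chat": 52.4, "Bard": 51.4}


def nearest_cohort(accuracy: float) -> str:
    """Hypothetical rule: return the human cohort whose mean accuracy is closest."""
    return min(cohorts, key=lambda name: abs(cohorts[name] - accuracy))


for model, acc in models.items():
    print(f"{model}: {acc}% -> closest to {nearest_cohort(acc)}")
```

Under this rule every model's composite score falls nearest the PGY1 mean, in line with the stated conclusion; a different definition of "similar" (for example, a confidence interval around each cohort mean) could group the scores differently.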

The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Hofmann, Hayden L; Guerra, Gage A; Le, Jonathan L; Wong, Alexander M; Hofmann, Grady H; Mayfield, Cory K; Petrigliano, Frank A; Liu, Joseph N.

Orthopedics ; 47(2): e85-e89, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-37757748

ABSTRACT

Advances in artificial intelligence and machine learning models, such as Chat Generative Pre-trained Transformer (ChatGPT), have occurred at a remarkably fast rate. OpenAI released GPT-4, its newest ChatGPT model, in March 2023. It offers a wide range of medical applications, and the model has demonstrated notable proficiency on many medical board examinations. This study sought to assess GPT-4's performance on the Orthopaedic In-Training Examination (OITE), which is used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. GPT-4's performance was also compared with that of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P<.00001). GPT-4's performance is most comparable to that of an average third-year orthopedic surgery resident, while GPT-3.5 performed below the level of an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. Our results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities. [Orthopedics. 2024;47(2):e85-e89.].

Subjects

Internship and Residency, Orthopedic Procedures, Orthopedics, Humans, Orthopedics/education, Artificial Intelligence, Educational Measurement, Clinical Competence
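The GPT-4 versus GPT-3.5 comparison in the second record can be approximately reproduced from the reported counts. The abstract does not name the statistical test behind P<.00001; the sketch below assumes a pooled two-proportion z-test (equivalent to an uncorrected chi-square test on the 2×2 table) and back-calculates the GPT-3.5 count (190 of 410) from the rounded 46.3%, so its result is only an approximation. The helper `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test (assumed method); returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # proportion correct under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return z, p_value


gpt4_correct, gpt4_attempted = 251, 396
gpt35_attempted = 410
# Assumption: count back-calculated from the rounded 46.3% reported in the abstract.
gpt35_correct = round(0.463 * gpt35_attempted)

print(f"GPT-4 accuracy:   {gpt4_correct / gpt4_attempted:.1%}")
print(f"GPT-3.5 accuracy: {gpt35_correct / gpt35_attempted:.1%}")
z, p = two_proportion_z_test(gpt4_correct, gpt4_attempted, gpt35_correct, gpt35_attempted)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

With these inputs the sketch gives z of roughly 4.86 and a two-sided p on the order of 10⁻⁶, which is consistent with the reported P<.00001.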
