Colorectal Cancer Prevention: Is Chat Generative Pretrained Transformer (Chat GPT) Ready to Assist Physicians in Determining Appropriate Screening and Surveillance Recommendations?
Pereyra, Lisandro; Schlottmann, Francisco; Steinberg, Leandro; Lasa, Juan.
Affiliation
  • Pereyra L; Department of Gastroenterology.
  • Schlottmann F; Endoscopy Unit, Department of Surgery.
  • Steinberg L; Endoscopy Unit, Department of Surgery.
  • Lasa J; Department of Surgery, Hospital Alemán of Buenos Aires.
J Clin Gastroenterol; 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38319619
ABSTRACT

OBJECTIVE:

To assess whether a publicly available advanced language model could help determine appropriate colorectal cancer (CRC) screening and surveillance recommendations.

BACKGROUND:

Gaps in physicians' knowledge, or an inability to accurately recall recommendations, may undermine adherence to CRC screening guidelines. Adopting newer technologies can help improve the delivery of such preventive care services.

METHODS:

An assessment comprising 10 multiple-choice questions (5 CRC screening and 5 CRC surveillance clinical vignettes) was submitted to Chat Generative Pretrained Transformer (ChatGPT) 3.5 in 4 separate sessions. Responses were recorded and screened for accuracy to determine the reliability of the tool. The mean number of correct answers was then compared with that of a control group of gastroenterologists and colorectal surgeons who answered the same questions with and without the help of a previously validated CRC screening mobile app. A minimal sketch of this repeated-query protocol follows.
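The sketch below illustrates one way such a protocol could be scripted; the study itself used the ChatGPT interface, so the OpenAI Python API stands in here, and the vignette text, answer key, and model name are hypothetical placeholders.

```python
# Sketch of the repeated-query protocol described in METHODS.
# Assumes the OpenAI Python client (pip install openai); question text,
# answer key, and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTIONS = [
    # The 10 multiple-choice vignettes (5 screening, 5 surveillance) would go here.
    "A 52-year-old with no family history of CRC... (A) ... (B) ... (C) ... (D) ...",
]
ANSWER_KEY = ["B"]  # hypothetical; one correct letter per question

N_SESSIONS = 4
scores = []
for session in range(N_SESSIONS):
    correct = 0
    for question, answer in zip(QUESTIONS, ANSWER_KEY):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": question + "\nAnswer with a single letter."}],
        )
        reply = response.choices[0].message.content.strip().upper()
        correct += reply.startswith(answer)  # bool counts as 0/1
    scores.append(correct)

print("Correct answers per session:", scores)
```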

RESULTS:

The overall accuracy of ChatGPT was 45%. The mean number of correct answers was 2.75 (95% CI, 2.26-3.24) for screening questions, 1.75 (95% CI, 1.26-2.24) for surveillance questions, and 4.5 (95% CI, 3.93-5.07) overall. ChatGPT was also inconsistent, giving different answers across sessions for 4 of the 10 questions. A total of 238 physicians completed the same assessment: 123 (51.7%) without and 115 (48.3%) with the mobile app. The mean number of total correct answers of ChatGPT was significantly lower than that of physicians both without [5.62 (95% CI, 5.32-5.92)] and with the mobile app [7.71 (95% CI, 7.39-8.03); P < 0.001].
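The reported intervals are consistent with a normal-approximation 95% CI over the 4 sessions, mean ± 1.96 × SD/√n. The sketch below reproduces the ChatGPT figures from hypothetical per-session scores chosen to match the reported intervals; the abstract does not give the raw session data.

```python
# 95% CI arithmetic behind the ChatGPT figures in RESULTS
# (normal approximation: mean +/- 1.96 * SD / sqrt(n)).
import statistics
from math import sqrt

def ci95(scores):
    n = len(scores)
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / sqrt(n)  # sample SD / sqrt(n)
    return mean, mean - 1.96 * se, mean + 1.96 * se

screening = [3, 3, 3, 2]     # hypothetical: mean 2.75, SD 0.5
surveillance = [2, 2, 1, 2]  # hypothetical: mean 1.75, SD 0.5
total = [s + v for s, v in zip(screening, surveillance)]  # [5, 5, 4, 4]

for label, data in [("screening", screening),
                    ("surveillance", surveillance),
                    ("total", total)]:
    mean, lo, hi = ci95(data)
    print(f"{label}: {mean:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# screening: 2.75 (95% CI 2.26-3.24)
# surveillance: 1.75 (95% CI 1.26-2.24)
# total: 4.50 (95% CI 3.93-5.07)
```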

CONCLUSIONS:

Large language models developed with artificial intelligence require further refinements to serve as reliable assistants in clinical practice.

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Diagnostic_studies / Guideline / Screening_studies Language: En Journal: J Clin Gastroenterol Year: 2024 Document type: Article
