Response accuracy of ChatGPT-3.5, Copilot, and Gemini in interpreting biochemical laboratory data: a pilot study.

Kaftan, Ahmed Naseer; Hussain, Majid Kadhum; Naser, Farah Hasson

Affiliation

Kaftan AN; Biochemistry Department, Faculty of Medicine, Kufa University, Najaf, Iraq. ahmedn.kaftan@uokufa.edu.iq.
Hussain MK; Biochemistry Department, Faculty of Medicine, Kufa University, Najaf, Iraq.
Naser FH; Najaf Health Directorate, Ministry of Health, Baghdad, Iraq.

Sci Rep ; 14(1): 8233, 2024 04 08.

Article in English | MEDLINE | ID: mdl-38589613

ABSTRACT

With the release of ChatGPT at the end of 2022, a new era of thinking and technology use began. Artificial intelligence models (AIs) such as Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to affect every aspect of our lives, including the interpretation of laboratory data. This study aimed to assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Biochemical laboratory data from ten simulated patients, including serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein cholesterol (LDL-c), and high-density lipoprotein cholesterol (HDL-c), in addition to HbA1c, were interpreted by three AIs (Copilot, Gemini, and ChatGPT-3.5), and the responses were then evaluated by three raters. The study used two approaches: the first included all biochemical data, and the second included only kidney function data. In the first approach, Copilot showed the highest accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc analysis showed that Copilot had the highest mean rank, and pairwise comparisons revealed significant differences between Copilot and ChatGPT-3.5 (P = 0.002) and between Copilot and Gemini (P = 0.008). In the second approach, Copilot again showed the highest accuracy, and the Friedman test with Dunn's post-hoc analysis again gave Copilot the highest mean rank. The Wilcoxon signed-rank test showed no significant difference (P = 0.5) between Copilot's responses when all laboratory data were supplied and when only kidney function data were supplied. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5, and its consistent responses across different data subsets highlight its reliability in this context.
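The Friedman test named in the abstract compares several models evaluated on the same cases by ranking the models within each case and then testing whether their rank sums differ more than chance would allow. The sketch below computes the tie-corrected Friedman chi-square statistic under stated assumptions: one numeric accuracy score per model per simulated case, with higher scores ranked higher. The function names `average_ranks` and `friedman_statistic` and the example scores are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = lowest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Friedman chi-square statistic.

    scores[i][j] is the (hypothetical) accuracy score of model j on case i;
    each case is a block, and models are ranked within each block.
    """
    n = len(scores)     # number of blocks (cases)
    k = len(scores[0])  # number of treatments (models)
    rank_sums = [0.0] * k
    tie_term = 0
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        # Sum t^3 - t over each group of t tied scores in this block.
        counts: Dict[float, int] = {}
        for v in row:
            counts[v] = counts.get(v, 0) + 1
        tie_term += sum(t * (t * t - 1) for t in counts.values())
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (k * (k * k - 1) * n)
    if correction == 0:
        raise ValueError("all scores are tied within every case")
    return chi2 / correction


# Hypothetical scores for three models (columns) on four cases (rows).
example = [[3, 2, 1], [3, 1, 2], [3, 2, 1], [3, 2, 1]]
print(friedman_statistic(example))  # 6.5
```

With k = 3 models, the statistic is referred to a chi-square distribution with k - 1 = 2 degrees of freedom. Ranking within each case means that differences in case difficulty do not bias the comparison between models. Pairwise follow-up comparisons, such as Dunn's post-hoc test, are not shown here.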

Subjects

Artificial Intelligence; Humans; Pilot Projects; Reproducibility of Results; Blood Urea Nitrogen; Creatinine

Keywords

Artificial intelligence models; Biochemical parameters; ChatGPT-3.5; Copilot; Gemini; Interpretation

Collections: 01-internacional | Database: MEDLINE | Main subject: Artificial Intelligence | Limit: Humans | Language: English | Publication year: 2024 | Document type: Article