[Efficacy and safety of artificial intelligence-based large language models for decision making support in herniology: evaluation by experts and general surgeons].

Nechay, T V; Sazhin, A V; Loban, K M; Bogomolova, A K; Suglob, V V; Beniia, T R

Original title (transliterated): Effektivnost' i bezopasnost' bol'shikh yazykovykh modelei na osnove iskusstvennogo intellekta v kachestve instrumenta podderzhki prinyatiya reshenii v gerniologii: otsenka ekspertami i obshchimi khirurgami.

Affiliation

Nechay TV; Pirogov Russian National Research Medical University, Moscow, Russia.
Sazhin AV; Pirogov Russian National Research Medical University, Moscow, Russia.
Loban KM; Pirogov Russian National Research Medical University, Moscow, Russia.
Bogomolova AK; Pirogov Russian National Research Medical University, Moscow, Russia.
Suglob VV; Pirogov Russian National Research Medical University, Moscow, Russia.
Beniia TR; Pirogov Russian National Research Medical University, Moscow, Russia.

Khirurgiia (Mosk) ; (8): 6-14, 2024.

Article in Russian | MEDLINE | ID: mdl-39140937

ABSTRACT

OBJECTIVE:

To evaluate the quality of recommendations provided by ChatGPT regarding inguinal hernia repair.

MATERIAL AND METHODS:

ChatGPT was asked 5 questions about the surgical management of inguinal hernias. The chatbot was assigned the role of an expert in herniology and asked to search only specialized medical databases and to provide references and supporting evidence. Herniology experts and general surgeons (non-experts) rated the quality of the recommendations generated by ChatGPT on a 4-point scale (0 to 3 points). Statistical correlations between participants' ratings and their stance on artificial intelligence were explored.

RESULTS:

Experts rated the quality of ChatGPT responses lower than non-experts did (2 (1-2) vs. 2 (2-3), p<0.001). The chatbot failed to provide valid references and actual evidence, and half of the references it cited were fabricated. Respondents were optimistic about the future of neural networks for clinical decision-making support, and most opposed restricting their use in healthcare.

CONCLUSION:

We do not recommend non-specialized large language models as the sole or primary source of information for clinical decision making, or as a virtual search assistant.

Subjects

Artificial Intelligence; Herniorrhaphy; Humans; Herniorrhaphy/methods; Surgeons; Hernia, Inguinal/surgery; Clinical Decision-Making/methods; Decision Support Systems, Clinical

Keywords

ChatGPT; artificial intelligence; clinical decision making support tool; evidence level; guidelines; inguinal hernia; large language model

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Artificial Intelligence / Herniorrhaphy | Limit: Humans | Language: Russian | Journal: Khirurgiia (Mosk) | Year of publication: 2024 | Document type: Article | Country of affiliation: Russian Federation | Country of publication: Russian Federation
