Comprehensiveness of Large Language Models in Patient Queries on Gingival and Endodontic Health.
Zhang, Qian; Wu, Zhengyu; Song, Jinlin; Luo, Shuicai; Chai, Zhaowu.
Affiliation
  • Zhang Q; College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China.
  • Wu Z; College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China.
  • Song J; College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China.
  • Luo S; Quanzhou Institute of Equipment Manufacturing, Haixi Institute, Chinese Academy of Sciences, Quanzhou, China.
  • Chai Z; College of Stomatology, Chongqing Medical University, Chongqing, China; Chongqing Key Laboratory for Oral Diseases and Biomedical Sciences, Chongqing, China; Chongqing Municipal Key Laboratory of Oral Biomedical Engineering of Higher Education, Chongqing, China. Electronic address: 500732@hospital.c
Int Dent J; 2024 Aug 14.
Article in En | MEDLINE | ID: mdl-39147663
ABSTRACT

AIM:

Given the increasing interest in using large language models (LLMs) for self-diagnosis, this study aimed to evaluate the comprehensiveness of two prominent LLMs, ChatGPT-3.5 and ChatGPT-4, in addressing common queries related to gingival and endodontic health across different language contexts and query types.

METHODS:

We assembled a set of 33 common real-life questions related to gingival and endodontic healthcare, comprising 17 common-sense questions and 16 expert questions. Each question was presented to the LLMs in both English and Chinese. Three specialists were invited to evaluate the comprehensiveness of the responses on a five-point Likert scale, where a higher score indicated a more comprehensive response.
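The paper does not publish its data-collection tooling, so the Python sketch below is only a hypothetical illustration of the query step it describes. It assumes the OpenAI chat completions API; the model names, example questions, and record structure are placeholders standing in for the study's 33-item bilingual question set.

```python
# Hypothetical sketch of the query-collection step described above.
# Model names, questions, and structure are illustrative, not the study's code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

MODELS = ["gpt-3.5-turbo", "gpt-4"]  # stand-ins for ChatGPT-3.5 / ChatGPT-4
QUESTIONS = {
    "en": ["Why do my gums bleed when I brush my teeth?"],  # placeholder item
    "zh": ["我刷牙时牙龈为什么会出血？"],  # placeholder item
}

def collect_responses():
    """Ask every question of every model in both languages."""
    records = []
    for model in MODELS:
        for lang, items in QUESTIONS.items():
            for question in items:
                resp = client.chat.completions.create(
                    model=model,
                    messages=[{"role": "user", "content": question}],
                )
                records.append({
                    "model": model,
                    "lang": lang,
                    "question": question,
                    "answer": resp.choices[0].message.content,
                })
    return records  # each answer is then rated by the three specialists
```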

RESULTS:

The LLMs performed significantly better in English, with an average score of 4.53, than in Chinese, with an average of 3.95 (Mann-Whitney U test, P < .05). Responses to common-sense questions received higher scores than those to expert questions, with averages of 4.46 and 4.02, respectively (Mann-Whitney U test, P < .05). Between the two LLMs, ChatGPT-4 consistently outperformed ChatGPT-3.5, with average scores of 4.45 and 4.03, respectively (Mann-Whitney U test, P < .05).
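For readers unfamiliar with the test, the snippet below shows how a pairwise comparison like the English-versus-Chinese one reported above could be run with scipy.stats.mannwhitneyu. The Likert ratings are invented placeholders, since the study's raw scores are not reproduced here.

```python
# Illustrative Mann-Whitney U comparison of two groups of Likert ratings.
# The scores below are invented placeholders, not the study's data.
from scipy.stats import mannwhitneyu

english_scores = [5, 4, 5, 4, 5, 4, 4, 5]  # example 5-point ratings
chinese_scores = [4, 3, 4, 4, 3, 4, 5, 3]

stat, p_value = mannwhitneyu(english_scores, chinese_scores,
                             alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
# The study reports P < .05 for each of its three pairwise comparisons.
```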

CONCLUSIONS:

ChatGPT-4 provides more comprehensive responses than ChatGPT-3.5 for queries related to gingival and endodontic health. Both LLMs perform better in English and on common-sense questions. However, the performance discrepancies across language contexts and the presence of inaccurate responses suggest that further evaluation and a clear understanding of their limitations are crucial to avoid potential misunderstandings.

CLINICAL RELEVANCE:

This study revealed the performance differences between ChatGPT-3.5 and ChatGPT-4 in handling gingival and endodontic health issues across different language contexts, providing insights into the comprehensiveness and limitations of LLMs in addressing common oral healthcare queries.

Full text: 1 Database: MEDLINE Language: En Journal: Int Dent J Publication year: 2024 Document type: Article Country of affiliation: China
