Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models.
Pompili, David; Richa, Yasmina; Collins, Patrick; Richards, Helen; Hennessey, Derek B.
Affiliation
  • Pompili D; School of Medicine, University College Cork, Cork, Ireland.
  • Richa Y; School of Medicine, University College Cork, Cork, Ireland.
  • Collins P; Department of Urology, Mercy University Hospital, Cork, Ireland.
  • Richards H; School of Medicine, University College Cork, Cork, Ireland.
  • Hennessey DB; Department of Clinical Psychology, Mercy University Hospital, Cork, Ireland.
World J Urol; 42(1): 455, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39073590
ABSTRACT

PURPOSE:

Large language models (LLMs) are a form of artificial intelligence (AI) that uses deep learning techniques to understand, summarize and generate content. The potential benefits of LLMs in healthcare are predicted to be immense. The objective of this study was to examine the quality of patient information leaflets (PILs) produced by 3 LLMs on urological topics.

METHODS:

Prompts were created to generate PILs from 3 LLMs, ChatGPT-4, PaLM 2 (Google Bard) and Llama 2 (Meta), across 4 urology topics: circumcision, nephrectomy, overactive bladder syndrome, and transurethral resection of the prostate (TURP). PILs were evaluated using a quality assessment checklist. PIL readability was assessed using the Average Reading Level Consensus Calculator.

RESULTS:

PILs generated by PaLM 2 had the highest overall average quality score (3.58), followed by Llama 2 (3.34) and ChatGPT-4 (3.08). PaLM 2-generated PILs were of the highest quality in all topics except TURP, and PaLM 2 was the only LLM to include images. Medical inaccuracies were present in all generated content, including instances of significant error. Readability analysis identified PaLM 2-generated PILs as the simplest (average reading level of age 14-15) and Llama 2 PILs as the most difficult (average reading level of age 16-17).

CONCLUSION:

While LLMs can generate PILs that may help reduce healthcare professional workload, generated content requires clinician input to ensure accuracy and to include health literacy aids, such as images. LLM-generated PILs were above the average reading level for adults, necessitating improvement in LLM algorithms and/or prompt design. How satisfied patients are with LLM-generated PILs remains to be evaluated.

Full text: 1 | Database: MEDLINE | Main subject: Urology / Artificial Intelligence | Limit: Humans | Language: En | Journal: World J Urol | Publication year: 2024 | Document type: Article | Country of affiliation: Ireland