Adequacy of prostate cancer prevention and screening recommendations provided by an artificial intelligence-powered large language model.
Chiarelli, Giuseppe; Stephens, Alex; Finati, Marco; Cirulli, Giuseppe Ottone; Beatrici, Edoardo; Filipas, Dejan K; Arora, Sohrab; Tinsley, Shane; Bhandari, Mahendra; Carrieri, Giuseppe; Trinh, Quoc-Dien; Briganti, Alberto; Montorsi, Francesco; Lughezzani, Giovanni; Buffi, Nicolò; Rogers, Craig; Abdollah, Firas.
Affiliation
  • Chiarelli G; VUI Center for Outcomes Research, Analysis, and Evaluation, Henry Ford Health System, 2799 W Grand Blvd, Detroit, MI, 48202, USA.
  • Stephens A; Department of Urology, IRCCS Humanitas Research Hospital, Humanitas University, Milan, Italy.
  • Finati M; Public Health Sciences, Henry Ford Health System, Detroit, MI, USA.
  • Cirulli GO; VUI Center for Outcomes Research, Analysis, and Evaluation, Henry Ford Health System, 2799 W Grand Blvd, Detroit, MI, 48202, USA.
  • Beatrici E; Department of Urology and Renal Transplantation, University of Foggia, Foggia, Italy.
  • Filipas DK; VUI Center for Outcomes Research, Analysis, and Evaluation, Henry Ford Health System, 2799 W Grand Blvd, Detroit, MI, 48202, USA.
  • Arora S; Division of Oncology, Unit of Urology, IRCCS Ospedale San Raffaele, Vita-Salute San Raffaele University, Milan, Italy.
  • Tinsley S; Division of Urological Surgery and Center for Surgery and Public Health, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Bhandari M; Department of Urology, IRCCS Humanitas Research Hospital, Humanitas University, Milan, Italy.
  • Carrieri G; Division of Urological Surgery and Center for Surgery and Public Health, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Trinh QD; Department of Urology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
  • Briganti A; VUI Center for Outcomes Research, Analysis, and Evaluation, Henry Ford Health System, 2799 W Grand Blvd, Detroit, MI, 48202, USA.
  • Montorsi F; VUI Center for Outcomes Research, Analysis, and Evaluation, Henry Ford Health System, 2799 W Grand Blvd, Detroit, MI, 48202, USA.
  • Lughezzani G; VUI Center for Outcomes Research, Analysis, and Evaluation, Henry Ford Health System, 2799 W Grand Blvd, Detroit, MI, 48202, USA.
  • Buffi N; Department of Urology and Renal Transplantation, University of Foggia, Foggia, Italy.
  • Rogers C; Division of Urological Surgery and Center for Surgery and Public Health, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
  • Abdollah F; Division of Oncology, Unit of Urology, IRCCS Ospedale San Raffaele, Vita-Salute San Raffaele University, Milan, Italy.
Int Urol Nephrol; 56(8): 2589-2595, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38564079
ABSTRACT

PURPOSE:

We aimed to assess the appropriateness of ChatGPT's answers to questions about prostate cancer (PCa) screening, comparing GPT-3.5 and GPT-4.

METHODS:

A committee of five reviewers designed 30 questions related to PCa screening, categorized into three difficulty levels. Each question was posed identically to both models three times, each time with a different prompt. Each reviewer scored the answers for accuracy, clarity, and conciseness. Readability was assessed with the Flesch-Kincaid Grade (FKG) and Flesch Reading Ease (FRE) scores. Mean scores were extracted and compared using the Wilcoxon test. Readability across the three prompts was compared by ANOVA.
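
For reference, the two readability indices named above can be computed from word, sentence, and syllable counts using their standard published formulas. The sketch below is illustrative only: the function names are hypothetical, the counts are assumed to be supplied by the caller, and the abstract does not state which tool the authors used to obtain their scores.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (FRE): higher values indicate easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade (FKG): approximate US school grade level."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a single answer (not data from the study).
fre = flesch_reading_ease(words=100, sentences=5, syllables=150)
fkg = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(fre, 2), round(fkg, 2))
```

Both indices depend on the same two ratios (words per sentence and syllables per word), so shorter sentences and shorter words raise the FRE and lower the FKG together.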

RESULTS:

For GPT-3.5, the mean (SD) scores for accuracy, clarity, and conciseness were 1.5 (0.59), 1.7 (0.45), and 1.7 (0.49), respectively, for easy questions; 1.3 (0.67), 1.6 (0.69), and 1.3 (0.65) for medium questions; and 1.3 (0.62), 1.6 (0.56), and 1.4 (0.56) for hard questions. For GPT-4, the corresponding scores were 2.0 (0), 2.0 (0), and 2.0 (0.14) for easy questions; 1.7 (0.66), 1.8 (0.61), and 1.7 (0.64) for medium questions; and 2.0 (0.24), 1.8 (0.37), and 1.9 (0.27) for hard questions. GPT-4 outperformed GPT-3.5 on all three qualities at every difficulty level. The mean (SD) FKG of GPT-3.5 and GPT-4 answers was 12.8 (1.75) and 10.8 (1.72), respectively; the mean (SD) FRE was 37.3 (9.65) and 47.6 (9.88), respectively. The second prompt achieved better results for clarity (all p < 0.05).

CONCLUSIONS:

GPT-4 displayed greater accuracy, clarity, conciseness, and readability than GPT-3.5. Although prompts influenced response quality in both models, their effect was significant only for clarity.

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Prostatic Neoplasms / Artificial Intelligence / Early Detection of Cancer Limits: Humans / Male Language: En Journal: Int Urol Nephrol Publication year: 2024 Document type: Article Affiliation country: United States