Evaluating the diagnostic performance of a large language model-powered chatbot for providing immunohistochemistry recommendations in dermatopathology.

McCrary, Myles R; Galambus, Justine; Chen, Wei-Shen

Affiliations

McCrary MR; Department of Anatomic and Clinical Pathology, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA.
Galambus J; Department of Dermatology and Cutaneous Surgery, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA.
Chen WS; Department of Dermatology and Cutaneous Surgery, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA.

J Cutan Pathol; 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744501

ABSTRACT

BACKGROUND:

Large language model (LLM)-powered chatbots such as ChatGPT have numerous applications. However, their effectiveness in dermatopathology has not been formally evaluated. Dermatopathological cases often require immunohistochemical workup. Here, we evaluate the performance of a chatbot in providing diagnostically useful information on immunohistochemistry relating to dermatological diseases.

METHODS:

We queried a commonly used chatbot for the immunophenotypes of 51 cutaneous diseases, spanning a diverse range of epidermal, adnexal, hematolymphoid, and soft tissue entities, and asked it to provide references for each diagnosis. All tests were repeated, compiled, quantified, and then compared with established literature standards.

RESULTS:

Clustering analysis demonstrated that recommendations correlated with tumor type, suggesting chatbots can supply appropriate panels. However, a substantial portion of recommendations were factually incorrect (13.9%). Citations were rarely clinically useful (24.5%), and many were confabulated (27.2%). Responses for cutaneous adnexal lesions tended to be less accurate, and the literature references supplied for them less useful. Reference retrieval performance was associated with the number of PubMed entries per entity.

CONCLUSIONS:

This foundational study suggests that LLM-powered chatbots may be useful for generating immunohistochemical panels for dermatologic diagnoses. However, their specific performance capabilities and biases must be considered. In addition, extreme caution is advised regarding their tendency to fabricate material. Future models intentionally fine-tuned to augment diagnostic medicine may prove valuable.

Keywords

artificial intelligence; immunohistochemistry; large language model

Full text: 1 Database: MEDLINE Language: En Publication year: 2024 Document type: Article
