
Diagnostic accuracy of GPT-4 on common clinical scenarios and challenging cases.

Rutledge, Geoffrey W.

Learn Health Syst; 8(3): e10438, 2024 Jul.

Article in English | MEDLINE | ID: mdl-39036534

ABSTRACT

Introduction: Large language models (LLMs) have a high diagnostic accuracy when they evaluate previously published clinical cases. Methods: We compared the accuracy of GPT-4's differential diagnoses for previously unpublished challenging case scenarios with the diagnostic accuracy for previously published cases. Results: For a set of previously unpublished challenging clinical cases, GPT-4 achieved 61.1% correct in its top 6 diagnoses versus the previously reported 49.1% for physicians. For a set of 45 clinical vignettes of more common clinical scenarios, GPT-4 included the correct diagnosis in its top 3 diagnoses 100% of the time versus the previously reported 84.3% for physicians. Conclusions: GPT-4 performs at a level at least as good as, if not better than, that of experienced physicians on highly challenging cases in internal medicine. The extraordinary performance of GPT-4 on diagnosing common clinical scenarios could be explained in part by the fact that these cases were previously published and may have been included in the training dataset for this LLM.
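
The results above are stated as "correct diagnosis within the top N" rates. As a rough illustration of that kind of metric, the sketch below computes a top-k inclusion rate over a list of cases. It is not the author's scoring procedure: the function name `top_k_accuracy`, the case data, and the case-insensitive exact string match are all assumptions made for illustration, and the abstract does not say how a differential was judged to contain the correct diagnosis.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_differentials: Sequence[Sequence[str]],
    true_diagnoses: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true diagnosis is among the first k candidates.

    Matching is a case-insensitive exact string comparison, which is an
    assumption of this sketch; a real evaluation would likely rely on
    clinical judgment to decide whether two diagnoses are equivalent.
    """
    if len(ranked_differentials) != len(true_diagnoses):
        raise ValueError("one ranked differential is required per case")
    if not true_diagnoses:
        raise ValueError("at least one case is required")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = 0
    for candidates, truth in zip(ranked_differentials, true_diagnoses):
        # Only the first k ranked candidates count toward a hit.
        top_k = {c.strip().lower() for c in candidates[:k]}
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(true_diagnoses)


# Hypothetical toy cases, invented for this example (not data from the study).
toy_differentials = [
    ["pneumonia", "pulmonary embolism", "heart failure"],
    ["migraine", "tension headache", "sinusitis"],
    ["gastritis", "peptic ulcer", "pancreatitis"],
]
toy_truths = ["pulmonary embolism", "sinusitis", "appendicitis"]

top1 = top_k_accuracy(toy_differentials, toy_truths, k=1)
top3 = top_k_accuracy(toy_differentials, toy_truths, k=3)
```

Raising k can only keep the rate the same or increase it, so a top-6 figure and a top-3 figure are not directly comparable unless both are measured on the same set of cases.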
