Results 1 - 5 of 5
1.
JMIR Form Res; 8: e59267, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38924784

ABSTRACT

BACKGROUND: The potential of artificial intelligence (AI) chatbots, particularly ChatGPT with GPT-4 (OpenAI), in assisting with medical diagnosis is an emerging research area. However, it is not yet clear how well AI chatbots can evaluate whether the final diagnosis is included in differential-diagnosis lists. OBJECTIVE: This study aims to assess the capability of GPT-4 in identifying the final diagnosis from differential-diagnosis lists and to compare its performance with that of physicians for a case report series. METHODS: We used a database of differential-diagnosis lists derived from case reports in the American Journal of Case Reports, together with the corresponding final diagnoses. These lists were generated by 3 AI systems: GPT-4, Google Bard (currently Google Gemini), and Large Language Model Meta AI 2 (LLaMA 2). The primary outcome was whether GPT-4's evaluations identified the final diagnosis within these lists. None of these AIs received additional medical training or reinforcement. For comparison, 2 independent physicians also evaluated the lists, with any inconsistencies resolved by another physician. RESULTS: The 3 AIs generated a total of 1176 differential-diagnosis lists from 392 case descriptions. GPT-4's evaluations concurred with those of the physicians in 966 of the 1176 lists (82.1%). The Cohen κ coefficient was 0.63 (95% CI 0.56-0.69), indicating fair to good agreement between GPT-4's and the physicians' evaluations. CONCLUSIONS: GPT-4 demonstrated fair to good agreement with physicians in identifying the final diagnosis within differential-diagnosis lists for this case report series. Its ability to compare differential-diagnosis lists with final diagnoses suggests its potential to support clinical decision-making through diagnostic feedback. While GPT-4 showed fair to good agreement in this evaluation, its application in real-world scenarios and further validation in diverse clinical environments are essential to fully understand its utility in the diagnostic process.
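A note on the agreement statistics above: Cohen's κ corrects the raw agreement rate (here 966/1176, 82.1%) for the agreement expected by chance, which is why κ = 0.63 is lower than 0.82. The sketch below is a minimal, hypothetical illustration of that computation; the binary vectors are invented placeholders, not the study's data.

```python
# Hypothetical illustration of the agreement statistics reported above.
# Each element is a yes/no judgment on whether the final diagnosis was present
# in one differential-diagnosis list (1 = present, 0 = absent).
# These vectors are made up for illustration; they are not the study data.
from sklearn.metrics import cohen_kappa_score

gpt4_judgments      = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # GPT-4's evaluations
physician_judgments = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]  # physicians' consensus

raw_agreement = sum(a == b for a, b in zip(gpt4_judgments, physician_judgments)) / len(gpt4_judgments)
kappa = cohen_kappa_score(gpt4_judgments, physician_judgments)

print(f"raw agreement: {raw_agreement:.3f}")  # analogous to 966/1176 = 0.821
print(f"Cohen's kappa: {kappa:.3f}")          # analogous to the reported 0.63
```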

2.
Diagnosis (Berl); 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38465399

ABSTRACT

OBJECTIVES: The potential of artificial intelligence (AI) chatbots, particularly the fourth-generation chat generative pretrained transformer (ChatGPT-4), in assisting with medical diagnosis is an emerging research area. While there has been significant emphasis on creating lists of differential diagnoses, it is not yet clear how well AI chatbots can evaluate whether the final diagnosis is included in these lists. This short communication aimed to assess the accuracy of ChatGPT-4 in evaluating differential-diagnosis lists compared with medical professionals' assessments. METHODS: We used ChatGPT-4 to evaluate whether the final diagnosis was included in the top 10 differential-diagnosis lists created by physicians, ChatGPT-3, and ChatGPT-4, using clinical vignettes. Eighty-two clinical vignettes were used: 52 complex case reports published by authors from the department and 30 mock cases of common diseases created by physicians from the same department. We compared the agreement between ChatGPT-4 and the physicians on whether the final diagnosis was included in the top 10 differential-diagnosis lists using the kappa coefficient. RESULTS: Three sets of differential diagnoses were evaluated for each of the 82 cases, yielding a total of 246 lists. The agreement rate between ChatGPT-4 and the physicians was 236 out of 246 (95.9%), with a kappa coefficient of 0.86, indicating very good agreement. CONCLUSIONS: ChatGPT-4 demonstrated very good agreement with physicians in evaluating whether the final diagnosis was included in the differential-diagnosis lists.
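For readers curious about how such a "ChatGPT-4 as evaluator" step can be wired up, the snippet below sketches one possible call using the openai Python client. The prompt wording, model identifier, and parameters are assumptions for illustration only; the abstract does not specify the exact prompts or settings used in the study.

```python
# Hypothetical sketch of asking GPT-4 whether a final diagnosis appears in a
# top-10 differential-diagnosis list. Prompt text and model name are assumed,
# not taken from the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_final_diagnosis_in_list(final_diagnosis: str, differential_list: list[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {dx}" for i, dx in enumerate(differential_list))
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model identifier
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You judge whether a final diagnosis is included in a "
                        "differential-diagnosis list. Answer only 'yes' or 'no'."},
            {"role": "user",
             "content": f"Final diagnosis: {final_diagnosis}\n"
                        f"Differential-diagnosis list:\n{numbered}\n"
                        "Is the final diagnosis included in the list?"},
        ],
    )
    return response.choices[0].message.content.strip().lower()

# Hypothetical usage:
# print(is_final_diagnosis_in_list("Pulmonary embolism",
#                                  ["Pneumonia", "Pulmonary embolism", "Pericarditis"]))
```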

3.
Nihon Shokakibyo Gakkai Zasshi; 121(3): 237-244, 2024.
Article in Japanese | MEDLINE | ID: mdl-38462472

ABSTRACT

A woman in her 70s was hospitalized at a previous hospital, where she was diagnosed with a liver abscess and managed with antibiotics. However, she developed altered consciousness and neck stiffness during treatment and was referred to our hospital. On investigation, we found meningitis and right endophthalmitis concurrent with the liver abscess. Klebsiella pneumoniae was detected in cultures of both the liver abscess and the corneal effusion, and a string test was positive. She was therefore diagnosed with invasive liver abscess syndrome. Although she recovered from the liver abscess and meningitis with empiric antibiotic treatment, her right eye required ophthalmectomy. When a liver abscess presents with extrahepatic complications such as meningitis and endophthalmitis, invasive liver abscess syndrome, caused by hypervirulent K. pneumoniae, should be considered.


Subject(s)
Endophthalmitis, Klebsiella Infections, Liver Abscess, Meningitis, Female, Humans, Anti-Bacterial Agents/therapeutic use, Endophthalmitis/etiology, Endophthalmitis/complications, Klebsiella Infections/complications, Klebsiella Infections/drug therapy, Klebsiella Infections/diagnosis, Klebsiella pneumoniae, Liver Abscess/diagnostic imaging, Liver Abscess/etiology, Meningitis/complications, Meningitis/drug therapy, Aged

4.
JMIR Med Inform; 11: e48808, 2023 Oct 09.
Article in English | MEDLINE | ID: mdl-37812468

ABSTRACT

BACKGROUND: The diagnostic accuracy of differential diagnoses generated by artificial intelligence chatbots, including ChatGPT models, for complex clinical vignettes derived from general internal medicine (GIM) department case reports is unknown. OBJECTIVE: This study aims to evaluate the accuracy of the differential-diagnosis lists generated by both third-generation ChatGPT (ChatGPT-3.5) and fourth-generation ChatGPT (ChatGPT-4) using case vignettes from case reports published by the Department of GIM of Dokkyo Medical University Hospital, Japan. METHODS: We searched PubMed for case reports. Upon identification, physicians selected diagnostic cases, determined the final diagnosis, and converted them into clinical vignettes. Physicians entered the text of each clinical vignette into the ChatGPT-3.5 and ChatGPT-4 prompts to generate the top 10 differential diagnoses. The ChatGPT models were not specially trained or further reinforced for this task. Three GIM physicians from other medical institutions created differential-diagnosis lists by reading the same clinical vignettes. We measured the rates of correct diagnosis within the top 10 differential-diagnosis lists, the top 5 differential-diagnosis lists, and the top diagnosis. RESULTS: In total, 52 case reports were analyzed. The rates of correct diagnosis by ChatGPT-4 within the top 10 differential-diagnosis lists, top 5 differential-diagnosis lists, and top diagnosis were 83% (43/52), 81% (42/52), and 60% (31/52), respectively. The corresponding rates for ChatGPT-3.5 were 73% (38/52), 65% (34/52), and 42% (22/52). The rates of correct diagnosis by ChatGPT-4 were comparable to those of the physicians within the top 10 (43/52, 83% vs 39/52, 75%; P=.47) and top 5 (42/52, 81% vs 35/52, 67%; P=.18) differential-diagnosis lists and for the top diagnosis (31/52, 60% vs 26/52, 50%; P=.43); none of the differences were statistically significant. The ChatGPT models' diagnostic accuracy did not vary significantly with open access status or publication date (before 2011 vs 2022). CONCLUSIONS: This study demonstrates the potential diagnostic accuracy of differential-diagnosis lists generated using ChatGPT-3.5 and ChatGPT-4 for complex clinical vignettes from case reports published by the GIM department. The rate of correct diagnoses within the top 10 and top 5 differential-diagnosis lists generated by ChatGPT-4 exceeds 80%. Although derived from a limited data set of case reports from a single department, our findings highlight the potential utility of ChatGPT-4 as a supplementary tool for physicians, particularly for those affiliated with a GIM department. Further investigations should explore the diagnostic accuracy of ChatGPT using case materials that are distinct from its training data. Such efforts will provide a comprehensive insight into the role of artificial intelligence in enhancing clinical decision-making.
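The top 10 / top 5 / top 1 rates above reduce to a simple counting exercise once each case has been judged. The sketch below shows that bookkeeping under a simplifying assumption: it uses a precomputed rank for the correct diagnosis in each list, whereas in the study correctness was adjudicated by physicians. The data values are invented.

```python
# Hypothetical bookkeeping for top-k correct-diagnosis rates.
# `ranks` holds, for each case, the 1-based rank at which the correct diagnosis
# appeared in the model's differential list, or None if it was absent.
# These values are placeholders, not the study's data.
ranks = [1, 3, None, 2, 7, 1, None, 4, 1, 9]

def top_k_rate(ranks, k):
    """Fraction of cases where the correct diagnosis appears within the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)

for k in (10, 5, 1):
    print(f"top-{k} rate: {top_k_rate(ranks, k):.0%}")

# The abstract's figures (e.g., 43/52 = 83% for ChatGPT-4's top 10 lists) come
# from the same kind of count over the 52 case reports.
```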

5.
Am J Med; 136(11): 1119-1123.e18, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37643659

ABSTRACT

BACKGROUND: In this study, we evaluated the diagnostic accuracy of Google Bard, a generative artificial intelligence (AI) platform. METHODS: We searched published case reports from our department for difficult or uncommon case descriptions and used mock cases created by physicians for common case descriptions. We entered the case descriptions into the Google Bard prompt to generate the top 10 differential-diagnosis lists. As in previous studies, other physicians created differential-diagnosis lists by reading the same clinical descriptions. RESULTS: A total of 82 clinical descriptions (52 case reports and 30 mock cases) were used. The accuracy rates of Google Bard remained lower than those of the physicians for the top 10 (56.1% vs 82.9%, P < .001) and top 5 (53.7% vs 78.0%, P = .002) lists and for the top differential diagnosis (40.2% vs 64.6%, P = .003). Even within the specific context of case reports, physicians consistently outperformed Google Bard. For the mock cases, the performance of Google Bard's differential-diagnosis lists did not differ from that of the physicians for the top 10 (80.0% vs 96.6%, P = .11) and top 5 (76.7% vs 96.6%, P = .06) lists, but it did for the top diagnosis (60.0% vs 90.0%, P = .02). CONCLUSION: While physicians excelled overall, and particularly on case reports, Google Bard displayed comparable diagnostic performance on common cases. This suggests that Google Bard has room for further improvement and refinement in its diagnostic capabilities. Generative AIs, including Google Bard, are anticipated to become increasingly beneficial in augmenting diagnostic accuracy.
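The P values above compare paired accuracies (the same 82 cases judged for both Google Bard and the physicians). One common choice for that kind of comparison is McNemar's test on the discordant cases; the abstract does not state which test was actually used, so the sketch below is only one plausible way to run such a comparison, with invented counts.

```python
# Hypothetical McNemar-style comparison of paired accuracies (Bard vs physicians).
# The 2x2 table counts cases by whether each rater identified the diagnosis.
# The counts are placeholders, not the study's data, and the abstract does not
# confirm that McNemar's test was the method used.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

#                  physicians correct   physicians wrong
table = np.array([[40,                  6],    # Bard correct
                  [25,                  11]])  # Bard wrong

result = mcnemar(table, exact=True)  # exact binomial test on the discordant cells (6 vs 25)
print(f"statistic={result.statistic}, p-value={result.pvalue:.3f}")
```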
