1.
Am J Emerg Med ; 84: 68-73, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39096711

ABSTRACT

INTRODUCTION: GPT-4, GPT-4o, and Gemini Advanced, which are among the best-known large language models (LLMs), are capable of recognizing and interpreting visual data. Only a very limited number of studies have examined the ECG performance of GPT-4, and no study in the literature has examined the success of Gemini or GPT-4o in ECG evaluation. The aim of our study was to evaluate the performance of GPT-4, GPT-4o, and Gemini Advanced in ECG evaluation, assess their usability in the medical field, and compare their accuracy in ECG interpretation with that of cardiologists and emergency medicine specialists. METHODS: The study was conducted from May 14, 2024, to June 3, 2024. The book "150 ECG Cases" served as the reference; it contains two sections, daily routine ECGs and more challenging ECGs. Two emergency medicine specialists selected 20 ECG cases from each section, for a total of 40 cases. The questions were then evaluated by emergency medicine specialists and cardiologists. In the subsequent phase, one diagnostic question per day was entered into GPT-4, GPT-4o, and Gemini Advanced in separate chat interfaces. In the final phase, the responses of the cardiologists, emergency medicine specialists, GPT-4, GPT-4o, and Gemini Advanced were statistically evaluated across three categories: routine daily ECGs, more challenging ECGs, and all ECGs combined. RESULTS: Cardiologists outperformed GPT-4, GPT-4o, and Gemini Advanced in all three categories. Emergency medicine specialists performed better than GPT-4o on routine daily ECG questions and on the total set of ECG questions (p = 0.003 and p = 0.042, respectively). GPT-4o performed better than Gemini Advanced and GPT-4 on the total set of ECG questions (p = 0.027 and p < 0.001, respectively), and also outperformed Gemini Advanced on routine daily ECG questions (p = 0.004). Weak agreement was observed in the responses of GPT-4 (p < 0.001, Fleiss' kappa = 0.265) and Gemini Advanced (p < 0.001, Fleiss' kappa = 0.347), while moderate agreement was observed in the responses of GPT-4o (p < 0.001, Fleiss' kappa = 0.514). CONCLUSION: While GPT-4o shows promise, especially on more challenging ECG questions, and may have potential as an assistant for ECG evaluation, its performance in routine and overall assessments still lags behind that of human specialists. The limited accuracy and consistency of GPT-4 and Gemini Advanced suggest that their current use in clinical ECG interpretation is risky.
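The abstract above reports session-to-session agreement as Fleiss' kappa. As a minimal illustration of how that statistic is computed, here is a generic pure-Python sketch with made-up counts; this is not the study's analysis code, and the rating matrix is purely illustrative:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a ratings matrix.

    ratings: one row per item; each row holds the count of raters
    (here: repeated model sessions) assigning each diagnostic category,
    with the same number of raters per item.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])            # raters per item (constant)
    n_cats = len(ratings[0])
    total = n_items * n_raters

    # proportion of all assignments falling into each category
    p_j = [sum(row[j] for row in ratings) / total for j in range(n_cats)]

    # per-item observed agreement
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_items            # mean observed agreement
    p_e = sum(p * p for p in p_j)         # chance agreement
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 3 sessions rating 2 items, perfect within-item agreement.
print(fleiss_kappa([[3, 0], [0, 3]]))     # kappa = 1.0
```

Values around 0.2-0.4 are conventionally read as weak agreement and 0.4-0.6 as moderate, which matches the labels the abstract attaches to its kappa values.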


Subject(s)
Cardiologists , Electrocardiography , Emergency Medicine , Humans , Electrocardiography/methods , Female , Male , Middle Aged , Adult
2.
Am J Emerg Med ; 81: 146-150, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38728938

ABSTRACT

INTRODUCTION: The term artificial intelligence (AI) was coined in the 1950s, and the field has made significant progress since then; numerous AI applications have been developed, of which GPT-4 and Gemini are two of the best known. The Emergency Severity Index (ESI) is currently one of the most commonly used systems for effective patient triage in the emergency department. The aim of this study was to evaluate the ESI triage performance of GPT-4, Gemini, and emergency medicine specialists against one another, and to contribute to the literature on the usability of these AI programs in emergency department triage. METHODS: Our study was conducted between February 1, 2024, and February 29, 2024, with emergency medicine specialists in Turkey as well as GPT-4 and Gemini. Ten emergency medicine specialists were included; as a limitation, the participating specialists do not frequently use the ESI triage model in daily practice. In the first phase, 100 case examples involving adult or trauma patients were extracted from the sample and training cases in the ESI Implementation Handbook. In the second phase, the responses were categorized into three groups: correct triage, over-triage, and under-triage. In the third phase, the questions were categorized according to the correct triage responses. RESULTS: A statistically significant difference was found between the three groups in correct triage, over-triage, and under-triage rates (p < 0.001). GPT-4 had the highest correct triage rate, averaging 70.60 (±3.74), while Gemini had the highest over-triage rate, averaging 35.2 (±2.93) (p < 0.001). The highest under-triage rate was observed among the emergency medicine specialists (32.90 (±11.83)). In the ESI 1-2 class, Gemini had a correct triage rate of 87.77%, GPT-4 85.11%, and the emergency medicine specialists 49.33%. CONCLUSION: Our study shows that both GPT-4 and Gemini can accurately triage critical and urgent patients in the ESI 1-2 group at a high rate, and that GPT-4 was more successful in ESI triage across all patients. These results suggest that GPT-4 and Gemini could assist in accurate ESI triage of patients in emergency departments.
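The correct/over-/under-triage categorization described in the methods above can be sketched as follows. ESI levels run from 1 (most urgent) to 5 (least urgent), so assigning a numerically lower level than the reference counts as over-triage and a higher level as under-triage. Function and variable names are illustrative assumptions, not taken from the study:

```python
def categorize(assigned, reference):
    """Classify one triage decision against the reference ESI level."""
    if assigned == reference:
        return "correct"
    # Lower ESI number = more urgent, so assigned < reference is over-triage.
    return "over" if assigned < reference else "under"


def triage_rates(pairs):
    """Percentage of correct, over-, and under-triage over (assigned, reference) pairs."""
    counts = {"correct": 0, "over": 0, "under": 0}
    for assigned, reference in pairs:
        counts[categorize(assigned, reference)] += 1
    n = len(pairs)
    return {k: 100 * v / n for k, v in counts.items()}


# Toy example: four cases, one over-triaged and one under-triaged.
print(triage_rates([(2, 2), (1, 3), (4, 2), (3, 3)]))
```

Rates like the 70.60% correct-triage figure reported for GPT-4 are averages of exactly this kind of per-rater percentage.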


Subject(s)
Emergency Medicine , Emergency Service, Hospital , Triage , Triage/methods , Humans , Emergency Service, Hospital/organization & administration , Turkey , Artificial Intelligence , Adult , Female , Male , Severity of Illness Index
3.
Am J Emerg Med ; 80: 51-60, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38507847

ABSTRACT

INTRODUCTION: ChatGPT, developed by OpenAI, represents the cutting edge in its field with its latest model, GPT-4. Extensive research is being conducted with ChatGPT in various domains, including cardiovascular disease. Nevertheless, there is a lack of studies addressing the proficiency of GPT-4 in diagnosing conditions from electrocardiography (ECG) data. The goal of this study was to evaluate the diagnostic accuracy of GPT-4 when provided with ECG data and to compare its performance with that of emergency medicine specialists and cardiologists. METHODS: This study was approved by the Clinical Research Ethics Committee of Hitit University Medical Faculty on August 21, 2023 (decision no: 2023-91). Drawing on cases from the book "150 ECG Cases," a total of 40 ECG cases were crafted into multiple-choice questions (20 everyday and 20 more challenging ECG questions). The participant pool included 12 emergency medicine specialists and 12 cardiology specialists. GPT-4 was administered the questions in 12 separate sessions. The responses of the cardiology physicians, the emergency medicine physicians, and GPT-4 were evaluated separately for each of the three question groups. RESULTS: On the everyday ECG questions, GPT-4 outperformed both the emergency medicine specialists and the cardiology specialists (p < 0.001, p = 0.001). On the more challenging ECG questions, GPT-4 outperformed the emergency medicine specialists (p < 0.001), while no statistically significant difference was found between GPT-4 and the cardiology specialists (p = 0.190). On the total set of ECG questions, GPT-4 was more successful than both the emergency medicine specialists and the cardiologists (p < 0.001, p = 0.001). CONCLUSION: Our study has shown that GPT-4 is more successful than emergency medicine specialists in evaluating both everyday and more challenging ECG questions. It performed better than the cardiologists on everyday questions, but its performance aligned closely with theirs as question difficulty increased.


Subject(s)
Cardiologists , Clinical Competence , Electrocardiography , Emergency Medicine , Humans , Male , Female , Adult , Middle Aged , Cardiovascular Diseases/diagnosis
5.
Rev Assoc Med Bras (1992) ; 69(12): e20230733, 2023.
Article in English | MEDLINE | ID: mdl-37971127

ABSTRACT

OBJECTIVE: Pulmonary thromboembolism is a disease with high morbidity and mortality. Various changes occur on the electrocardiogram secondary to pulmonary thromboembolism. The objective of this study was to investigate variations in QT dispersion, Tpeak-Tend duration, and Tpeak-Tend/QT ratio in relation to pulmonary thromboembolism localization, and their impact on 30-day mortality. METHODS: This study was carried out in a tertiary emergency medicine clinic between December 1, 2019, and November 30, 2020. We evaluated correlations between patients' radiological outcomes, QT dispersions, T-wave dispersions, Tpeak-Tend durations, and Tpeak-Tend/QT ratios, and sought statistically significant differences in these values according to the presence or localization of pulmonary thromboembolism. Thirty-day mortality in patients diagnosed with pulmonary thromboembolism was also assessed. RESULTS: Electrocardiogram findings revealed that T-wave dispersion (p < 0.001), Tpeak-Tend duration (p = 0.034), and Tpeak-Tend/corrected QT ratio (p = 0.003) were lower in patients than in controls. Conversely, QT dispersion (p = 0.005) and corrected QT dispersion (p < 0.001) were higher in patients. CONCLUSION: Electrocardiogram findings such as T-wave dispersion, QT duration, Tpeak-Tend time, and Tpeak-Tend/corrected QT ratio can aid in detecting pulmonary thromboembolism. More studies with larger cohorts are required to further understand the role of QT and corrected QT dispersion in the mortality of pulmonary thromboembolism patients.
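For reference on the measures in the abstract above: QT dispersion is conventionally the difference between the longest and shortest QT intervals measured across the 12 ECG leads, and corrected QT is commonly obtained with Bazett's formula (QTc = QT/√RR). A minimal sketch of that arithmetic with made-up measurements; the study's exact measurement protocol and correction formula are not stated here, so Bazett is an assumption:

```python
from math import sqrt

def qt_dispersion(qt_ms):
    """QT dispersion: max minus min QT interval (ms) across the measured leads."""
    return max(qt_ms) - min(qt_ms)

def qtc_bazett(qt_ms, rr_s):
    """Heart-rate-corrected QT (ms) by Bazett's formula: QT / sqrt(RR in seconds)."""
    return qt_ms / sqrt(rr_s)


# Toy example: three leads with QT of 380, 400, and 420 ms.
print(qt_dispersion([380, 400, 420]))   # 40 ms dispersion
print(qtc_bazett(400, 0.64))            # QT 400 ms at RR 0.64 s -> QTc 500 ms
```

Corrected QT dispersion, as compared in the results, is the same max-minus-min computed over the per-lead QTc values rather than the raw QT intervals.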


Subject(s)
Arrhythmias, Cardiac , Pulmonary Embolism , Humans , Electrocardiography , Pulmonary Embolism/diagnosis
6.
Rev. Assoc. Med. Bras. (1992, Impr.) ; 69(12): e20230733, 2023. tab, graf
Article in English | LILACS-Express | LILACS | ID: biblio-1521491

