Results 1-20 of 37
1.
Diagn Interv Radiol ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39248152

ABSTRACT

PURPOSE: This study aimed to evaluate the performance of large language models (LLMs) and multimodal LLMs in interpreting Breast Imaging Reporting and Data System (BI-RADS) categories and providing clinical management recommendations for breast radiology in text-based and visual questions. METHODS: This cross-sectional observational study involved two steps. In the first step, we compared ten LLMs (ChatGPT 4o, ChatGPT 4, ChatGPT 3.5, Google Gemini 1.5 Pro, Google Gemini 1.0, Microsoft Copilot, Perplexity, Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Opus 200K), general radiologists, and a breast radiologist using 100 text-based multiple-choice questions (MCQs) related to the BI-RADS Atlas 5th edition. In the second step, we assessed the performance of five multimodal LLMs (ChatGPT 4o, ChatGPT 4V, Claude 3.5 Sonnet, Claude 3 Opus, and Google Gemini 1.5 Pro) in assigning BI-RADS categories and providing clinical management recommendations for 100 breast ultrasound images. Correct answers and accuracy by question type were compared using McNemar's and chi-squared tests. Management scores were analyzed using the Kruskal-Wallis and Wilcoxon tests. RESULTS: Claude 3.5 Sonnet achieved the highest accuracy on text-based MCQs (90%), followed by ChatGPT 4o (89%), outperforming all other LLMs and the general radiologists (78% and 76%) (P < 0.05), except for the Claude 3 Opus models and the breast radiologist (82%) (P > 0.05). Lower-performing LLMs included Google Gemini 1.0 (61%) and ChatGPT 3.5 (60%). Performance across the different question categories showed no significant variation among LLMs or radiologists (P > 0.05). For breast ultrasound images, Claude 3.5 Sonnet achieved 59% accuracy, significantly higher than the other multimodal LLMs (P < 0.05). Management recommendations were evaluated on a 3-point Likert scale, with Claude 3.5 Sonnet scoring highest (mean: 2.12 ± 0.97) (P < 0.05). Accuracy varied significantly across BI-RADS categories for all models except Claude 3 Opus (P < 0.05). Gemini 1.5 Pro failed to answer any BI-RADS 5 questions correctly, and ChatGPT 4V failed to answer any BI-RADS 1 questions correctly, making them the least accurate models in these respective categories (P < 0.05). CONCLUSION: Although LLMs such as Claude 3.5 Sonnet and ChatGPT 4o show promise in text-based BI-RADS assessments, their limitations in visual diagnostics suggest they should be used cautiously and under radiologists' supervision to avoid misdiagnoses. CLINICAL SIGNIFICANCE: This study demonstrates that while LLMs exhibit strong capabilities in text-based BI-RADS assessments, their visual diagnostic abilities are currently limited, necessitating further development and cautious application in clinical practice.

2.
Clin Imaging ; 114: 110271, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39236553

ABSTRACT

The advent of large language models (LLMs) marks a transformative leap in natural language processing, offering unprecedented potential in radiology, particularly in enhancing the accuracy and efficiency of coronary artery disease (CAD) diagnosis. While previous studies have explored the capabilities of specific LLMs such as ChatGPT in cardiac imaging, a comprehensive evaluation comparing multiple LLMs in the context of CAD-RADS 2.0 has been lacking. This study addresses this gap by assessing the performance of various LLMs, including ChatGPT 4, ChatGPT 4o, Claude 3 Opus, Gemini 1.5 Pro, Mistral Large, Meta Llama 3 70B, and Perplexity Pro, in answering 30 multiple-choice questions derived from the CAD-RADS 2.0 guidelines. ChatGPT 4o achieved the highest accuracy (100%), with ChatGPT 4 and Claude 3 Opus close behind (96.6%). The other models, including Mistral Large, Perplexity Pro, Meta Llama 3 70B, and Gemini 1.5 Pro, also performed well, with slightly lower accuracy ranging from 90% to 93.3%. This study underscores the proficiency of current LLMs in understanding and applying CAD-RADS 2.0, suggesting their potential to significantly enhance radiological reporting and patient care in coronary artery disease. The variations in model performance highlight the need for further research, particularly into the visual diagnostic capabilities of LLMs, a critical component of radiology practice. This study provides a foundational comparison of LLMs in CAD-RADS 2.0 and sets the stage for future investigations into their broader applications in radiology, emphasizing the importance of integrating both text-based and visual knowledge for optimal clinical outcomes.


Subject(s)
Computed Tomography Angiography, Coronary Angiography, Coronary Artery Disease, Natural Language Processing, Humans, Computed Tomography Angiography/methods, Coronary Artery Disease/diagnostic imaging, Coronary Angiography/methods, Reproducibility of Results

4.
J Thorac Imaging ; 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39269227

ABSTRACT

PURPOSE: To investigate and compare the diagnostic performance of 10 different large language models (LLMs) and 2 board-certified general radiologists on thoracic radiology cases published by the Society of Thoracic Radiology. MATERIALS AND METHODS: We collected 124 publicly available "Case of the Month" cases published on the Society of Thoracic Radiology website between March 2012 and December 2023. Medical history and imaging findings were input into the LLMs to obtain a diagnosis and differential diagnosis, while the radiologists independently provided their assessments from visual review of the images. Cases were categorized anatomically (parenchyma, airways, mediastinum-pleura-chest wall, and vascular) and further classified as specific or nonspecific for radiologic diagnosis. Diagnostic accuracy and differential diagnosis scores (DDxScore) were analyzed using the χ2, Kruskal-Wallis, Wilcoxon, McNemar, and Mann-Whitney U tests. RESULTS: Among the 124 cases, Claude 3 Opus showed the highest diagnostic accuracy (70.29%), followed by ChatGPT 4 and Google Gemini 1.5 Pro (59.75% each), Meta Llama 3 70b (57.3%), and ChatGPT 3.5 (53.2%); these models outperformed the radiologists (52.4% and 41.1%) and the other LLMs (P<0.05). The Claude 3 Opus DDxScore was significantly better than those of the other LLMs and the radiologists, except ChatGPT 3.5 (P<0.05). All LLMs and radiologists were more accurate in specific cases (P<0.05), while Perplexity and Google Bard showed no difference in DDxScore between specific and nonspecific cases (P>0.05). There were no significant differences between LLMs and radiologists in diagnostic accuracy within anatomic subgroups (P>0.05), except for Meta Llama 3 70b in the vascular cases (P=0.040). CONCLUSIONS: Claude 3 Opus outperformed the other LLMs and the radiologists in text-based thoracic radiology cases. LLMs hold great promise for clinical decision systems under proper medical supervision.

5.
JCO Glob Oncol ; 10: e2400200, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39208360

ABSTRACT

This study evaluates the use of large language models (LLMs) in interpreting Lung-RADS for lung cancer screening, highlighting their innovative role in enhancing radiological practice. Our findings show that Claude 3 Opus and Perplexity achieved a 96% accuracy rate, outperforming the other models.


Subject(s)
Lung Neoplasms, Humans, Lung Neoplasms/diagnostic imaging, Early Detection of Cancer/methods

6.
Rofo ; 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38977011

ABSTRACT

Research on magnetic resonance enterography (MRE) and sarcopenia for assessing Crohn's disease (CD) is growing. Our study examined the relationships between the presence of sarcopenia, intramural fat accumulation (IFA), and clinical, laboratory, and MRE findings. This retrospective study was conducted on 112 patients with suspected or diagnosed CD who underwent 3-tesla MRE. We examined the correlation between sarcopenia-related parameters and MRE findings and statistically analyzed MRE, clinical, and laboratory results. The Kruskal-Wallis, Pearson chi-square, and Fisher-Freeman-Halton tests were used for comparison. Patients with active inflammation on a chronic basis had more IFA than the other patients (p<0.001). IFA was positively related to intramural edema (p<0.001). IFA correlated positively with high b-values and negatively with apparent diffusion coefficient values (p<0.05). Significant positive relationships were found between IFA and wall thickness, affected segment length, disease duration, and sedimentation values (p<0.05). Sarcopenia correlated strongly with the CD activity index (p<0.001) and with wall thickness (p=0.003). Steroid use showed no significant relationship with the other variables. The presence of IFA is associated with chronic inflammation. There was no clear relationship between steroid use and IFA. Our findings support the idea that sarcopenia is related to CD activity. Further comprehensive research on these subjects is required.
· The use of MR enterography for the management of CD is increasing steadily because of its advantages.
· There is a paucity of evidence on the relationship between sarcopenia and MR enterography findings in patients with CD.
· Intramural fat accumulation (IFA) is a sign of chronicity in patients with CD.
· The presence of IFA seems to be associated with active inflammation on a chronic basis.
· There was no clear relationship between steroid use and IFA.
Algin O, Günes YC, Cankurtaran RE et al. The Relationship Between Intramural Fat Accumulation and Sarcopenia on MR Enterography Exams in Patients with Crohn's Disease. Fortschr Röntgenstr 2024; DOI 10.1055/a-2330-8148.

8.
J Stroke Cerebrovasc Dis ; 33(11): 107897, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39069148

ABSTRACT

INTRODUCTION: The Woven EndoBridge (WEB) device is emerging as a novel therapy for intracranial aneurysms, but its use for off-label indications requires further study. Using machine learning, we aimed to develop predictive models for complete occlusion after off-label WEB treatment and to identify factors associated with occlusion outcomes. METHODS: This multicenter, retrospective study included 162 patients who underwent off-label WEB treatment for intracranial aneurysms. Baseline, morphological, and procedural variables were used to develop machine-learning models predicting complete occlusion. Model interpretation was performed to determine significant predictors. Ordinal regression was also performed with occlusion status as an ordinal outcome, ranging from better (Raymond-Roy Occlusion Classification [RROC] grade 1) to worse (RROC grade 3) status. Odds ratios (OR) with 95% confidence intervals (CI) were reported. RESULTS: The best-performing model achieved an AUROC of 0.8 for predicting complete occlusion. Larger neck diameter and the presence of a daughter sac were significant independent predictors of incomplete occlusion. On multivariable ordinal regression, higher RROC grades (OR 1.86, 95% CI 1.25-2.82), larger neck diameter (OR 1.69, 95% CI 1.09-2.65), and presence of daughter sacs (OR 2.26, 95% CI 0.99-5.15) were associated with worse aneurysm occlusion after WEB treatment, independent of other factors. CONCLUSION: This study found that larger neck diameter and daughter sacs were associated with worse occlusion after WEB therapy for aneurysms. The machine-learning approach identified anatomical factors related to occlusion outcomes that may help guide patient selection and monitoring with this technology. Further validation is needed.

11.
Abdom Radiol (NY) ; 49(10): 3758, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38913138

13.
Cureus ; 16(5): e60009, 2024 May.
Article in English | MEDLINE | ID: mdl-38854352

ABSTRACT

Background: Recent studies have highlighted the diagnostic performance of ChatGPT 3.5 and GPT-4 in a text-based format, demonstrating their radiological knowledge across different areas. Our objective was to investigate the impact of prompt engineering on the diagnostic performance of ChatGPT 3.5 and GPT-4 on thoracic radiology cases, highlighting how prompt complexity influences model performance. Methodology: We conducted a retrospective cross-sectional study using 124 publicly available "Case of the Month" examples from the Society of Thoracic Radiology website. We first input the cases into both ChatGPT versions without prompting. We then used five different prompts, ranging from basic task-oriented to complex role-specific formulations, to measure the diagnostic accuracy of each ChatGPT version. The differential diagnosis lists generated by the models were compared against the radiological diagnoses listed on the Society of Thoracic Radiology website, using a scoring system to assess accuracy comprehensively. Diagnostic accuracy and differential diagnosis scores were analyzed using the McNemar, chi-square, Kruskal-Wallis, and Mann-Whitney U tests. Results: Without any prompt, ChatGPT 3.5's accuracy was 25% (31/124), which increased to 56.5% (70/124) with the most complex prompt (P < 0.001). GPT-4 had a higher baseline accuracy of 53.2% (66/124) without prompting, which increased to 59.7% (74/124) with complex prompts (P = 0.09). Notably, there was no statistically significant difference in peak performance between ChatGPT 3.5 (70/124) and GPT-4 (74/124) (P = 0.55). Conclusions: This study emphasizes the critical influence of prompt engineering on the diagnostic performance of ChatGPT versions, especially ChatGPT 3.5.

16.
Acta Radiol ; 65(6): 601-608, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38644747

ABSTRACT

BACKGROUND: Epicardial adipose tissue (EAT) volume is usually measured with ECG-gated computed tomography (CT). Measuring EAT thickness is more convenient; however, it is not clear whether EAT thickness measured on non-gated CT is reliable, or at which location it agrees best with EAT volume. PURPOSE: To examine the agreement between ECG-gated EAT volume and non-gated EAT thickness measured at various locations, and to assess the predictive role of EAT thickness for high EAT volume. MATERIAL AND METHODS: EAT thickness was measured at six locations on non-contrast chest CT, and EAT volume was measured on ECG-gated cardiac CT (n = 68). The correlation and agreement (Bland-Altman plots) between the thicknesses and EAT volume were assessed. RESULTS: EAT thicknesses were significantly correlated with EAT volume (P < 0.001). The highest correlation (r = 0.860) and agreement were observed for the thickness adjacent to the right ventricular free wall. EAT thickness at this location also showed strong potential for discriminating high (>125 cm³) EAT volume (area under the ROC curve = 0.889, 95% CI = 0.801-0.977; P < 0.001). The sensitivity, specificity, and positive and negative predictive values of EAT thickness for high EAT volume were 76.5%, 88.2%, 68.4%, and 91.8%, respectively, for a cutoff value of 5.75 mm, and 47.1%, 100%, 100%, and 85%, respectively, for a cutoff value of 8.10 mm. CONCLUSION: EAT thickness measured on non-gated chest CT adjacent to the right ventricular free wall is a reliable and easy-to-use alternative to volumetric quantification and has strong potential to predict high EAT volume.


Subject(s)
Adipose Tissue, Pericardium, Radiography, Thoracic, Tomography, X-Ray Computed, Humans, Adipose Tissue/diagnostic imaging, Pericardium/diagnostic imaging, Male, Female, Tomography, X-Ray Computed/methods, Middle Aged, Radiography, Thoracic/methods, Aged, Adult, Reproducibility of Results, Aged, 80 and over, Epicardial Adipose Tissue

20.
Surg Endosc ; 38(4): 1807-1812, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38291160

ABSTRACT

BACKGROUND: Bariatric surgery has significant effects on metabolic parameters and hormone levels. However, the specific impact of laparoscopic sleeve gastrectomy (LSG) on thyroid hormones and other metabolic parameters remains unclear. This study aimed to investigate the short- and long-term effects of LSG on thyroid hormone levels, HbA1c, and other metabolic parameters. METHODS: A total of 619 euthyroid patients without a history of thyroid disease or thyroid hormone replacement therapy were included; patients with diabetes were excluded. Preoperative, 1-year postoperative, and 5-year postoperative levels of thyroid-stimulating hormone (TSH), free triiodothyronine (fT3), free thyroxine (fT4), HbA1c, and other metabolic parameters were recorded and analyzed. RESULTS: LSG resulted in significant weight loss and improvements in metabolic parameters. At 1 year postoperatively, there were significant reductions in BMI, HbA1c, TSH, fT3, and triglyceride levels, while fT4 levels increased. A statistically significant negative correlation was found between the preoperative HbA1c level and the percentage of total weight loss (%TWL) at the fifth postoperative year. A statistically significant negative correlation was also found between the 5-year change in TSH and %TWL. CONCLUSION: This is the first study to predict long-term total weight loss from preoperative HbA1c. This finding has important implications for personalized patient management and could help clinicians identify the individuals most likely to benefit from sleeve gastrectomy. It also underscores the value of multidisciplinary care involving endocrinologists and dietitians.


Subject(s)
Laparoscopy, Obesity, Morbid, Humans, Thyroxine, Obesity, Morbid/surgery, Glycated Hemoglobin, Thyroid Hormones, Thyrotropin, Gastrectomy/methods, Weight Loss, Retrospective Studies, Body Mass Index