ABSTRACT
Background: Differential diagnosis in radiology relies on the accurate identification of imaging patterns. The use of large language models (LLMs) in radiology holds promise, with many potential applications that may enhance the efficiency of radiologists' workflow. The study aimed to evaluate the efficacy of generative pre-trained transformer (GPT)-4, an LLM, in providing differential diagnoses in neuroradiology, comparing its performance with that of board-certified neuroradiologists. Methods: Sixty neuroradiology reports with variable diagnoses were inserted into GPT-4, which was tasked with generating a top-3 differential diagnosis for each case. The results were compared to the true diagnoses and to the differential diagnoses provided by three blinded neuroradiologists. Diagnostic accuracy and agreement between readers were assessed. Results: Of the 60 patients (mean age 47.8 years, 65% female), GPT-4 correctly included the diagnoses in its differentials in 61.7% (37/60) of cases, while the neuroradiologists' accuracy ranged from 63.3% (38/60) to 73.3% (44/60). Agreement between GPT-4 and the neuroradiologists, and among the neuroradiologists, was fair to moderate [Cohen's kappa (kw) 0.34-0.44 and kw 0.39-0.54, respectively]. Conclusions: GPT-4 shows potential as a support tool for differential diagnosis in neuroradiology, though it was outperformed by human experts. Radiologists should remain mindful of the limitations of LLMs, while harnessing their potential to enhance educational and clinical work.
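The agreement statistic reported above can be illustrated with a short sketch. It assumes that "kw" denotes a linearly weighted Cohen's kappa, which the abstract does not spell out. The function name weighted_kappa and the example ratings are hypothetical and are not study data.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted Cohen's kappa for two raters on ordered categories.

    Assumption: disagreement between categories i and j is weighted by
    |i - j| / (k - 1), where k is the number of categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Observed mean disagreement across cases.
    observed = sum(
        abs(index[a] - index[b]) for a, b in zip(rater_a, rater_b)
    ) / (n * (k - 1))

    # Disagreement expected by chance, from each rater's marginal counts.
    count_a = Counter(index[a] for a in rater_a)
    count_b = Counter(index[b] for b in rater_b)
    expected = sum(
        count_a[i] * count_b[j] * abs(i - j)
        for i in range(k)
        for j in range(k)
    ) / (n * n * (k - 1))

    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - observed / expected


# Hypothetical ordinal scores (not taken from the study).
perfect = weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], categories=[0, 1, 2])
chance = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], categories=[0, 1])
```

With identical ratings the statistic is 1, and with ratings that agree no more often than chance it is 0. Values such as 0.34-0.54 fall between these two.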
ABSTRACT
PURPOSE: In BRCA germline pathogenic sequence variant (PSV) carriers aged 30-39 years, imaging is recommended at six-month intervals. The European Society for Medical Oncology recommendation of six-monthly MRI screening is being considered at our institution, particularly for younger carriers under the age of 35, although it is not mandatory. If six-monthly MRI is unavailable, annual MRI may be supplemented by ultrasound (with or without mammography). The aim of this study was to evaluate the utility of ultrasound screening added to mammography, as a six-month supplement to annual MRI in BRCA PSV carriers aged 30-39 years. MATERIALS AND METHODS: This IRB-approved retrospective study included BRCA PSV carriers aged 30-39 years who underwent breast cancer screening at our institution between January 2015 and March 2023. Participants were divided into two groups: those who had supplemental whole-breast US and mammography at six months and underwent screening before March 2019, and those who had only mammography without supplemental US and enrolled in screening after March 2019. Patient characteristics, cancer detection rates and cancer characteristics were compared between the two groups. RESULTS: Overall, 200 asymptomatic BRCA1/2 PSV carriers undergoing screening in our institution were included in the study. Mean age was 35.7 ± 3.5 years, and mean follow-up time was 37.4 ± 38.0 months. There were 118 (59 %) women screened with supplemental US, and 82 (41 %) women without. Eight cancers were diagnosed during the study period, four in women with supplemental US and four in women without. The sensitivity of whole-breast screening US was 25 % (1/4), specificity 85.7 % (222/259), PPV 2.6 % (1/38), and NPV 98.7 % (222/225). Of the four cancers detected in women screened with supplemental US, one was diagnosed by whole-breast US, two by MRI, and one by mammography. Of the eight cancers included in this study, two were not detectable by targeted second-look US. All eight cancers were detectable by MRI. CONCLUSION: The addition of whole-breast ultrasound to mammography and MRI screening in BRCA PSV carriers aged 30-39 years offered limited incremental benefit. MRI with six-month supplemental mammography without US detected all cancer cases.
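The whole-breast US figures reported above follow from a single 2x2 table, as this short sketch shows. The cell counts are inferred from the reported fractions (1/4, 222/259, 1/38, 222/225) rather than stated directly in the abstract. The names ConfusionCounts and diagnostic_metrics are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Standard screening-test metrics from a 2x2 table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Counts inferred from the reported fractions (an assumption, see lead-in):
# 1/4 -> tp=1, fn=3; 222/259 -> tn=222, fp=37; 1/38 and 222/225 agree.
us_counts = ConfusionCounts(tp=1, fp=37, fn=3, tn=222)
metrics = diagnostic_metrics(us_counts)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f} %")
```

The four reported fractions are mutually consistent: the same 37 false positives and 3 false negatives reproduce each percentage.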
ABSTRACT
OBJECTIVES: This study aims to assess the performance of a multimodal artificial intelligence (AI) model capable of analyzing both images and textual data (GPT-4V), in interpreting radiological images. It focuses on a range of modalities, anatomical regions, and pathologies to explore the potential of zero-shot generative AI in enhancing diagnostic processes in radiology. METHODS: We analyzed 230 anonymized emergency room diagnostic images, consecutively collected over 1 week, using GPT-4V. Modalities included ultrasound (US), computerized tomography (CT), and X-ray images. The interpretations provided by GPT-4V were then compared with those of senior radiologists. This comparison aimed to evaluate the accuracy of GPT-4V in recognizing the imaging modality, anatomical region, and pathology present in the images. RESULTS: GPT-4V identified the imaging modality correctly in 100% of cases (221/221), the anatomical region in 87.1% (189/217), and the pathology in 35.2% (76/216). However, the model's performance varied significantly across different modalities, with anatomical region identification accuracy ranging from 60.9% (39/64) in US images to 97% (98/101) and 100% (52/52) in CT and X-ray images (p < 0.001). Similarly, pathology identification ranged from 9.1% (6/66) in US images to 36.4% (36/99) in CT and 66.7% (34/51) in X-ray images (p < 0.001). These variations indicate inconsistencies in GPT-4V's ability to interpret radiological images accurately. CONCLUSION: While the integration of AI in radiology, exemplified by multimodal GPT-4, offers promising avenues for diagnostic enhancement, the current capabilities of GPT-4V are not yet reliable for interpreting radiological images. This study underscores the necessity for ongoing development to achieve dependable performance in radiology diagnostics. 
CLINICAL RELEVANCE STATEMENT: Although GPT-4V shows promise in radiological image interpretation, its high diagnostic hallucination rate (> 40%) indicates it cannot be trusted for clinical use as a standalone tool. Improvements are necessary to enhance its reliability and ensure patient safety. KEY POINTS: GPT-4V's capability in analyzing images offers new clinical possibilities in radiology. GPT-4V excels in identifying imaging modalities but demonstrates inconsistent anatomy and pathology detection. Ongoing AI advancements are necessary to enhance diagnostic reliability in radiological applications.
ABSTRACT
Large language models (LLMs) are transforming the field of natural language processing (NLP). These models offer opportunities for radiologists to make a meaningful impact in their field. NLP is a branch of artificial intelligence (AI) that uses computer algorithms to study and understand text data. Recent advances in NLP include the attention mechanism and the Transformer architecture. Transformer-based LLMs, such as GPT-4 and Gemini, are trained on massive amounts of data and generate human-like text. They are well suited to analysing large volumes of text in academic research and clinical practice in radiology. Despite their promise, LLMs have limitations, including their dependency on the diversity and quality of their training data and the potential for false outputs. Even with these limitations, the use of LLMs in radiology is gaining momentum. By embracing the potential of LLMs, radiologists can gain valuable insights and improve the efficiency of their work. This can ultimately lead to improved patient care.
ABSTRACT
BACKGROUND: Traumatic knee injuries are challenging to diagnose accurately through radiography and, to a lesser extent, through CT, with fractures sometimes overlooked. Ancillary signs like joint effusion or lipo-hemarthrosis are indicative of fractures, suggesting the need for further imaging. Artificial Intelligence (AI) can automate image analysis, improving diagnostic accuracy and helping to prioritize clinically important X-ray or CT studies. OBJECTIVE: To develop and evaluate an AI algorithm for detecting effusion of any kind in knee X-rays and selected CT images and distinguishing between simple effusion and lipo-hemarthrosis indicative of intra-articular fractures. METHODS: This retrospective study analyzed post-traumatic knee imaging from January 2016 to February 2023, categorizing images into lipo-hemarthrosis, simple effusion, or normal. It utilized the FishNet-150 algorithm for image classification, with class activation maps highlighting decision-influential regions. The AI's diagnostic accuracy was validated against a gold standard based on the evaluations made by a radiologist with at least four years of experience. RESULTS: Analysis included CT images from 515 patients and X-rays from 637 post-traumatic patients, identifying lipo-hemarthrosis, simple effusion, and normal findings. The AI showed an AUC of 0.81 for detecting any effusion, 0.78 for simple effusion, and 0.83 for lipo-hemarthrosis in X-rays; and 0.89, 0.89, and 0.91, respectively, in CTs. CONCLUSION: The AI algorithm effectively detects knee effusion and differentiates between simple effusion and lipo-hemarthrosis in post-traumatic patients for both X-rays and selected CT images. Further studies are needed to validate these results.
Subject(s)
Artificial Intelligence , Hemarthrosis , Knee Injuries , Tomography, X-Ray Computed , Humans , Knee Injuries/diagnostic imaging , Knee Injuries/complications , Tomography, X-Ray Computed/methods , Female , Male , Retrospective Studies , Hemarthrosis/diagnostic imaging , Hemarthrosis/etiology , Middle Aged , Adult , Algorithms , Aged , Exudates and Transudates/diagnostic imaging , Aged, 80 and over , Young Adult , Adolescent , Radiographic Image Interpretation, Computer-Assisted/methods , Knee Joint/diagnostic imaging , Sensitivity and Specificity
ABSTRACT
Introduction: The field of vestibular science, encompassing the study of the vestibular system and associated disorders, has experienced notable growth and evolving trends over the past five decades. Here, we explore the changing landscape in vestibular science, focusing on epidemiology, peripheral pathologies, diagnosis methods, treatment, and technological advancements. Methods: Publication data was obtained from the US National Center for Biotechnology Information (NCBI) PubMed database. The analysis included epidemiological, etiological, diagnostic, and treatment-focused studies on peripheral vestibular disorders, with a particular emphasis on changes in topics and trends of publications over time. Results: Our dataset of 39,238 publications revealed a rising trend in research across all age groups. Etiologically, benign paroxysmal positional vertigo (BPPV) and Meniere's disease were the most researched conditions, but the prevalence of studies on vestibular migraine showed a marked increase in recent years. Electronystagmography (ENG)/ Videonystagmography (VNG) and Vestibular Evoked Myogenic Potential (VEMP) were the most commonly discussed diagnostic tools, while physiotherapy stood out as the primary treatment modality. Conclusion: Our study presents a unique opportunity and point of view, exploring the evolving landscape of vestibular science publications over the past five decades. The analysis underscored the dynamic nature of the field, highlighting shifts in focus and emerging publication trends in diagnosis and treatment over time.
ABSTRACT
PURPOSE: The purpose of this study is to evaluate the efficacy of an artificial intelligence (AI) model designed to identify active bleeding in digital subtraction angiography images for upper gastrointestinal bleeding. METHODS: Angiographic images were retrospectively collected from mesenteric and celiac artery embolization procedures performed between 2018 and 2022. This dataset included images showing both active bleeding and non-bleeding phases from the same patients. The images were labeled as normal versus images that contain active bleeding. A convolutional neural network was trained and validated to automatically classify the images. Algorithm performance was tested in terms of area under the curve, accuracy, sensitivity, specificity, F1 score, positive and negative predictive value. RESULTS: The dataset included 587 pre-labeled images from 142 patients. Of these, 302 were labeled as normal angiogram and 285 as containing active bleeding. The model's performance on the validation cohort was area under the curve 85.0 ± 10.9% (standard deviation) and average classification accuracy 77.43 ± 4.9%. For Youden's index cutoff, sensitivity and specificity were 85.4 ± 9.4% and 81.2 ± 8.6%, respectively. CONCLUSION: In this study, we explored the application of AI in mesenteric and celiac artery angiography for detecting active bleeding. The results of this study show the potential of an AI-based algorithm to accurately classify images with active bleeding. Further studies using a larger dataset are needed to improve accuracy and allow segmentation of the bleeding.
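The Youden's index cutoff mentioned in the results can be sketched as follows. This is a generic illustration, not the authors' pipeline. The function name youden_cutoff and the example scores and labels are hypothetical, and the sketch assumes a higher classifier score indicates active bleeding.

```python
def youden_cutoff(scores, labels):
    """Return the threshold that maximizes J = sensitivity + specificity - 1.

    A case is called positive when its score is >= the threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best["j"]:
            best = {
                "threshold": threshold,
                "sensitivity": sensitivity,
                "specificity": specificity,
                "j": j,
            }
    return best


# Hypothetical model outputs (1 = active bleeding, 0 = normal angiogram).
example_scores = [0.1, 0.2, 0.6, 0.5, 0.7, 0.8, 0.9]
example_labels = [0, 0, 0, 1, 1, 1, 1]
cutoff = youden_cutoff(example_scores, example_labels)
```

Sensitivity and specificity reported "for Youden's index cutoff" are the pair measured at the threshold chosen this way.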
Subject(s)
Angiography, Digital Subtraction , Artificial Intelligence , Celiac Artery , Gastrointestinal Hemorrhage , Mesenteric Arteries , Humans , Celiac Artery/diagnostic imaging , Retrospective Studies , Gastrointestinal Hemorrhage/diagnostic imaging , Gastrointestinal Hemorrhage/therapy , Angiography, Digital Subtraction/methods , Male , Female , Middle Aged , Mesenteric Arteries/diagnostic imaging , Aged , Sensitivity and Specificity , Embolization, Therapeutic/methods , Algorithms , Adult , Radiographic Image Interpretation, Computer-Assisted/methods
ABSTRACT
BACKGROUND AND AIMS: Artificial Intelligence (AI) models like GPT-3.5 and GPT-4 have shown promise across various domains but remain underexplored in healthcare. Emergency Departments (ED) rely on established scoring systems, such as NIHSS and HEART score, to guide clinical decision-making. This study aims to evaluate the proficiency of GPT-3.5 and GPT-4 against experienced ED physicians in calculating five commonly used medical scores. METHODS: This retrospective study analyzed data from 150 patients who visited the ED over one week. Both AI models and two human physicians were tasked with calculating scores for NIH Stroke Scale, Canadian Syncope Risk Score, Alvarado Score for Acute Appendicitis, Canadian CT Head Rule, and HEART Score. Cohen's Kappa statistic and AUC values were used to assess inter-rater agreement and predictive performance, respectively. RESULTS: The highest level of agreement was observed between the human physicians (Kappa = 0.681), while GPT-4 also showed moderate to substantial agreement with them (Kappa values of 0.473 and 0.576). GPT-3.5 had the lowest agreement with human scorers. These results highlight the superior predictive performance of human expertise over the currently available automated systems for this specific medical outcome. Human physicians achieved a higher ROC-AUC on 3 of the 5 scores, but none of the differences were statistically significant. CONCLUSIONS: While AI models demonstrated some level of concordance with human expertise, they fell short in emulating the complex clinical judgments that physicians make. The study suggests that current AI models may serve as supplementary tools but are not ready to replace human expertise in high-stakes settings like the ED. Further research is needed to explore the capabilities and limitations of AI in emergency medicine.
Subject(s)
Artificial Intelligence , Physicians , Humans , Canada , Retrospective Studies , Emergency Service, Hospital
ABSTRACT
PURPOSE: Despite advanced technologies in breast cancer management, challenges remain in efficiently interpreting vast clinical data for patient-specific insights. We reviewed the literature on how large language models (LLMs) such as ChatGPT might offer solutions in this field. METHODS: We searched MEDLINE for relevant studies published before December 22, 2023. Keywords included: "large language models", "LLM", "GPT", "ChatGPT", "OpenAI", and "breast". The risk of bias was evaluated using the QUADAS-2 tool. RESULTS: Six studies evaluating either ChatGPT-3.5 or GPT-4 met our inclusion criteria. They explored clinical notes analysis, guideline-based question-answering, and patient management recommendations. Accuracy varied between studies, ranging from 50 to 98%. Higher accuracy was seen in structured tasks like information retrieval. Half of the studies used real patient data, adding practical clinical value. Challenges included inconsistent accuracy, dependency on the way questions are posed (prompt-dependency), and in some cases, missing critical clinical information. CONCLUSION: LLMs hold potential in breast cancer care, especially in textual information extraction and guideline-driven clinical question-answering. Yet, their inconsistent accuracy underscores the need for careful validation of these models, and the importance of ongoing supervision.
Subject(s)
Artificial Intelligence , Breast Neoplasms , Female , Humans , Breast Neoplasms/therapy
ABSTRACT
BACKGROUND: Advancements in artificial intelligence (AI) and natural language processing (NLP) have led to the development of language models such as ChatGPT. These models have the potential to transform healthcare and medical research. However, understanding their applications and limitations is essential. OBJECTIVES: To present a view of ChatGPT research and to critically assess ChatGPT's role in medical writing and clinical environments. METHODS: We performed a literature review via the PubMed search engine from 20 November 2022, to 23 April 2023. The search terms included ChatGPT, OpenAI, and large language models. We included studies that focused on ChatGPT, explored its use or implications in medicine, and were original research articles. The selected studies were analyzed considering study design, NLP tasks, main findings, and limitations. RESULTS: Our study included 27 articles that examined ChatGPT's performance in various tasks and medical fields. These studies covered knowledge assessment, writing, and analysis tasks. While ChatGPT was found to be useful in tasks such as generating research ideas, aiding clinical reasoning, and streamlining workflows, limitations were also identified. These limitations included inaccuracies, inconsistencies, fictitious information, and limited knowledge, highlighting the need for further improvements. CONCLUSIONS: The review underscores ChatGPT's potential in various medical applications. Yet, it also points to limitations that require careful human oversight and responsible use to improve patient care, education, and decision-making.
Subject(s)
Artificial Intelligence , Medicine , Humans , Educational Status , Language , Delivery of Health Care
ABSTRACT
This study's aim is to describe the imaging findings in pregnant patients undergoing emergent MRI for suspected acute appendicitis, and the various alternative diagnoses seen on those MRI scans. This is a single-center retrospective analysis in which we assessed the imaging, clinical and pathological data for all consecutive pregnant patients who underwent emergent MRI for suspected acute appendicitis between April 2013 and June 2021. Out of 167 patients, 35 patients (20.9%) were diagnosed with acute appendicitis on MRI. Thirty patients (18%) were diagnosed with an alternative diagnosis on MRI: 17/30 (56.7%) patients had a gynecological source of abdominal pain (e.g. ectopic pregnancy, red degeneration of a leiomyoma); 8 patients (26.7%) had urological findings such as pyelonephritis; and 6 patients (20%) had gastrointestinal diagnoses (e.g. abdominal wall hernia or inflammatory bowel disease). We conclude that MRI is a good diagnostic tool in the pregnant patient, not only in diagnosing acute appendicitis, but also in providing information on alternative diagnoses for acute abdominal pain. Our findings show the various differential diagnoses on emergent MRI in pregnant patients with suspected acute appendicitis, which may assist clinicians and radiologists in patient assessment and imaging utilization.
Subject(s)
Appendicitis , Pregnancy Complications , Pregnancy , Female , Humans , Appendicitis/diagnostic imaging , Retrospective Studies , Pregnancy Complications/diagnostic imaging , Magnetic Resonance Imaging/methods , Abdominal Pain/diagnostic imaging , Diagnosis, Differential , Acute Disease , Sensitivity and Specificity
ABSTRACT
Background: Impaired drainage of cerebrospinal fluid through the glymphatic system is thought to play a role in the pathophysiology of idiopathic intracranial hypertension (IIH). Limited data exist regarding the glymphatic system's involvement in pediatric patients with IIH. Therefore, the study's objective was to quantitatively evaluate alterations in parenchymal diffusivity and magnetic resonance imaging (MRI)-visible dilated perivascular spaces (PVS) as imaging indicators of glymphatic dysfunction in pediatric patients with IIH. Methods: Patients diagnosed with IIH in 2017-2022 in a single tertiary center (Sheba Medical Center, Israel) were retrospectively reviewed. Twenty-four pediatric patients were enrolled. All patients underwent clinical 3-T brain MRI. The control group included 24 age- and gender-matched healthy subjects with a normal-appearing brain on imaging. We used automatic atlas-based diffusion-weighted imaging analysis to determine regional diffusivity of the thalamus, caudate, putamen, globus pallidus, hippocampus, amygdala, and brain stem. PVS were evaluated using a semi-quantitative rating scale on T2-weighted images. Variables were compared using the Mann-Whitney test. Multivariate analysis of covariance was used to test for differences between controls and IIH patients. Results: No significant differences in regional brain diffusivity were observed between individuals with IIH and healthy controls (P=0.14-0.91 for various brain regions). The number of visible PVS was comparable between patients with IIH and the control group across all evaluated sites (P=0.12-0.74 for various brain regions). Conclusions: Pediatric IIH patients exhibited similar patterns of parenchymal diffusivity and PVS compared to age-matched controls. These findings do not support the previously postulated hypothesis that the glymphatic system plays a role in the pathophysiology of pediatric IIH.
However, employing more sophisticated magnetic resonance (MR) techniques could enhance the sensitivity in uncovering underlying glymphatic dysfunction. Further research is warranted to validate and explore this association in larger cohorts and investigate the underlying mechanisms involved in IIH.
ABSTRACT
PURPOSE: To describe a single-center experience in the treatment of chronic limb-threatening ischemia (CLTI) with the application of BeBack catheter (Bentley InnoMed, Germany) in patients with arterial chronic total occlusion (CTO). MATERIALS AND METHODS: A retrospective review of patients who underwent limb revascularizations using the BeBack catheter between 2015 and 2022. All patients had an initial failed attempt using a traditional guidewire and catheter technique. Technical success was considered whenever a successful re-entry or lesion crossing using the study device was achieved. Procedural success was defined as recanalization of the occluded artery with residual stenosis of less than 30%, and improvement in ankle-brachial index (ABI) after 24 hours. A Rutherford score was assigned to each limb and affected anatomical segments and lesion length were documented. Procedural access sites and complications were noted. RESULTS: The study included 72 patients who underwent 78 procedures using the BeBack crossing catheter. Procedural success was achieved in 91% of cases, with a technical success rate of 92.3%. The most frequently involved occluded segments were the femoral and popliteal arteries. The average ABI improved from 0.59 to 0.95 after the procedure. The most used access site was the contralateral femoral, and the BeBack catheter was employed on 85 occasions. Only 1 patient suffered a severe immediate adverse effect, and during the 30-day follow-up period, 2 patients needed reintervention. Unfortunately, 3 patients died during the follow-up period. CONCLUSION: The BeBack catheter offers a viable option for the treatment of patients with chronic total occlusion, with high procedural success and a low complication rate. 
CLINICAL IMPACT: The BeBack catheter presents a notable advancement for clinicians managing chronic limb-threatening ischemia (CLTI) and arterial chronic total occlusion (CTO), showing procedural and technical success rates of over 90% in this study. Its ability to navigate and recanalize occluded segments provides a robust alternative, especially when traditional techniques fail. This innovation may change clinical strategies in vascular interventions, offering an efficient and reliable option and thereby potentially enhancing patient outcomes in limb revascularizations.
ABSTRACT
PURPOSE: The growing application of deep learning in radiology has raised concerns about cybersecurity, particularly in relation to adversarial attacks. This study aims to systematically review the literature on adversarial attacks in radiology. METHODS: We searched for studies on adversarial attacks in radiology published up to April 2023, using MEDLINE and Google Scholar databases. RESULTS: A total of 22 studies published between March 2018 and April 2023 were included, primarily focused on image classification algorithms. Fourteen studies evaluated white-box attacks, three assessed black-box attacks and five investigated both. Eleven of the 22 studies targeted chest X-ray classification algorithms, while others involved chest CT (6/22), brain MRI (4/22), mammography (2/22), abdominal CT (1/22), hepatic US (1/22), and thyroid US (1/22). Some attacks proved highly effective, reducing the AUC of algorithm performance to 0 and achieving success rates up to 100 %. CONCLUSIONS: Adversarial attacks are a growing concern. Although currently the threats are more theoretical than practical, they still represent a potential risk. It is important to be alert to such attacks, reinforce cybersecurity measures, and influence the formulation of ethical and legal guidelines. This will ensure the safe use of deep learning technology in medicine.
Subject(s)
Radiology , Humans , Radiography , Mammography , Tomography, X-Ray Computed , Algorithms
ABSTRACT
This study explores the potential of OpenAI's ChatGPT as a decision support tool for acute ulcerative colitis presentations in the setting of an emergency department. We assessed ChatGPT's performance in determining disease severity using the Truelove and Witts criteria and the necessity of hospitalization for patients with ulcerative colitis, comparing results with those of expert gastroenterologists. Of 20 cases, ChatGPT's assessments were found to be 80% consistent with gastroenterologist evaluations and indicated a high degree of reliability. This suggests that ChatGPT could serve as a clinical decision support tool in assessing acute ulcerative colitis, as an adjunct to clinical judgment.
Subject(s)
Colitis, Ulcerative , Humans , Colitis, Ulcerative/diagnosis , Reproducibility of Results , Clinical Decision-Making , Emergency Service, Hospital , Artificial Intelligence
ABSTRACT
PURPOSE: To evaluate single tibial access in the treatment of chronic total occlusions (CTO) in patients with ipsilateral chronic limb-threatening ischemia (CLTI). MATERIALS AND METHODS: In this retrospective study, data were collected on patients treated for ipsilateral CTO via tibial artery access between March 2017 and March 2021. Fifty-nine limbs in 57 patients (42 men, average age 73 years; range 47-96) were treated. Patients' symptoms were classified in accordance with the Rutherford category. The end points were freedom from major amputation and the need for reintervention up to 1 year of follow-up. RESULTS: Out of the 59 treated limbs, technical success was achieved in 57 (97%). The treated multilevel segments involved 5 common and 12 external iliac arteries, 23 common and 37 superficial femoral arteries, 23 femoropopliteal segments, 14 popliteal arteries, and 4 bypasses. Mean length of occlusion was 186 mm (range 7-670). Rutherford classification of the treated limbs was category 5 and 6 in 45 patients and category 4 in 14 patients. Three procedural complications occurred and were successfully treated during the same procedure. No immediate post-procedural complication was encountered. Median follow-up was 13 months (range 1-45.3). Reintervention was required in 9 limbs, after an average of 6 months. The one-year amputation-free rate was 91.2%. CONCLUSIONS: Single access via the ipsilateral tibial artery can be a useful, effective, and safe approach for treating CTO in CLTI patients.
ABSTRACT
PURPOSE: The quality of radiology referrals influences patient management and imaging interpretation by radiologists. The aim of this study was to evaluate ChatGPT-4 as a decision support tool for selecting imaging examinations and generating radiology referrals in the emergency department (ED). METHODS: Five consecutive clinical notes from the ED were retrospectively extracted, for each of the following pathologies: pulmonary embolism, obstructing kidney stones, acute appendicitis, diverticulitis, small bowel obstruction, acute cholecystitis, acute hip fracture, and testicular torsion. A total of 40 cases were included. These notes were entered into ChatGPT-4, requesting recommendations on the most appropriate imaging examinations and protocols. The chatbot was also asked to generate radiology referrals. Two independent radiologists graded the referral on a scale ranging from 1 to 5 for clarity, clinical relevance, and differential diagnosis. The chatbot's imaging recommendations were compared with the ACR Appropriateness Criteria (AC) and with the examinations performed in the ED. Agreement between readers was assessed using linear weighted Cohen's κ coefficient. RESULTS: ChatGPT-4's imaging recommendations aligned with the ACR AC and ED examinations in all cases. Protocol discrepancies between ChatGPT and the ACR AC were observed in two cases (5%). ChatGPT-4-generated referrals received mean scores of 4.6 and 4.8 for clarity, 4.5 and 4.4 for clinical relevance, and 4.9 from both reviewers for differential diagnosis. Agreement between readers was moderate for clinical relevance and clarity and substantial for differential diagnosis grading. CONCLUSIONS: ChatGPT-4 has shown potential in aiding imaging study selection for select clinical cases. As a complementary tool, large language models may improve radiology referral quality. Radiologists should stay informed about this technology and be mindful of potential challenges and risks.
Subject(s)
Hip Fractures , Radiology , Humans , Retrospective Studies , Radiography , Emergency Service, Hospital
ABSTRACT
PURPOSE: Abnormal fetal brain measurements might affect clinical management and parental counseling. The effect of between-field-strength differences has not previously been evaluated in quantitative fetal brain imaging. Our study aimed to compare fetal brain biometry measurements in 3.0 T with 1.5 T scanners. METHODS: A cohort of 1150 low-risk fetuses with apparently normal brain anatomy, scanned between 2012 and 2021, was retrospectively evaluated for biometric measurements. The cohort included 1.5 T (442 fetuses) and 3.0 T scans (708 fetuses) of populations with comparable characteristics in the same tertiary medical center. Manually measured biometry included bi-parietal, fronto-occipital and trans-cerebellar diameters, length of the corpus callosum, and vermis height and width. Measurements were then converted to centiles based on previously reported biometric reference charts. The 1.5 T centiles were compared with the 3.0 T centiles. RESULTS: No significant differences between centiles of bi-parietal diameter, trans-cerebellar diameter, or length of the corpus callosum between 1.5 T and 3.0 T scanners were found. Small absolute differences were found in the vermis height, with higher centiles in the 3.0 T, compared to the 1.5 T scanner (54.6th-centile, vs. 39.0th-centile, p < 0.001); less significant differences were found in vermis width centiles (46.9th-centile vs. 37.5th-centile, p = 0.03). Fronto-occipital diameter centiles were higher with the 1.5 T than with the 3.0 T scanner (66.0th-centile vs. 61.8th-centile, p = 0.02). CONCLUSIONS: The increasing use of 3.0 T MRI for fetal imaging poses a potential bias when using 1.5 T-based charts. We show that manual biometric measurements are comparable, with relatively small between-field-strength differences. Small inter-magnet differences can be related to higher spatial resolution with 3 T scanners and may be substantial when evaluating small brain structures, such as the vermis.
Subject(s)
Magnetic Resonance Imaging , Magnets , Female , Humans , Retrospective Studies , Cohort Studies , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Brain/anatomy & histology , Biometry/methods
ABSTRACT
Large language models such as ChatGPT have gained public and scientific attention. These models may support oncologists in their work. Oncologists should be familiar with large language models to harness their potential while being aware of potential dangers and limitations.