ABSTRACT
RATIONALE AND OBJECTIVES: In the United States, cirrhosis was the 12th leading cause of death in 2016. Although end-stage cirrhosis is irreversible, earlier stages of hepatic fibrosis can be reversed with early diagnosis and intervention. The objective was to investigate the utility of a fully automated technique to measure liver surface nodularity (LSN) for staging hepatic fibrosis (stages F0-F4). MATERIALS AND METHODS: In this retrospective study, a dataset consisting of patients with multiple etiologies of liver disease collected at Institution-A (METAVIR F0-F4, 2000-2016) was used. The LSN was automatically measured in contrast-enhanced CT volumes and compared against scores from a manual tool. The area under the receiver operating characteristic curve (AUC) was used to distinguish between clinically significant fibrosis (≥F2), advanced fibrosis (≥F3), and end-stage cirrhosis (F4). RESULTS: The study sample had 480 patients (304 men, 176 women; mean age, 49 ± 9 years). Automatically derived LSN scores progressively increased with the fibrosis stage: F0 (1.64 [mean] ± 1.13 [standard deviation]), F1 (2.16 ± 2.39), F2 (2.17 ± 2.55), F3 (2.23 ± 2.52), and F4 (4.21 ± 2.94). For discriminating significant fibrosis (≥F2), advanced fibrosis (≥F3), and cirrhosis (F4), the automated tool achieved ROC AUCs of 73.9%, 82.5%, and 87.8%, respectively. The sensitivity and specificity were 85.2% and 73.3% for significant fibrosis (nodularity threshold, 1.51), 84.2% and 79.5% for advanced fibrosis (nodularity threshold, 1.73), and 86.5% and 79.5% for cirrhosis (nodularity threshold, 2.18). Statistical tests revealed that the automated LSN scores distinguished patients with advanced fibrosis (p < .001) and cirrhosis (p < .001). CONCLUSION: The fully automated LSN measurement retained its predictive power for distinguishing between advanced fibrosis and cirrhosis. The clinical impact is that the fully automated LSN measurement may be useful for early interventions and population-based studies. It can automatically predict the fibrosis stage in ~45 s, compared with the ~2 min needed to manually measure the LSN in a CT volume.
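The staging performance reported above can be reproduced conceptually with standard tooling. The following is a minimal sketch (not the authors' code) of computing an ROC AUC and an operating threshold with its sensitivity and specificity using scikit-learn; the LSN values and fibrosis stages shown are hypothetical placeholders.

```python
# A minimal sketch of staging-style ROC analysis, assuming `lsn_scores` and
# `fibrosis_stages` (METAVIR 0-4) are NumPy arrays for a cohort. Values below
# are hypothetical placeholders, not data from the study.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

lsn_scores = np.array([1.2, 1.8, 2.4, 3.9, 4.5, 1.5, 2.2])   # hypothetical LSN values
fibrosis_stages = np.array([0, 1, 2, 4, 4, 1, 3])             # hypothetical METAVIR stages

for name, cutoff in [("significant fibrosis (>=F2)", 2),
                     ("advanced fibrosis (>=F3)", 3),
                     ("cirrhosis (F4)", 4)]:
    labels = (fibrosis_stages >= cutoff).astype(int)
    auc = roc_auc_score(labels, lsn_scores)
    fpr, tpr, thresholds = roc_curve(labels, lsn_scores)
    best = np.argmax(tpr - fpr)                    # operating point maximizing Youden's J
    print(f"{name}: AUC={auc:.3f}, threshold={thresholds[best]:.2f}, "
          f"sensitivity={tpr[best]:.3f}, specificity={1 - fpr[best]:.3f}")
```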
ABSTRACT
Precise deformable image registration of multi-parametric MRI sequences is necessary for radiologists to identify abnormalities and diagnose diseases such as prostate cancer and lymphoma. Despite recent advances in unsupervised learning-based registration, volumetric medical image registration that must account for a variety of data distributions remains challenging. To address the problem of registering multi-parametric MRI sequence data, we propose an unsupervised domain-transported registration method, called OTMorph, which employs neural optimal transport to learn an optimal transport plan that maps between different data distributions. We have designed a novel framework composed of a transport module and a registration module: the former transports the data distribution from the moving source domain to the fixed target domain, and the latter takes the transported data and provides the deformed moving volume aligned with the fixed volume. Through end-to-end learning, our proposed method can effectively learn deformable registration for volumes from different distributions. Experimental results with abdominal multi-parametric MRI sequence data show that our method improves deformation performance by around 67-85% compared with existing learning-based methods. Our method is generic in nature and can be used to register inter-/intra-modality images by mapping the different data distributions during network training.
ABSTRACT
Volumetric assessment of edema due to anasarca can help monitor the progression of diseases such as kidney, liver, or heart failure. The ability to measure edema non-invasively by automatic segmentation from abdominal CT scans may be of clinical importance. The current state-of-the-art method for edema segmentation using intensity priors is susceptible to false positives or under-segmentation errors. The application of modern supervised deep learning methods for 3D edema segmentation is limited due to the challenges of manually annotating edema. In the absence of accurate 3D annotations of edema, we propose a weakly supervised learning method that uses edema segmentations produced by intensity priors as pseudo-labels, along with pseudo-labels of muscle, subcutaneous, and visceral adipose tissues for context, to produce more refined segmentations with demonstrably lower segmentation errors. The proposed method employs nnU-Nets in multiple stages to produce the final edema segmentation. The results demonstrate the potential of weakly supervised learning using edema and tissue pseudo-labels for improved quantification of edema in clinical applications.
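As a concrete illustration of the intensity-prior pseudo-labels that seed this kind of weakly supervised pipeline, the sketch below thresholds CT Hounsfield units within an assumed fluid-attenuation window. The HU range, file path, and crude body-mask heuristic are illustrative assumptions, not the method's actual parameters.

```python
# A minimal sketch of an intensity-prior edema pseudo-label, assuming a
# SimpleITK-readable CT, an illustrative fluid window of roughly -10 to 30 HU,
# and a crude body mask; all of these are assumptions for illustration.
import SimpleITK as sitk
import numpy as np

ct = sitk.ReadImage("abdomen_ct.nii.gz")             # hypothetical input path
hu = sitk.GetArrayFromImage(ct)                      # (z, y, x) array of HU values
body_mask = hu > -500                                # crude body mask (excludes air)

fluid_like = (hu >= -10) & (hu <= 30) & body_mask    # assumed fluid-attenuation window
pseudo_label = sitk.GetImageFromArray(fluid_like.astype(np.uint8))
pseudo_label.CopyInformation(ct)                     # keep spacing, origin, direction
sitk.WriteImage(pseudo_label, "edema_pseudo_label.nii.gz")
```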
ABSTRACT
Background GPT-4V (GPT-4 with vision, ChatGPT; OpenAI) has shown impressive performance in several medical assessments. However, few studies have assessed its performance in interpreting radiologic images. Purpose To assess and compare the accuracy of GPT-4V in assessing radiologic cases with both images and textual context to that of radiologists and residents, to assess if GPT-4V assistance improves human accuracy, and to assess and compare the accuracy of GPT-4V with that of image-only or text-only inputs. Materials and Methods Seventy-two Case of the Day questions at the RSNA 2023 Annual Meeting were curated in this observer study. Answers from GPT-4V were obtained between November 26 and December 10, 2023, with the following inputs for each question: image only, text only, and both text and images. Five radiologists and three residents also answered the questions in an "open book" setting. For the artificial intelligence (AI)-assisted portion, the radiologists and residents were provided with the outputs of GPT-4V. The accuracy of radiologists and residents, both with and without AI assistance, was analyzed using a mixed-effects linear model. The accuracies of GPT-4V with different input combinations were compared by using the McNemar test. P < .05 was considered to indicate a significant difference. Results The accuracy of GPT-4V was 43% (31 of 72; 95% CI: 32, 55). Radiologists and residents did not significantly outperform GPT-4V in either imaging-dependent (59% and 56% vs 39%; P = .31 and .52, respectively) or imaging-independent (76% and 63% vs 70%; both P = .99) cases. With access to GPT-4V responses, there was no evidence of improvement in the average accuracy of the readers. The accuracy obtained by GPT-4V with text-only and image-only inputs was 50% (35 of 70; 95% CI: 39, 61) and 38% (26 of 69; 95% CI: 27, 49), respectively. Conclusion The radiologists and residents did not significantly outperform GPT-4V. Assistance from GPT-4V did not help human raters. GPT-4V relied on the textual context for its outputs. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Katz in this issue.
Subject(s)
Radiology, Humans, Clinical Competence, Artificial Intelligence, Medical Societies, Internship and Residency
ABSTRACT
Purpose: To evaluate the performance of an automated deep learning method in detecting ascites and subsequently quantifying its volume in patients with liver cirrhosis and ovarian cancer. Materials and Methods: This retrospective study included contrast-enhanced and non-contrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer from two institutions, National Institutes of Health (NIH) and University of Wisconsin (UofW). The model, trained on The Cancer Genome Atlas Ovarian Cancer dataset (mean age, 60 years ± 11 [s.d.]; 143 female), was tested on two internal (NIH-LC and NIH-OV) and one external dataset (UofW-LC). Its performance was measured by the Dice coefficient, standard deviations, and 95% confidence intervals, focusing on ascites volume in the peritoneal cavity. Results: On NIH-LC (25 patients; mean age, 59 years ± 14 [s.d.]; 14 male) and NIH-OV (166 patients; mean age, 65 years ± 9 [s.d.]; all female), the model achieved Dice scores of 0.855±0.061 (CI: 0.831-0.878) and 0.826±0.153 (CI: 0.764-0.887), with median volume estimation errors of 19.6% (IQR: 13.2%-29.0%) and 5.3% (IQR: 2.4%-9.7%), respectively. On UofW-LC (124 patients; mean age, 46 years ± 12 [s.d.]; 73 female), the model had a Dice score of 0.830±0.107 (CI: 0.798-0.863) and a median volume estimation error of 9.7% (IQR: 4.5%-15.1%). The model showed strong agreement with expert assessments, with r² values of 0.79, 0.98, and 0.97 across the test sets. Conclusion: The proposed deep learning method performed well in segmenting and quantifying the volume of ascites in concordance with expert radiologist assessments.
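The two evaluation measures reported here can be computed directly from binary masks. Below is a minimal sketch (not the study's code) of the Dice coefficient and the percent volume-estimation error, assuming `pred` and `ref` are 3D boolean NumPy arrays and `voxel_ml` is the per-voxel volume in milliliters.

```python
# A minimal sketch of the reported metrics, assuming binary predicted and
# reference masks of identical shape and a known voxel volume in mL.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, ref).sum()
    return 2.0 * inter / (pred.sum() + ref.sum())

def volume_error_pct(pred: np.ndarray, ref: np.ndarray, voxel_ml: float) -> float:
    """Absolute percent error of the predicted ascites volume."""
    pred_vol = pred.sum() * voxel_ml
    ref_vol = ref.sum() * voxel_ml
    return 100.0 * abs(pred_vol - ref_vol) / ref_vol
```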
ABSTRACT
Diabetes mellitus and metabolic syndrome are closely linked with visceral body composition, but clinical assessment is limited to external measurements and laboratory values, including hemoglobin A1c (HbA1c). Modern deep learning and AI algorithms allow automated extraction of biomarkers for organ size, density, and body composition from routine computed tomography (CT) exams. Comparing visceral CT biomarkers across groups with differing glycemic control revealed significant, progressive CT biomarker changes with increasing HbA1c. For example, in the unenhanced female cohort, mean changes between the normal and poorly controlled diabetes groups included a 53% increase in visceral adipose tissue area, a 22% increase in kidney volume, a 24% increase in liver volume, a 6% decrease in liver density (hepatic steatosis), a 16% increase in skeletal muscle area, and a 21% decrease in skeletal muscle density (myosteatosis) (all p < 0.001). The multisystem changes of metabolic syndrome can be objectively and retrospectively measured using automated CT biomarkers, with implications for diabetes, metabolic syndrome, and GLP-1 agonists.
Subject(s)
Biomarkers, Body Composition, Glycated Hemoglobin, Metabolic Syndrome, Tomography, X-Ray Computed, Humans, Metabolic Syndrome/metabolism, Metabolic Syndrome/diagnostic imaging, Female, Glycated Hemoglobin/metabolism, Glycated Hemoglobin/analysis, Tomography, X-Ray Computed/methods, Male, Biomarkers/blood, Middle Aged, Aged, Diabetes Mellitus/metabolism, Diabetes Mellitus/diagnostic imaging, Adult, Retrospective Studies, Intra-Abdominal Fat/diagnostic imaging, Intra-Abdominal Fat/metabolism
ABSTRACT
Background: The long-acting glucagon-like peptide-1 receptor agonist semaglutide is used to treat type 2 diabetes or obesity in adults. Clinical trials have observed associations of semaglutide with weight loss, improved diabetic control, and cardiovascular risk reduction. Objective: To evaluate intrapatient changes in body composition after initiation of semaglutide therapy by applying an automated suite of CT-based artificial intelligence (AI) body composition tools. Methods: This retrospective study included adult patients treated with semaglutide who underwent abdominopelvic CT both within 5 years before and within 5 years after semaglutide initiation, between January 2016 and November 2023. An automated suite of previously validated CT-based AI body composition tools was applied to pre-semaglutide and post-semaglutide scans to quantify visceral adipose tissue (VAT) and subcutaneous adipose tissue (SAT) area, skeletal muscle area and attenuation, intermuscular adipose tissue (IMAT) area, liver volume and attenuation, and trabecular bone mineral density (BMD). Patients with ≥5-kg weight loss and ≥5-kg weight gain between scans were compared. Results: The study included 241 patients (mean age, 60.4 ± 12.4 years; 151 women, 90 men). In the weight-loss group (n=67), the post-semaglutide scan, versus the pre-semaglutide scan, showed decreases in VAT area (341.1 vs 309.4 cm², p<.001), SAT area (371.4 vs 410.7 cm², p<.001), muscle area (179.2 vs 193.0 cm², p<.001), and liver volume (2379.0 vs 2578.0 cm³, p=.009), and an increase in liver attenuation (74.5 vs 67.6 HU, p=.03). In the weight-gain group (n=48), the post-semaglutide scan, versus the pre-semaglutide scan, showed increases in VAT area (334.0 vs 312.8 cm², p=.002), SAT area (485.8 vs 488.8 cm², p=.01), and IMAT area (48.4 vs 37.6 cm², p=.009), and a decrease in muscle attenuation (5.9 vs 13.1 HU, p<.001). Other comparisons were not significant (p>.05). Conclusion: Patients using semaglutide who lost versus gained weight demonstrated distinct patterns of changes in CT-based body composition measures. Those with weight loss exhibited overall favorable shifts in measures related to cardiometabolic risk. The decrease in muscle attenuation in those with weight gain is consistent with decreased muscle quality. Clinical Impact: Automated CT-based AI tools provide biomarkers of body composition changes in patients using semaglutide beyond those evident from standard clinical measures.
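The pre- versus post-treatment comparisons described here follow a standard paired design. The following is a minimal sketch (not the study's analysis code) of a paired comparison for one biomarker; the SAT areas shown are hypothetical values for the same patients before and after treatment.

```python
# A minimal sketch of a paired pre/post comparison for one CT biomarker,
# assuming `pre` and `post` hold per-patient values in the same patient order.
# The numbers below are hypothetical, not data from the study.
import numpy as np
from scipy import stats

pre = np.array([410.7, 395.2, 388.0, 420.3])     # hypothetical SAT areas (cm^2) before
post = np.array([371.4, 360.1, 352.8, 398.6])    # hypothetical SAT areas (cm^2) after

t_stat, p_value = stats.ttest_rel(post, pre)      # paired t-test
w_stat, p_wilcoxon = stats.wilcoxon(post, pre)    # non-parametric alternative
print(f"mean change = {np.mean(post - pre):.1f} cm^2, p = {p_value:.3f}")
```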
ABSTRACT
PURPOSE: Anasarca is a condition that results from organ dysfunction, such as heart, kidney, or liver failure, and is characterized by the presence of edema throughout the body. The quantification of accumulated edema may have potential clinical benefits. This work focuses on accurately estimating the amount of edema non-invasively using abdominal CT scans, with minimal false positives. However, edema segmentation is challenging due to the complex appearance of edema and the lack of manually annotated volumes. METHODS: We propose a weakly supervised approach for edema segmentation using initial edema labels from the current state-of-the-art method for edema segmentation (Intensity Prior), along with labels of surrounding tissues as anatomical priors. A multi-class 3D nnU-Net was employed as the segmentation network, and training was performed using an iterative annotation workflow. RESULTS: We evaluated segmentation accuracy on a test set of 25 patients with edema. The average Dice Similarity Coefficient of the proposed method was similar to that of Intensity Prior (61.5% vs. 61.7%; p = 0.83). However, the proposed method reduced the average False Positive Rate significantly, from 1.8% to 1.1% (p < 0.001). Edema volumes computed using automated segmentation had a strong correlation with manual annotation (R² = 0.87). CONCLUSION: Weakly supervised learning using 3D multi-class labels and iterative annotation is an efficient way to perform high-quality edema segmentation with minimal false positives. Automated edema segmentation can produce edema volume estimates that are highly correlated with manual annotation. The proposed approach is promising for clinical applications to monitor anasarca using estimated edema volumes.
ABSTRACT
Multi-parametric magnetic resonance imaging (mpMRI) exams have various series types acquired with different imaging protocols. The DICOM headers of these series often contain incorrect information due to the sheer diversity of protocols and occasional technologist errors. To address this, we present a deep learning-based classification model that classifies 8 different body mpMRI series types so that radiologists can read the exams efficiently. Using mpMRI data from various institutions, multiple deep learning-based classifiers based on ResNet, EfficientNet, and DenseNet are trained to classify the 8 different MRI series, and their performance is compared. The best-performing classifier is then identified, and its classification capability under different training data quantities is studied. The model is also evaluated on out-of-training-distribution datasets. Moreover, the model is trained using mpMRI exams obtained from different scanners under two training strategies, and its performance is tested. Experimental results show that the DenseNet-121 model achieves the highest F1-score and accuracy, 0.966 and 0.972, over the other classification models (p-value < 0.05). The model shows greater than 0.95 accuracy when trained with over 729 studies of the training data, and its performance improves as the training data quantity grows. On the external DLDS and CPTAC-UCEC datasets, the model yields accuracies of 0.872 and 0.810, respectively. These results indicate that in both the internal and external datasets, the DenseNet-121 model attains high accuracy for the task of classifying 8 body MRI series types.
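For readers unfamiliar with the setup, the following is a minimal sketch (not the authors' training code) of adapting a torchvision DenseNet-121 to classify 8 MRI series types from single-channel inputs; the input size, preprocessing, and hyperparameters are assumptions.

```python
# A minimal sketch of a DenseNet-121 series-type classifier; the single-channel
# first convolution, input size, and optimizer settings are assumptions.
import torch
import torch.nn as nn
from torchvision import models

model = models.densenet121(weights=None)
model.features.conv0 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.classifier = nn.Linear(model.classifier.in_features, 8)   # 8 series types

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 1, 224, 224)          # hypothetical batch of MRI slices
y = torch.randint(0, 8, (4,))            # hypothetical series-type labels
loss = criterion(model(x), y)            # one illustrative training step
loss.backward()
optimizer.step()
```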
ABSTRACT
Chest radiography, commonly known as CXR, is frequently utilized in clinical settings to detect cardiopulmonary conditions. However, even seasoned radiologists might offer different evaluations regarding the seriousness and uncertainty associated with observed abnormalities. Previous research has attempted to utilize clinical notes to extract abnormality labels for training deep-learning models in CXR image diagnosis. However, these methods often neglected the varying degrees of severity and uncertainty linked to different labels. In our study, we first assembled a comprehensive new dataset of CXR images based on clinical textual data, which incorporated radiologists' assessments of uncertainty and severity. Using this dataset, we introduced a multi-relationship graph learning framework that leverages spatial and semantic relationships while addressing expert uncertainty through a dedicated loss function. Our research showcases a notable enhancement in CXR image diagnosis and in the interpretability of the diagnostic model, surpassing existing state-of-the-art methodologies. The dataset of disease severity and uncertainty labels that we extracted is available at: https://physionet.org/content/cad-chest/1.0/.
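One plausible form of the dedicated loss function mentioned above is a per-label binary cross-entropy weighted by annotation certainty, so that uncertain labels contribute less to training. The sketch below illustrates this idea only; the exact loss in the paper may differ, and the weighting scheme and tensor shapes are assumptions.

```python
# A minimal sketch of an uncertainty-weighted multi-label loss; the weighting
# scheme, label count, and shapes are assumptions for illustration.
import torch
import torch.nn.functional as F

def uncertainty_weighted_bce(logits, targets, certainty):
    """logits/targets: (batch, num_labels); certainty in [0, 1], 1 = confident."""
    per_label = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (certainty * per_label).mean()

logits = torch.randn(2, 14)                       # hypothetical model outputs
targets = torch.randint(0, 2, (2, 14)).float()    # hypothetical abnormality labels
certainty = torch.rand(2, 14)                     # hypothetical per-label certainty
loss = uncertainty_weighted_bce(logits, targets, certainty)
```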
ABSTRACT
Deformable image registration is one of the essential processes in analyzing medical images. In particular, when diagnosing abdominal diseases such as hepatic cancer and lymphoma, multi-domain images scanned with different modalities or different imaging protocols are often used. However, they are often misaligned due to differences in scanning times, patient breathing, movement, etc. Although recent learning-based approaches can provide deformations in real time with high performance, multi-domain abdominal image registration using deep learning is still challenging since the images in different domains have different characteristics such as image contrast and intensity ranges. To address this, this paper proposes a novel unsupervised multi-domain image registration framework using neural optimal transport, dubbed OTMorph. When moving and fixed volumes are given as input, a transport module of our proposed model learns the optimal transport plan to map data distributions from the moving to the fixed volumes and estimates a domain-transported volume. Subsequently, a registration module taking the transported volume can effectively estimate the deformation field, leading to improved deformation performance. Experimental results on multi-domain image registration using multi-modality and multi-parametric abdominal medical images demonstrate that the proposed method provides superior deformable registration via the domain-transported image, which alleviates the domain gap between the input images. We also attain improvements on out-of-distribution data, which indicates the superior generalizability of our model for registering various medical images. Our source code is available at https://github.com/boahK/OTMorph.
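To make the two-module design concrete, the following is a simplified schematic (see the authors' repository above for the actual implementation): a transport network maps the moving volume's intensities toward the fixed domain, and a registration network predicts a displacement field that is applied with a spatial warp. The placeholder single-layer networks and volume sizes are assumptions for illustration only.

```python
# A simplified schematic of a transport-then-register pipeline; the one-layer
# "networks" are placeholders, not the OTMorph architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialWarp(nn.Module):
    """Warp a volume with a dense displacement field given in voxels."""
    def forward(self, vol, flow):
        _, _, D, H, W = vol.shape
        zz, yy, xx = torch.meshgrid(
            torch.arange(D), torch.arange(H), torch.arange(W), indexing="ij")
        grid = torch.stack((xx, yy, zz), dim=-1).float().to(vol.device)  # (D, H, W, 3), x-y-z order
        new = grid + flow.permute(0, 2, 3, 4, 1)                          # add displacement
        for i, size in enumerate((W, H, D)):                              # normalize to [-1, 1]
            new[..., i] = 2.0 * new[..., i] / (size - 1) - 1.0
        return F.grid_sample(vol, new, align_corners=True)

transport_net = nn.Conv3d(2, 1, kernel_size=3, padding=1)     # placeholder for the transport module
registration_net = nn.Conv3d(2, 3, kernel_size=3, padding=1)  # placeholder flow estimator
warp = SpatialWarp()

moving = torch.randn(1, 1, 32, 32, 32)
fixed = torch.randn(1, 1, 32, 32, 32)
transported = transport_net(torch.cat([moving, fixed], dim=1))    # domain-transported volume
flow = registration_net(torch.cat([transported, fixed], dim=1))   # displacement field (3, D, H, W)
warped = warp(transported, flow)                                  # volume aligned with the fixed one
```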
ABSTRACT
OBJECTIVES: To evaluate the utility of CT-based abdominal fat measures for predicting the risk of death and cardiometabolic disease in an asymptomatic adult screening population. METHODS: Fully automated AI tools quantifying abdominal adipose tissue (L3-level visceral [VAT] and subcutaneous [SAT] fat area, visceral-to-subcutaneous fat ratio [VSR], VAT attenuation), muscle attenuation (L3 level), and liver attenuation were applied to non-contrast CT scans in asymptomatic adults undergoing CT colonography (CTC). Longitudinal follow-up documented subsequent deaths, cardiovascular events, and diabetes. ROC and time-to-event analyses were performed to generate AUCs and hazard ratios (HR) binned by octile. RESULTS: A total of 9223 adults (mean age, 57 years; 4071:5152 M:F) underwent screening CTC from April 2004 to December 2016. 549 patients died during follow-up (median follow-up, nine years). Fat measures outperformed BMI for predicting mortality risk: 5-year AUCs for muscle attenuation, VSR, and BMI were 0.721, 0.661, and 0.499, respectively. Higher visceral, muscle, and liver fat were associated with increased mortality risk: VSR > 1.53, HR = 3.1; muscle attenuation < 15 HU, HR = 5.4; liver attenuation < 45 HU, HR = 2.3. Higher VAT area and VSR were associated with increased cardiovascular event and diabetes risk: VSR > 1.59, HR = 2.6 for cardiovascular events; VAT area > 291 cm², HR = 6.3 for diabetes (p < 0.001). A U-shaped association was observed for SAT, with a higher risk of death at very low and very high SAT. CONCLUSION: Fully automated CT-based measures of abdominal fat are predictive of mortality and cardiometabolic disease risk in asymptomatic adults and uncover trends that are not reflected in anthropometric measures. CLINICAL RELEVANCE STATEMENT: Fully automated CT-based measures of abdominal fat soundly outperform anthropometric measures for mortality and cardiometabolic risk prediction in asymptomatic patients. KEY POINTS: Abdominal fat depots associated with metabolic dysregulation and cardiovascular disease can be derived from abdominal CT. Fully automated AI body composition tools can measure factors associated with increased mortality and cardiometabolic risk. CT-based abdominal fat measures uncover trends in mortality and cardiometabolic risk not captured by BMI in asymptomatic outpatients.
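The hazard ratios quoted above come from time-to-event modeling. Below is a minimal sketch (not the study's code) of fitting a Cox proportional hazards model for a binarized fat measure with the lifelines package; the tiny dataframe is entirely hypothetical.

```python
# A minimal sketch of a Cox proportional hazards fit for one binarized CT fat
# measure; the dataframe values and the VSR cutoff indicator are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "followup_years": [9.1, 4.3, 7.8, 10.2, 2.5],
    "died":           [1,   1,   0,   0,    1],
    "high_vsr":       [0,   1,   0,   1,    1],   # e.g., VSR above a chosen cutoff
})

cph = CoxPHFitter()
cph.fit(df, duration_col="followup_years", event_col="died")
print(cph.hazard_ratios_)   # hazard ratio associated with the high_vsr indicator
```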
ABSTRACT
The skeleton is a common site of metastatic spread of breast and prostate cancer. CT is routinely used to measure the size of lesions in the bones. However, these lesions can be difficult to spot due to the wide variations in their sizes, shapes, and appearances. Precise localization of such lesions would enable reliable tracking of interval changes (growth, shrinkage, or unchanged status). To that end, an automated technique to detect bone lesions is highly desirable. In this pilot work, we developed a pipeline to detect bone lesions (lytic, blastic, and mixed) in CT volumes via a proxy segmentation task. First, we used the bone lesions that were prospectively marked by radiologists on a few 2D slices of CT volumes and converted them into weak 3D segmentation masks. Then, we trained a 3D full-resolution nnUNet model using these weak 3D annotations to segment the lesions and thereby detect them. Our automated method detected bone lesions in CT with a precision of 96.7% and recall of 47.3% despite the use of incomplete and partial training data. To the best of our knowledge, we are the first to attempt the direct detection of bone lesions in CT via a proxy segmentation task.
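The conversion of sparse 2D radiologist marks into weak 3D training masks can be done very simply. The sketch below replicates a 2D lesion mask across a few neighboring axial slices; the exact conversion used in this work may differ, and the slice extent is an assumption.

```python
# A minimal sketch of building a weak 3D mask from a single annotated axial
# slice; the +/- 2 slice extent is an illustrative assumption.
import numpy as np

def weak_3d_mask(slice_mask: np.ndarray, slice_index: int, num_slices: int,
                 extent: int = 2) -> np.ndarray:
    """Replicate a 2D annotation across +/- `extent` neighboring slices."""
    mask_3d = np.zeros((num_slices, *slice_mask.shape), dtype=bool)
    lo = max(0, slice_index - extent)
    hi = min(num_slices, slice_index + extent + 1)
    mask_3d[lo:hi] = slice_mask
    return mask_3d
```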
ABSTRACT
Medical Visual Question Answering (VQA) is an important task in medical multi-modal Large Language Models (LLMs), aiming to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. However, existing medical VQA datasets are small and contain only simple questions (equivalent to classification tasks), which lack semantic reasoning and clinical knowledge. Our previous work proposed a clinical knowledge-driven image-difference VQA benchmark using a rule-based approach (Hu et al., 2023). However, for the same breadth of information coverage, the rule-based approach showed an 85% error rate on extracted labels. We trained an LLM-based method to extract labels with 62% higher accuracy. We also comprehensively evaluated our labels with two clinical experts on 100 samples to help us fine-tune the LLM. Based on the trained LLM model, we proposed a large-scale medical VQA dataset, Medical-CXR-VQA, using LLMs focused on chest X-ray images. The questions involve detailed information, such as abnormalities, locations, levels, and types. Based on this dataset, we proposed a novel VQA method that constructs three different relationship graphs: spatial, semantic, and implicit relationship graphs over the image regions, questions, and semantic labels. We leveraged graph attention to learn the logical reasoning paths for different questions. These learned graph VQA reasoning paths can be further used for LLM prompt engineering and chain-of-thought prompting, which are crucial for further fine-tuning and training of multi-modal large language models. Moreover, we demonstrate that our approach has the qualities of evidence and faithfulness, which are crucial in the clinical field. The code and the dataset are available at https://github.com/Holipori/Medical-CXR-VQA.
Subject(s)
Machine Learning, Humans, Image Interpretation, Computer-Assisted/methods, Semantics
ABSTRACT
Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges, an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparably to human physicians regarding multi-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases where physicians answer incorrectly, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (35.5%), most prominently in image comprehension (27.2%). Despite GPT-4V's high accuracy in multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.
ABSTRACT
The Radiological Society of North America (RSNA) and the Medical Image Computing and Computer Assisted Intervention (MICCAI) Society have led a series of joint panels and seminars focused on the present impact and future directions of artificial intelligence (AI) in radiology. These conversations have collected viewpoints from multidisciplinary experts in radiology, medical imaging, and machine learning on the current clinical penetration of AI technology in radiology and how it is impacted by trust, reproducibility, explainability, and accountability. The collective points, both practical and philosophical, define the cultural changes for radiologists and AI scientists working together and describe the challenges ahead for AI technologies to meet broad approval. This article presents the perspectives of experts from MICCAI and RSNA on the clinical, cultural, computational, and regulatory considerations, coupled with recommended reading materials, essential to adopting AI technology successfully in radiology and, more generally, in clinical practice. The report emphasizes the importance of collaboration to improve clinical deployment, highlights the need to integrate clinical and medical imaging data, and introduces strategies to ensure smooth and incentivized integration. Keywords: Adults and Pediatrics, Computer Applications-General (Informatics), Diagnosis, Prognosis © RSNA, 2024.
Subject(s)
Artificial Intelligence, Radiology, Humans, Radiology/methods, Medical Societies
ABSTRACT
Pheochromocytomas and Paragangliomas (PPGLs) are rare adrenal and extra-adrenal tumors that have metastatic potential. Management of patients with PPGLs mainly depends on the makeup of their genetic cluster: SDHx, VHL/EPAS1, kinase, and sporadic. CT is the preferred modality for precise localization of PPGLs, such that their metastatic progression can be assessed. However, the variable size, morphology, and appearance of these tumors in different anatomical regions can pose challenges for radiologists. Since radiologists must routinely track changes across patient visits, manual annotation of PPGLs is quite time-consuming and cumbersome to do across all axial slices in a CT volume. As such, PPGLs are only weakly annotated on axial slices by radiologists in the form of RECIST measurements. To reduce the manual effort spent by radiologists, we propose a method for the automated detection of PPGLs in CT via a proxy segmentation task. Weak 3D annotations (derived from 2D bounding boxes) were used to train both 2D and 3D nnUNet models to detect PPGLs via segmentation. We evaluated our approaches on an in-house dataset comprised of chest-abdomen-pelvis CTs of 255 patients with confirmed PPGLs. On a test set of 53 CT volumes, our 3D nnUNet model achieved a detection precision of 70% and sensitivity of 64.1%, and outperformed the 2D model, which obtained a precision of 52.7% and sensitivity of 27.5% (p < 0.05). The SDHx and sporadic genetic clusters achieved the highest precisions of 73.1% and 72.7%, respectively. Our state-of-the-art findings highlight the promising nature of the challenging task of automated PPGL detection.
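When detection is performed via a proxy segmentation task, lesion-level precision and sensitivity are typically derived by treating each predicted connected component as one detection. The sketch below illustrates one such matching rule, where any voxel overlap with a reference lesion counts as a hit; the actual matching criterion used in this work may differ.

```python
# A minimal sketch of lesion-level detection metrics from segmentation masks;
# the overlap-based matching rule is an assumption for illustration.
import numpy as np
from scipy import ndimage

def detection_metrics(pred: np.ndarray, ref: np.ndarray):
    """Precision and sensitivity from binary predicted and reference masks."""
    pred_cc, n_pred = ndimage.label(pred)
    ref_cc, n_ref = ndimage.label(ref)
    tp = sum(1 for i in range(1, n_pred + 1) if ref[pred_cc == i].any())
    detected = sum(1 for j in range(1, n_ref + 1) if pred[ref_cc == j].any())
    precision = tp / n_pred if n_pred else 0.0
    sensitivity = detected / n_ref if n_ref else 0.0
    return precision, sensitivity
```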
Subject(s)
Adrenal Gland Neoplasms, Paraganglioma, Pheochromocytoma, Tomography, X-Ray Computed, Humans, Pheochromocytoma/diagnostic imaging, Paraganglioma/diagnostic imaging, Adrenal Gland Neoplasms/diagnostic imaging, Tomography, X-Ray Computed/methods, Radiographic Image Interpretation, Computer-Assisted/methods
ABSTRACT
In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as Conventional Natural Language Generation (NLG) and Clinical Efficacy (CE), often fall short in capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarity. To overcome these issues, our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs), like GPT-3.5 and GPT-4. Utilizing In-Context Instruction Learning (ICIL) and Chain of Thought (CoT) reasoning, our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human and AI-generated reports. This is further enhanced by a Regression model that aggregates sentence evaluation scores. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a 0.48 score, outperforming the METEOR metric by 0.19, while our "Regressed GPT-4" model shows even greater alignment with expert evaluations, exceeding the best existing metric by a 0.35 margin. Moreover, the robustness of our explanations has been validated through a thorough iterative strategy. We plan to publicly release annotations from radiology experts, setting a new standard for accuracy in future assessments. This underscores the potential of our approach in enhancing the quality assessment of AI-driven medical reports.
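The evaluation pipeline described above pairs LLM judgments with a regression step. The following is an illustrative sketch only: the prompt wording, model name, score parsing, and aggregation are assumptions rather than the paper's released pipeline, and the OpenAI client call assumes a valid API key in the environment.

```python
# An illustrative sketch of LLM-scored report evaluation followed by a simple
# regression over per-sentence scores; prompt, model name, and data are assumed.
import numpy as np
from openai import OpenAI
from sklearn.linear_model import LinearRegression

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def score_sentence(reference: str, candidate: str) -> float:
    prompt = (
        "You are a radiologist grading report quality. Think step by step, "
        "then output only a score from 0 (wrong) to 5 (clinically equivalent).\n"
        f"Reference: {reference}\nCandidate: {candidate}\nScore:"
    )
    resp = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}])
    # naive parse of the final token as the numeric score (illustrative only)
    return float(resp.choices[0].message.content.strip().split()[-1])

# Hypothetical aggregation: calibrate per-sentence LLM scores against expert
# report-level ratings with a linear regression.
sentence_scores = np.array([[4.0, 3.0, 5.0], [2.0, 1.0, 3.0]])   # reports x sentences
expert_scores = np.array([4.2, 1.9])                              # hypothetical expert ratings
regressor = LinearRegression().fit(sentence_scores, expert_scores)
```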
ABSTRACT
Multi-parametric MRI (mpMRI) studies are widely available in clinical practice for the diagnosis of various diseases. As the volume of mpMRI exams increases yearly, there are concomitant inaccuracies within the DICOM header fields of these exams. This precludes the use of the header information for arranging the different series as part of the radiologist's hanging protocol, and clinician oversight is needed for correction. In this pilot work, we propose an automated framework to classify the type of 8 different series in mpMRI studies. We used 1,363 studies acquired by three Siemens scanners to train a DenseNet-121 model with 5-fold cross-validation. Then, we evaluated the performance of the DenseNet-121 ensemble on a held-out test set of 313 mpMRI studies. Our method achieved an average precision of 96.6%, sensitivity of 96.6%, specificity of 99.6%, and F1 score of 96.6% for the MRI series classification task. To the best of our knowledge, we are the first to develop a method to classify the series type in mpMRI studies acquired at the level of the chest, abdomen, and pelvis. Our method has the capability for robust automation of hanging protocols in modern radiology practice.
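The 5-fold cross-validation and ensembling step mentioned above can be organized as follows. This is a minimal sketch (not the authors' pipeline): the placeholder classifier, study identifiers, and the omitted training loop are assumptions.

```python
# A minimal sketch of study-level 5-fold cross-validation with softmax
# averaging over fold models; the placeholder classifier stands in for the
# DenseNet-121 used in the paper, and training is omitted.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def build_model() -> nn.Module:
    # placeholder classifier (not DenseNet-121) so the sketch is self-contained
    return nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 8))

study_ids = np.arange(1363)                        # hypothetical study identifiers
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
fold_models = []
for train_idx, val_idx in kfold.split(study_ids):
    model = build_model()
    # ... train on studies indexed by train_idx, validate on val_idx ...
    fold_models.append(model)

def ensemble_predict(batch: torch.Tensor) -> torch.Tensor:
    """Average softmax probabilities over the five fold models."""
    with torch.no_grad():
        probs = [torch.softmax(m(batch), dim=1) for m in fold_models]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)
```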
ABSTRACT
Purpose To evaluate the performance of an automated deep learning method in detecting ascites and subsequently quantifying its volume in patients with liver cirrhosis and patients with ovarian cancer. Materials and Methods This retrospective study included contrast-enhanced and noncontrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer from two institutions, National Institutes of Health (NIH) and University of Wisconsin (UofW). The model, trained on The Cancer Genome Atlas Ovarian Cancer dataset (mean age [±SD], 60 years ± 11; 143 female), was tested on two internal datasets (NIH-LC and NIH-OV) and one external dataset (UofW-LC). Its performance was measured by the F1/Dice coefficient, SDs, and 95% CIs, focusing on ascites volume in the peritoneal cavity. Results On NIH-LC (25 patients; mean age, 59 years ± 14; 14 male) and NIH-OV (166 patients; mean age, 65 years ± 9; all female), the model achieved F1/Dice scores of 85.5% ± 6.1 (95% CI: 83.1, 87.8) and 82.6% ± 15.3 (95% CI: 76.4, 88.7), with median volume estimation errors of 19.6% (IQR, 13.2%-29.0%) and 5.3% (IQR, 2.4%-9.7%), respectively. On UofW-LC (124 patients; mean age, 46 years ± 12; 73 female), the model had an F1/Dice score of 83.0% ± 10.7 (95% CI: 79.8, 86.3) and a median volume estimation error of 9.7% (IQR, 4.5%-15.1%). The model showed strong agreement with expert assessments, with r² values of 0.79, 0.98, and 0.97 across the test sets. Conclusion The proposed deep learning method performed well in segmenting and quantifying the volume of ascites in patients with cirrhosis and those with ovarian cancer, in concordance with expert radiologist assessments. Keywords: Abdomen/GI, Cirrhosis, Deep Learning, Segmentation Supplemental material is available for this article. © RSNA, 2024 See also commentary by Aisen and Rodrigues in this issue.