1 - 20 of 120
1.
Eur Radiol Exp ; 8(1): 66, 2024 Jun 05.
Article En | MEDLINE | ID: mdl-38834751

BACKGROUND: Quantitative techniques such as T2 and T1ρ mapping allow evaluation of cartilage and meniscus. We evaluated multi-interleaved X-prepared turbo-spin echo with intuitive relaxometry (MIXTURE) sequences with turbo spin-echo (TSE) contrast and additional parameter maps versus reference TSE sequences in an in situ model of human cartilage defects. METHODS: Standardized cartilage defects of 8, 5, and 3 mm in diameter were created in the lateral femora of ten human cadaveric knee specimens (81 ± 10 years old; nine males, one female). MIXTURE sequences providing proton density-weighted fat-saturated images and T2 maps, or T1-weighted images and T1ρ maps, as well as the corresponding two- and three-dimensional TSE reference sequences, were acquired before and after defect creation (3-T scanner; knee coil). Defect delineability, bone texture, and cartilage relaxation times were quantified. Appropriate parametric or non-parametric tests were used. RESULTS: Overall, defect delineability and texture features did not differ significantly between the MIXTURE and reference sequences (p ≥ 0.47). After defect creation, relaxation times increased significantly in the central femur (T2pre = 51 ± 4 ms [mean ± standard deviation] versus T2post = 56 ± 4 ms; p = 0.002) and across all regions combined (T1ρpre = 40 ± 4 ms versus T1ρpost = 43 ± 4 ms; p = 0.004). CONCLUSIONS: MIXTURE permitted time-efficient, simultaneous morphologic and quantitative joint assessment based on clinical image contrasts. While additionally providing T2 or T1ρ maps in clinically feasible scan times, MIXTURE yielded morphologic image features, i.e., cartilage defects and bone texture, comparable to those of the reference sequences.
RELEVANCE STATEMENT: Equally time-efficient and versatile, the MIXTURE sequence platform combines morphologic imaging with familiar contrasts, excellent image correspondence with the corresponding reference sequences, and quantitative mapping information, thereby increasing the diagnostic value beyond mere morphology. KEY POINTS: • Combined morphologic and quantitative MIXTURE sequences are based on three-dimensional TSE contrasts. • MIXTURE sequences were studied in an in situ human cartilage defect model. • Morphologic image features, i.e., defect delineability and bone texture, were investigated. • Morphologic image features were similar between MIXTURE and reference sequences. • MIXTURE allowed time-efficient, simultaneous morphologic and quantitative knee joint assessment.
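The T2 values reported above come from voxel-wise relaxometry, i.e., fitting a mono-exponential decay to the signal across echo times. As a minimal illustration of that fit (not the MIXTURE reconstruction itself; echo times and signals are hypothetical), a log-linear least-squares T2 estimate can be sketched as:

```python
import math

def fit_t2(echo_times_ms, signals):
    """Mono-exponential T2 fit, S(TE) = S0 * exp(-TE / T2), via
    log-linear least squares (noise-free illustration)."""
    xs = echo_times_ms
    ys = [math.log(s) for s in signals]  # linearize: ln S = ln S0 - TE / T2
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -1.0 / slope, math.exp(my - slope * mx)  # (T2 in ms, S0)

# Hypothetical decay sampled at four echo times for a tissue with T2 = 51 ms
tes = [10.0, 30.0, 50.0, 70.0]
sig = [100.0 * math.exp(-te / 51.0) for te in tes]
t2, s0 = fit_t2(tes, sig)
```

In practice the fit is performed per voxel over the parameter-weighted images to produce the T2 (or, with spin-lock preparation, T1ρ) map.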


Cadaver , Cartilage, Articular , Knee Joint , Magnetic Resonance Imaging , Humans , Male , Magnetic Resonance Imaging/methods , Female , Cartilage, Articular/diagnostic imaging , Knee Joint/diagnostic imaging , Aged, 80 and over , Aged
2.
Diagnostics (Basel) ; 14(10)2024 May 08.
Article En | MEDLINE | ID: mdl-38786276

Quantitative MRI techniques such as T2 and T1ρ mapping are beneficial in evaluating knee joint pathologies; however, long acquisition times limit their clinical adoption. MIXTURE (Multi-Interleaved X-prepared Turbo Spin-Echo with IntUitive RElaxometry) provides a versatile turbo spin-echo (TSE) platform for simultaneous morphologic and quantitative joint imaging. Two MIXTURE sequences were designed along clinical requirements: "MIX1", combining proton density (PD)-weighted fat-saturated (FS) images and T2 mapping (acquisition time: 4:59 min), and "MIX2", combining T1-weighted images and T1ρ mapping (6:38 min). MIXTURE sequences and their reference 2D and 3D TSE counterparts were acquired from ten human cadaveric knee joints at 3.0 T. Contrast, contrast-to-noise ratios, and coefficients of variation were comparatively evaluated using parametric tests. Clinical radiologists (n = 3) assessed diagnostic quality as a function of sequence and anatomic structure using five-point Likert scales and ordinal regression, with a significance level of α = 0.01. MIX1 and MIX2 had at least equal diagnostic quality compared to reference sequences of the same image weighting. Contrast, contrast-to-noise ratios, and coefficients of variation were largely similar for the PD-weighted FS and T1-weighted images. In clinically feasible scan times, MIXTURE sequences yield morphologic, TSE-based images of diagnostic quality and quantitative parameter maps with additional insights on soft tissue composition and ultrastructure.
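The image-quality measures compared in this entry (contrast, contrast-to-noise ratio, coefficient of variation) follow standard definitions. A small sketch using those standard formulas, with hypothetical region-of-interest values rather than the study's measurements:

```python
from statistics import mean, stdev

def contrast(signal_a, signal_b):
    """Contrast between two tissues: |A - B| / (A + B)."""
    return abs(signal_a - signal_b) / (signal_a + signal_b)

def contrast_to_noise(signal_a, signal_b, noise_sd):
    """CNR: tissue signal difference in units of the noise standard deviation."""
    return abs(signal_a - signal_b) / noise_sd

def coefficient_of_variation(roi_values):
    """CoV: relative signal dispersion within a region of interest."""
    return stdev(roi_values) / mean(roi_values)

cartilage, fluid, noise_sd = 120.0, 200.0, 8.0  # hypothetical ROI means
cnr = contrast_to_noise(cartilage, fluid, noise_sd)
```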

3.
Eur Radiol ; 2024 Apr 16.
Article En | MEDLINE | ID: mdl-38627289

OBJECTIVES: Large language models (LLMs) have shown potential in radiology, but their ability to aid radiologists in interpreting imaging studies remains unexplored. We investigated the effects of a state-of-the-art LLM (GPT-4) on the radiologists' diagnostic workflow. MATERIALS AND METHODS: In this retrospective study, six radiologists of different experience levels read 40 selected radiographic [n = 10], CT [n = 10], MRI [n = 10], and angiographic [n = 10] studies unassisted (session one) and assisted by GPT-4 (session two). Each imaging study was presented with demographic data, the chief complaint, and associated symptoms, and diagnoses were registered using an online survey tool. The impact of Artificial Intelligence (AI) on diagnostic accuracy, confidence, user experience, input prompts, and generated responses was assessed. False information was registered. Linear mixed-effect models were used to quantify the factors (fixed: experience, modality, AI assistance; random: radiologist) influencing diagnostic accuracy and confidence. RESULTS: When assessing if the correct diagnosis was among the top-3 differential diagnoses, diagnostic accuracy improved slightly from 181/240 (75.4%, unassisted) to 188/240 (78.3%, AI-assisted). Similar improvements were found when only the top differential diagnosis was considered. AI assistance was used in 77.5% of the readings. Three hundred nine prompts were generated, primarily involving differential diagnoses (59.1%) and imaging features of specific conditions (27.5%). Diagnostic confidence was significantly higher when readings were AI-assisted (p < 0.001). Twenty-three responses (7.4%) were classified as hallucinations, while two (0.6%) were misinterpretations. CONCLUSION: Integrating GPT-4 in the diagnostic process improved diagnostic accuracy slightly and diagnostic confidence significantly. Potentially harmful hallucinations and misinterpretations call for caution and highlight the need for further safeguarding measures.
CLINICAL RELEVANCE STATEMENT: Using GPT-4 as a virtual assistant when reading images made six radiologists of different experience levels feel more confident and provide more accurate diagnoses; yet, GPT-4 gave factually incorrect and potentially harmful information in 7.4% of its responses.
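The top-3 accuracy endpoint above (correct diagnosis among the three differentials offered) is straightforward to compute. A sketch with hypothetical readings, not the study's data:

```python
def top_k_accuracy(readings, k=3):
    """Fraction of readings whose correct diagnosis appears among the
    reader's top-k differential diagnoses."""
    hits = sum(1 for correct, differentials in readings
               if correct in differentials[:k])
    return hits / len(readings)

# Hypothetical (correct diagnosis, ranked differentials) pairs
readings = [
    ("pneumothorax", ["pneumothorax", "bullous emphysema", "pneumonia"]),
    ("osteosarcoma", ["Ewing sarcoma", "osteomyelitis", "osteosarcoma"]),
    ("appendicitis", ["diverticulitis", "colitis", "ileitis"]),
]
acc3 = top_k_accuracy(readings, k=3)
```

Here two of the three readings contain the correct diagnosis in the top 3, so `acc3` is 2/3; the study's 188/240 figure is the same computation over all AI-assisted readings.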

4.
Eur Radiol Exp ; 8(1): 53, 2024 May 01.
Article En | MEDLINE | ID: mdl-38689178

BACKGROUND: To compare denoising diffusion probabilistic models (DDPM) and generative adversarial networks (GAN) for recovering contrast-enhanced breast magnetic resonance imaging (MRI) subtraction images from virtual low-dose subtraction images. METHODS: Retrospective, ethically approved study. DDPM- and GAN-reconstructed single-slice subtraction images of 50 breasts with enhancing lesions were compared to original ones at three dose levels (25%, 10%, 5%) using quantitative measures and radiologic evaluations. Two radiologists stated their preference based on the reconstruction quality and scored the lesion conspicuity as compared to the original, blinded to the model. Fifty lesion-free maximum intensity projections were evaluated for the presence of false-positives. Results were compared between models and dose levels, using generalized linear mixed models. RESULTS: At 5% dose, both radiologists preferred the GAN-generated images, whereas at 25% dose, both radiologists preferred the DDPM-generated images. Median lesion conspicuity scores did not differ between GAN and DDPM at 25% dose (5 versus 5, p = 1.000) and 10% dose (4 versus 4, p = 1.000). At 5% dose, both readers assigned higher conspicuity to the GAN than to the DDPM (3 versus 2, p = 0.007). In the lesion-free examinations, DDPM and GAN showed no differences in the false-positive rate at 5% (15% versus 22%), 10% (10% versus 6%), and 25% (6% versus 4%) (p = 1.000). CONCLUSIONS: Both GAN and DDPM yielded promising results in low-dose image reconstruction. However, neither of them showed superior results over the other model for all dose levels and evaluation metrics. Further development is needed to counteract false-positives. RELEVANCE STATEMENT: For MRI-based breast cancer screening, reducing the contrast agent dose is desirable. Diffusion probabilistic models and generative adversarial networks were capable of retrospectively enhancing the signal of low-dose images. 
Hence, they may supplement imaging with reduced doses in the future. KEY POINTS: • Deep learning may help recover signal in low-dose contrast-enhanced breast MRI. • Two models (DDPM and GAN) were trained at different dose levels. • Radiologists preferred DDPM at 25%, and GAN images at 5% dose. • Lesion conspicuity between DDPM and GAN was similar, except at 5% dose. • GAN and DDPM yield promising results in low-dose image reconstruction.
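The DDPM half of this comparison rests on a forward "noising" process with a closed-form marginal. As a toy sketch of that forward step only (hypothetical values, pure Python, not the authors' reconstruction model):

```python
import math, random

def ddpm_forward_sample(x0, t, betas, rng):
    """Draw x_t from the closed-form DDPM forward (noising) distribution
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I),
    where alpha_bar_t is the cumulative product of (1 - beta_i)."""
    alpha_bar = 1.0
    for beta in betas[:t]:
        alpha_bar *= 1.0 - beta
    return [math.sqrt(alpha_bar) * x
            + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for x in x0]

rng = random.Random(42)
x0 = [0.5, -0.2, 0.9]   # a hypothetical 3-"pixel" image
betas = [0.1] * 10      # toy noise schedule
x_noisy = ddpm_forward_sample(x0, 10, betas, rng)
```

The generative model is then trained to invert this process step by step, which is what allows it to recover a full-dose-like subtraction image from a low-dose input.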


Breast Neoplasms , Contrast Media , Magnetic Resonance Imaging , Humans , Female , Retrospective Studies , Contrast Media/administration & dosage , Breast Neoplasms/diagnostic imaging , Magnetic Resonance Imaging/methods , Middle Aged , Models, Statistical , Adult , Aged
5.
JAMA ; 331(15): 1320-1321, 2024 04 16.
Article En | MEDLINE | ID: mdl-38497956

This study compares 2 large language models and their performance vs that of competing open-source models.


Artificial Intelligence , Diagnostic Imaging , Medical History Taking , Language
6.
Commun Med (Lond) ; 4(1): 46, 2024 Mar 14.
Article En | MEDLINE | ID: mdl-38486100

BACKGROUND: Artificial intelligence (AI) models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. METHODS: We used two datasets: (1) A large dataset (N = 193,311) of high quality clinical chest radiographs, and (2) a dataset (N = 1625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs measured as area under the receiver operating characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. RESULTS: We find that, while the privacy-preserving training yields lower accuracy, it largely does not amplify discrimination against age, sex or co-morbidity. However, we find an indication that difficult diagnoses and subgroups suffer stronger performance hits in private training. CONCLUSIONS: Our study shows that - under the challenging realistic circumstances of a real-life clinical dataset - the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.


Artificial intelligence (AI), in which computers can learn to do tasks that normally require human intelligence, is particularly useful in medical imaging. However, AI should be used in a way that preserves patient privacy. We explored the balance between maintaining patient data privacy and AI performance in medical imaging. We use an approach called differential privacy to protect the privacy of patients' images. We show that, although training AI with differential privacy leads to a slight decrease in accuracy, it does not substantially increase bias against different age groups, genders, or patients with multiple health conditions. However, we notice that AI faces more challenges in accurately diagnosing complex cases and specific subgroups when trained under these privacy constraints. These findings highlight the importance of designing AI systems that are both privacy-conscious and capable of reliable diagnoses across patient groups.
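Differentially private training is commonly implemented DP-SGD-style: clip each per-example gradient and add calibrated Gaussian noise before averaging. A minimal sketch of that mechanism (illustrative constants and gradients, not the study's training code):

```python
import math, random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD-style private aggregation step: clip each per-example
    gradient to clip_norm in L2 norm, sum, add Gaussian noise calibrated
    to the clipping bound, and average over the batch."""
    rng = rng or random.Random(0)
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0  # L2 clipping
        for i, x in enumerate(g):
            summed[i] += x * scale
    sigma = noise_multiplier * clip_norm  # Gaussian mechanism noise scale
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]

# A hypothetical batch of two per-example gradients, privately averaged
noisy_update = dp_sgd_step([[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0)
```

The clipping bounds any single patient's influence on the update, and the noise makes the contribution of any individual record statistically deniable; the accuracy cost discussed above stems from exactly these two distortions.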

7.
Diagnostics (Basel) ; 14(5)2024 Feb 23.
Article En | MEDLINE | ID: mdl-38472955

Increased attention has been given to MRI in radiation-free screening for malignant nodules in recent years. Our objective was to compare the performance of human readers and radiomic feature analysis based on stand-alone and combined CT and MRI in classifying pulmonary nodules. This single-center study comprised patients with CT findings of pulmonary nodules who underwent additional lung MRI and whose nodules were classified as benign/malignant by resection. For radiomic feature analysis, 2D segmentation was performed for each lung nodule on axial CT, T2-weighted (T2w), and diffusion (DWI) images. The 105 extracted features were reduced by iterative backward selection. The performance of radiomics and human readers was compared by calculating accuracy (ACC) with Clopper-Pearson confidence intervals. Fifty patients (mean age 63 ± 10 years) with 66 pulmonary nodules (40 malignant) were evaluated. ACC values for radiomic feature analysis vs. radiologists based on CT alone (0.68; 95%CI: 0.56, 0.79 vs. 0.59; 95%CI: 0.46, 0.71), T2w alone (0.65; 95%CI: 0.52, 0.77 vs. 0.68; 95%CI: 0.54, 0.78), DWI alone (0.61; 95%CI: 0.48, 0.72 vs. 0.73; 95%CI: 0.60, 0.83), combined T2w/DWI (0.73; 95%CI: 0.60, 0.83 vs. 0.70; 95%CI: 0.57, 0.80), and combined CT/T2w/DWI (0.83; 95%CI: 0.72, 0.91 vs. 0.64; 95%CI: 0.51, 0.75) were calculated. This study is the first to show that by combining quantitative image information from CT, T2w, and DWI datasets, pulmonary nodule assessment through radiomics analysis is superior to using one modality alone, even exceeding human readers' performance.
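The Clopper-Pearson intervals quoted above are the exact binomial confidence bounds. They can be computed without statistics libraries by bisecting the binomial tail probabilities; a sketch (the count 45/66 is a hypothetical value near the reported CT-alone accuracy, not taken from the study's raw data):

```python
from math import comb

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) 1 - alpha confidence interval for a
    binomial proportion k/n, found by bisecting the binomial tails."""
    def tail_ge(p):  # P(X >= k | p), increasing in p
        return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

    def tail_le(p):  # P(X <= k | p), decreasing in p
        return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

    def bisect(f, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if (f(mid) < alpha / 2.0) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    lower = 0.0 if k == 0 else bisect(tail_ge, increasing=True)
    upper = 1.0 if k == n else bisect(tail_le, increasing=False)
    return lower, upper

lo, hi = clopper_pearson(45, 66)  # e.g., 45 of 66 nodules correct
```

For k = 0 the upper bound reduces to the closed form 1 - (alpha/2)^(1/n), which is a convenient correctness check.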

9.
Nat Commun ; 15(1): 1603, 2024 Feb 21.
Article En | MEDLINE | ID: mdl-38383555

A knowledge gap persists between machine learning (ML) developers (e.g., data scientists) and practitioners (e.g., clinicians), hampering the full utilization of ML for clinical data analysis. We investigated the potential of the ChatGPT Advanced Data Analysis (ADA), an extension of GPT-4, to bridge this gap and perform ML analyses efficiently. Real-world clinical datasets and study details from large trials across various medical specialties were presented to ChatGPT ADA without specific guidance. ChatGPT ADA autonomously developed state-of-the-art ML models based on the original study's training data to predict clinical outcomes such as cancer development, cancer progression, disease complications, or biomarkers such as pathogenic gene sequences. Following the re-implementation and optimization of the published models, the head-to-head comparison of the ChatGPT ADA-crafted ML models and their respective manually crafted counterparts revealed no significant differences in traditional performance metrics (p ≥ 0.072). Strikingly, the ChatGPT ADA-crafted ML models often outperformed their counterparts. In conclusion, ChatGPT ADA offers a promising avenue to democratize ML in medicine by simplifying complex data analyses, yet should enhance, not replace, specialized training and resources, to promote broader applications in medical research and practice.


Algorithms , Neoplasms , Humans , Benchmarking , Language , Machine Learning
10.
Eur Radiol Exp ; 8(1): 10, 2024 Feb 08.
Article En | MEDLINE | ID: mdl-38326501

BACKGROUND: Pretraining labeled datasets, like ImageNet, have become a technical standard in advanced medical image analysis. However, the emergence of self-supervised learning (SSL), which leverages unlabeled data to learn robust features, presents an opportunity to bypass the intensive labeling process. In this study, we explored if SSL for pretraining on non-medical images can be applied to chest radiographs and how it compares to supervised pretraining on non-medical images and on medical images. METHODS: We utilized a vision transformer and initialized its weights based on the following: (i) SSL pretraining on non-medical images (DINOv2), (ii) supervised learning (SL) pretraining on non-medical images (ImageNet dataset), and (iii) SL pretraining on chest radiographs from the MIMIC-CXR database, the largest labeled public dataset of chest radiographs to date. We tested our approach on over 800,000 chest radiographs from 6 large global datasets, diagnosing more than 20 different imaging findings. Performance was quantified using the area under the receiver operating characteristic curve and evaluated for statistical significance using bootstrapping. RESULTS: SSL pretraining on non-medical images not only outperformed ImageNet-based pretraining (p < 0.001 for all datasets) but, in certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest that selecting the right pretraining strategy, especially with SSL, can be pivotal for improving diagnostic accuracy of artificial intelligence in medical imaging. CONCLUSIONS: By demonstrating the promise of SSL in chest radiograph analysis, we underline a transformative shift towards more efficient and accurate AI models in medical imaging. RELEVANCE STATEMENT: Self-supervised learning highlights a paradigm shift towards the enhancement of AI-driven accuracy and efficiency in medical imaging. 
Given its promise, the broader application of self-supervised learning in medical imaging calls for deeper exploration, particularly in contexts where comprehensive annotated datasets are limited.
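The statistical comparison above relies on case-level bootstrapping. A generic sketch of a nonparametric bootstrap confidence interval (the metric and data are hypothetical stand-ins, not the study's AUROC pipeline):

```python
import random

def bootstrap_ci(metric, scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap confidence interval for a case-level
    performance metric: resample cases with replacement, recompute the
    metric, and take the empirical percentiles."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([scores[i] for i in idx], [labels[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

def accuracy(scores, labels):
    """Simple thresholded accuracy, used here as the resampled metric."""
    return sum((s > 0.5) == bool(y) for s, y in zip(scores, labels)) / len(scores)

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]  # hypothetical model outputs
labels = [1, 1, 1, 0, 0, 0, 0, 0]
lo, hi = bootstrap_ci(accuracy, scores, labels)
```

Comparing two models is then a matter of bootstrapping the paired difference of their metrics and checking whether the interval excludes zero.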


Artificial Intelligence , Deep Learning , Databases, Factual
11.
Radiologie (Heidelb) ; 64(4): 304-311, 2024 Apr.
Article De | MEDLINE | ID: mdl-38170243

High-quality magnetic resonance (MR) imaging is essential for the precise assessment of the knee joint and plays a key role in diagnosis, treatment, and prognosis. Intact cartilage tissue is characterized by a smooth surface, uniform tissue thickness and an organized zonal structure, which are manifested as depth-dependent signal intensity variations. Cartilage pathologies are identifiable through alterations in signal intensity and morphology and should be communicated using precise terminology. Cartilage pathologies can show hyperintense and hypointense signal alterations. Cartilage defects are assessed based on their depth and should be described in terms of their location and extent. The following symptom constellations are of overarching clinical relevance in image reading and interpretation: symptom constellations associated with rapidly progressive forms of joint degeneration and unfavorable prognosis, accompanying symptom constellations mostly in connection with destabilizing meniscal lesions and subchondral insufficiency fractures (accelerated osteoarthritis) as well as symptoms beyond the "typical" degeneration, especially when a discrepancy is observed between (minor) structural changes and (major) synovitis and effusion (inflammatory arthropathy).


Cartilage, Articular , Osteoarthritis, Knee , Humans , Osteoarthritis, Knee/complications , Osteoarthritis, Knee/pathology , Cartilage, Articular/pathology , Disease Progression , Knee Joint/pathology , Magnetic Resonance Imaging/methods
12.
Skeletal Radiol ; 53(4): 791-800, 2024 Apr.
Article En | MEDLINE | ID: mdl-37819279

OBJECTIVE: Clinical-standard MRI is the imaging modality of choice for the wrist, yet limited to static evaluation, thereby potentially missing dynamic instability patterns. We aimed to investigate the clinical benefit of (dynamic) real-time MRI, complemented by automatic analysis, in patients with complete or partial scapholunate ligament (SLL) tears. MATERIAL AND METHODS: Both wrists of ten patients with unilateral SLL tears (six partial, four complete tears) as diagnosed by clinical-standard MRI were imaged during continuous active radioulnar motion using a 1.5-T MRI scanner in combination with a custom-made motion device. Following automatic segmentation of the wrist, the scapholunate and lunotriquetral joint widths were analyzed across the entire range of motion (ROM). Mixed-effects model analysis of variance (ANOVA) followed by Tukey's posthoc test and two-way ANOVA were used for statistical analysis. RESULTS: With the increasing extent of SLL tear, the scapholunate joint widths in injured wrists were significantly larger over the entire ROM compared to those of the contralateral healthy wrists (p<0.001). Differences between partial and complete tears were most pronounced at 5°-15° ulnar abduction (p<0.001). Motion patterns and trajectories were altered. Complete SLL deficiency resulted in complex alterations of the lunotriquetral joint widths. CONCLUSION: Real-time MRI may improve the functional diagnosis of SLL insufficiency and aid therapeutic decision-making by revealing dynamic forms of dissociative instability within the proximal carpus. Static MRI best differentiates SLL-injured wrists at 5°-15° of ulnar abduction.


Carpal Joints , Joint Instability , Wrist Injuries , Humans , Wrist Joint/diagnostic imaging , Magnetic Resonance Imaging/methods , Carpal Joints/diagnostic imaging , Ligaments, Articular/diagnostic imaging , Magnetic Resonance Spectroscopy , Joint Instability/diagnostic imaging , Wrist Injuries/diagnostic imaging
13.
Radiologie (Heidelb) ; 64(4): 295-303, 2024 Apr.
Article De | MEDLINE | ID: mdl-38158404

Magnetic resonance imaging (MRI) is the clinical method of choice for cartilage imaging in the context of degenerative and nondegenerative joint diseases. The MRI-based definitions of osteoarthritis rely on the detection of osteophytes, cartilage pathologies, bone marrow edema and meniscal lesions but currently a scientific consensus is lacking. In the clinical routine proton density-weighted, fat-suppressed 2D turbo spin echo sequences with echo times of 30-40 ms are predominantly used, which are sufficiently sensitive and specific for the assessment of cartilage. The additionally acquired T1-weighted sequences are primarily used for evaluating other intra-articular and periarticular structures. Diagnostically relevant artifacts include magic angle and chemical shift artifacts, which can lead to artificial signal enhancement in cartilage or incorrect representations of the subchondral lamina and its thickness. Although scientifically validated, high-resolution 3D gradient echo sequences (for cartilage segmentation) and compositional MR sequences (for quantification of physical tissue parameters) are currently reserved for scientific research questions. The future integration of artificial intelligence techniques in areas such as image reconstruction (to reduce scan times while maintaining image quality), image analysis (for automated identification of cartilage defects), and image postprocessing (for automated segmentation of cartilage in terms of volume and thickness) will significantly improve the diagnostic workflow and advance the field further.


Cartilage Diseases , Cartilage, Articular , Osteoarthritis, Knee , Humans , Osteoarthritis, Knee/pathology , Cartilage, Articular/pathology , Artificial Intelligence , Cartilage Diseases/pathology , Magnetic Resonance Imaging/methods
14.
J Pathol ; 262(3): 310-319, 2024 03.
Article En | MEDLINE | ID: mdl-38098169

Deep learning applied to whole-slide histopathology images (WSIs) has the potential to enhance precision oncology and alleviate the workload of experts. However, developing these models necessitates large amounts of data with ground truth labels, which can be both time-consuming and expensive to obtain. Pathology reports are typically unstructured or poorly structured texts, and efforts to implement structured reporting templates have been unsuccessful, as these efforts lead to perceived extra workload. In this study, we hypothesised that large language models (LLMs), such as the generative pre-trained transformer 4 (GPT-4), can extract structured data from unstructured plain language reports using a zero-shot approach without requiring any re-training. We tested this hypothesis by utilising GPT-4 to extract information from histopathological reports, focusing on two extensive sets of pathology reports for colorectal cancer and glioblastoma. We found a high concordance between LLM-generated structured data and human-generated structured data. Consequently, LLMs could potentially be employed routinely to extract ground truth data for machine learning from unstructured pathology reports in the future. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
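Once an LLM returns structured fields (e.g., as JSON), the concordance with human-generated structured data can be scored per field. A sketch with hypothetical records and field names, not the study's schema:

```python
import json

def field_concordance(llm_records, human_records, fields):
    """Per-field agreement rate between LLM-extracted and manually
    extracted structured records."""
    return {f: sum(a.get(f) == b.get(f)
                   for a, b in zip(llm_records, human_records)) / len(llm_records)
            for f in fields}

# Hypothetical LLM output (parsed from a JSON response) and human reference
llm = [json.loads('{"diagnosis": "glioblastoma", "who_grade": "4"}'),
       json.loads('{"diagnosis": "colorectal adenocarcinoma", "who_grade": null}')]
ref = [{"diagnosis": "glioblastoma", "who_grade": "4"},
       {"diagnosis": "colorectal adenocarcinoma", "who_grade": "n/a"}]
rates = field_concordance(llm, ref, ["diagnosis", "who_grade"])
```

Here the diagnosis field agrees in both records while the grade field agrees in one, illustrating how missing or differently encoded values ("null" vs. "n/a") must be harmonized before concordance is interpreted.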


Glioblastoma , Precision Medicine , Humans , Machine Learning , United Kingdom
15.
Med Image Anal ; 92: 103059, 2024 Feb.
Article En | MEDLINE | ID: mdl-38104402

Artificial intelligence (AI) has a multitude of applications in cancer research and oncology. However, the training of AI systems is impeded by the limited availability of large datasets due to data protection requirements and other regulatory obstacles. Federated and swarm learning represent possible solutions to this problem by collaboratively training AI models while avoiding data transfer. However, in these decentralized methods, weight updates are still transferred to the aggregation server for merging the models. This leaves the possibility for a breach of data privacy, for example by model inversion or membership inference attacks by untrusted servers. Somewhat-homomorphically-encrypted federated learning (SHEFL) is a solution to this problem because only encrypted weights are transferred, and model updates are performed in the encrypted space. Here, we demonstrate the first successful implementation of SHEFL in a range of clinically relevant tasks in cancer image analysis on multicentric datasets in radiology and histopathology. We show that SHEFL enables the training of AI models which outperform locally trained models and perform on par with models which are centrally trained. In the future, SHEFL can enable multiple institutions to co-train AI models without forsaking data governance and without ever transmitting any decryptable data to untrusted servers.


Neoplasms , Radiology , Humans , Artificial Intelligence , Learning , Neoplasms/diagnostic imaging , Image Processing, Computer-Assisted
16.
Sci Rep ; 13(1): 22576, 2023 12 19.
Article En | MEDLINE | ID: mdl-38114729

Developing robust artificial intelligence (AI) models that generalize well to unseen datasets is challenging and usually requires large and variable datasets, preferably from multiple institutions. In federated learning (FL), a model is trained collaboratively at numerous sites that hold local datasets without exchanging them. So far, the impact of training strategy, i.e., local versus collaborative, on the diagnostic on-domain and off-domain performance of AI models interpreting chest radiographs has not been assessed. Consequently, using 610,000 chest radiographs from five institutions across the globe, we assessed diagnostic performance as a function of training strategy (i.e., local vs. collaborative), network architecture (i.e., convolutional vs. transformer-based), single versus cross-institutional performance (i.e., on-domain vs. off-domain), imaging finding (i.e., cardiomegaly, pleural effusion, pneumonia, atelectasis, consolidation, pneumothorax, and no abnormality), dataset size (i.e., from n = 18,000 to 213,921 radiographs), and dataset diversity. Large datasets not only showed minimal performance gains with FL but, in some instances, even exhibited decreases. In contrast, smaller datasets revealed marked improvements. Thus, on-domain performance was mainly driven by training data size. However, off-domain performance leaned more on training diversity. When trained collaboratively across diverse external institutions, AI models consistently surpassed models trained locally for off-domain tasks, emphasizing FL's potential in leveraging data diversity. In conclusion, FL can bolster diagnostic privacy, reproducibility, and off-domain reliability of AI models and, potentially, optimize healthcare outcomes.
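At the heart of FL is the aggregation step: merging locally trained parameters without moving the underlying images. A minimal FedAvg-style sketch (toy weight vectors and site sizes, not the study's training setup):

```python
def fed_avg(site_weights, site_sizes):
    """Federated averaging (FedAvg): merge per-site model parameters into
    a global model, weighting each site by its local dataset size, so no
    raw images ever leave the institutions."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(dim)]

# Two hypothetical sites: one holding 100 radiographs, one holding 300
global_weights = fed_avg([[1.0, 1.0], [3.0, 3.0]], [100, 300])
```

Each round, sites train locally, send only their updated weights, and receive the merged global model back; the size weighting is what lets small cohorts benefit from large ones, matching the finding above that smaller datasets gain the most.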


Artificial Intelligence , Learning , Reproducibility of Results , Generalization, Psychological , Radiography
17.
Sci Rep ; 13(1): 20159, 2023 11 17.
Article En | MEDLINE | ID: mdl-37978240

Large language models (LLMs) have shown potential in various applications, including clinical practice. However, their accuracy and utility in providing treatment recommendations for orthopedic conditions remain to be investigated. Thus, this pilot study aims to evaluate the validity of treatment recommendations generated by GPT-4 for common knee and shoulder orthopedic conditions using anonymized clinical MRI reports. A retrospective analysis was conducted using 20 anonymized clinical MRI reports, with varying severity and complexity. Treatment recommendations were elicited from GPT-4 and evaluated by two board-certified specialty-trained senior orthopedic surgeons. Their evaluation focused on semiquantitative gradings of accuracy and clinical utility and potential limitations of the LLM-generated recommendations. GPT-4 provided treatment recommendations for 20 patients (mean age, 50 years ± 19 [standard deviation]; 12 men) with acute and chronic knee and shoulder conditions. The LLM produced largely accurate and clinically useful recommendations. However, limited awareness of a patient's overall situation, a tendency to incorrectly appreciate treatment urgency, and largely schematic and unspecific treatment recommendations were observed and may reduce its clinical usefulness. In conclusion, LLM-based treatment recommendations are largely adequate and not prone to 'hallucinations', yet inadequate in particular situations. Critical guidance by healthcare professionals is obligatory, and independent use by patients is discouraged, given the dependency on precise data input.


Medicine , Musculoskeletal Diseases , Male , Humans , Middle Aged , Pilot Projects , Retrospective Studies , Language , Magnetic Resonance Imaging
18.
Radiology ; 309(1): e230806, 2023 10.
Article En | MEDLINE | ID: mdl-37787671

Background Clinicians consider both imaging and nonimaging data when diagnosing diseases; however, current machine learning approaches primarily consider data from a single modality. Purpose To develop a neural network architecture capable of integrating multimodal patient data and compare its performance to models incorporating a single modality for diagnosing up to 25 pathologic conditions. Materials and Methods In this retrospective study, imaging and nonimaging patient data were extracted from the Medical Information Mart for Intensive Care (MIMIC) database and an internal database comprising chest radiographs and clinical parameters of inpatients in the intensive care unit (ICU) (January 2008 to December 2020). The MIMIC and internal data sets were each split into training (n = 33 893, n = 28 809), validation (n = 740, n = 7203), and test (n = 1909, n = 9004) sets. A novel transformer-based neural network architecture was trained to diagnose up to 25 conditions using nonimaging data alone, imaging data alone, or multimodal data. Diagnostic performance was assessed using area under the receiver operating characteristic curve (AUC) analysis. Results The MIMIC and internal data sets included 36 542 patients (mean age, 63 years ± 17 [SD]; 20 567 male patients) and 45 016 patients (mean age, 66 years ± 16; 27 577 male patients), respectively. The multimodal model showed improved diagnostic performance for all pathologic conditions. For the MIMIC data set, the mean AUC was 0.77 (95% CI: 0.77, 0.78) when both chest radiographs and clinical parameters were used, compared with 0.70 (95% CI: 0.69, 0.71; P < .001) for only chest radiographs and 0.72 (95% CI: 0.72, 0.73; P < .001) for only clinical parameters. These findings were confirmed on the internal data set. Conclusion A model trained on imaging and nonimaging data outperformed models trained on only one type of data for diagnosing multiple diseases in patients in an ICU setting.
© RSNA, 2023 Supplemental material is available for this article. See also the editorial by Kitamura and Topol in this issue.
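The AUC endpoint used throughout this entry has a useful rank-based interpretation that also makes it easy to compute. A sketch with hypothetical model outputs, not the study's predictions:

```python
def auroc(scores, labels):
    """Area under the ROC curve computed as the Mann-Whitney statistic:
    the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model outputs for six ICU patients (1 = finding present)
scores = [0.92, 0.85, 0.60, 0.40, 0.35, 0.10]
labels = [1, 1, 0, 1, 0, 0]
auc = auroc(scores, labels)
```

Under this reading, the reported rise from 0.72 to 0.77 means the multimodal model ranks a random diseased patient above a random non-diseased patient noticeably more often than the clinical-parameters-only model does.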


Deep Learning , Humans , Male , Middle Aged , Aged , Retrospective Studies , Radiography , Databases, Factual , Inpatients
19.
Sci Rep ; 13(1): 14207, 2023 08 30.
Article En | MEDLINE | ID: mdl-37648728

Accurate and automatic segmentation of fibroglandular tissue in breast MRI screening is essential for the quantification of breast density and background parenchymal enhancement. In this retrospective study, we developed and evaluated a transformer-based neural network for breast segmentation (TraBS) in multi-institutional MRI data, and compared its performance to the well-established convolutional neural network nnUNet. TraBS and nnUNet were trained and tested on 200 internal and 40 external breast MRI examinations using manual segmentations generated by experienced human readers. Segmentation performance was assessed in terms of the Dice score and the average symmetric surface distance. The Dice score for nnUNet was lower than for TraBS on the internal testset (0.909 ± 0.069 versus 0.916 ± 0.067, P < 0.001) and on the external testset (0.824 ± 0.144 versus 0.864 ± 0.081, P = 0.004). Moreover, the average symmetric surface distance was higher (= worse) for nnUNet than for TraBS on the internal (0.657 ± 2.856 versus 0.548 ± 2.195, P = 0.001) and on the external testset (0.727 ± 0.620 versus 0.584 ± 0.413, P = 0.03). Our study demonstrates that transformer-based networks improve the quality of fibroglandular tissue segmentation in breast MRI compared to convolutional-based models like nnUNet. These findings might help to enhance the accuracy of breast density and parenchymal enhancement quantification in breast MRI screening.
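The Dice score reported above is the standard overlap measure between predicted and reference masks. A minimal sketch over toy flattened binary masks (illustrative values, not the study's segmentations):

```python
def dice_score(pred, truth):
    """Dice similarity coefficient between two flattened binary masks:
    2|A ∩ B| / (|A| + |B|); 1.0 by convention when both masks are empty."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 2.0 * intersection / total if total else 1.0

pred = [1, 1, 0, 0, 1, 0]   # hypothetical predicted mask, flattened
truth = [1, 0, 1, 0, 1, 0]  # hypothetical manual reference mask
overlap = dice_score(pred, truth)
```

Because Dice is insensitive to where along the boundary errors occur, the study complements it with the average symmetric surface distance, which penalizes contour deviations directly.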


Breast Density , Magnetic Resonance Imaging , Humans , Retrospective Studies , Radiography , Electric Power Supplies
20.
Sci Rep ; 13(1): 10666, 2023 07 01.
Article En | MEDLINE | ID: mdl-37393383

When clinicians assess the prognosis of patients in intensive care, they take imaging and non-imaging data into account. In contrast, many traditional machine learning models rely on only one of these modalities, limiting their potential in medical applications. This work proposes and evaluates a transformer-based neural network as a novel AI architecture that integrates multimodal patient data, i.e., imaging data (chest radiographs) and non-imaging data (clinical data). We evaluate the performance of our model in a retrospective study with 6,125 patients in intensive care. We show that the combined model (area under the receiver operating characteristic curve [AUROC] of 0.863) is superior to the radiographs-only model (AUROC = 0.811, p < 0.001) and the clinical data-only model (AUROC = 0.785, p < 0.001) when tasked with predicting in-hospital survival per patient. Furthermore, we demonstrate that our proposed model is robust in cases where not all (clinical) data points are available.


Critical Care , Diagnostic Imaging , Humans , Retrospective Studies , Area Under Curve , Electric Power Supplies
...