Results 1 - 20 of 152
1.
Lancet Oncol ; 25(7): 879-887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
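
The non-inferiority and superiority statements in this abstract come down to comparing the lower bound of a confidence interval for the difference (AI system minus readers) against a margin. The following is a minimal sketch of that decision rule, using the AUROC lower bound quoted above. The function name, the return strings, and the sign convention (margin passed as a positive number for a higher-is-better metric) are illustrative assumptions, not the study's analysis code.

```python
def classify_difference(ci_lower: float, margin: float = 0.05) -> str:
    """Classify a higher-is-better metric difference (new minus reference).

    ci_lower: lower bound of the two-sided 95% CI for the difference.
    margin:   non-inferiority margin, passed as a positive number (assumed convention).
    """
    # Superiority is only considered once non-inferiority holds;
    # a lower bound above 0 implies it is also above -margin.
    if ci_lower > 0:
        return "non-inferior and superior"
    if ci_lower > -margin:
        return "non-inferior"
    return "non-inferiority not shown"


# Lower bound of the 95% Wald CI for the AUROC difference quoted in the abstract.
print(classify_difference(0.02))
```

The sketch covers only the final comparison. Estimating the Wald interval itself for a paired multireader design needs the study data and is not attempted here.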


Subject(s)
Artificial Intelligence , Magnetic Resonance Imaging , Prostatic Neoplasms , Radiologists , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Aged , Retrospective Studies , Middle Aged , Neoplasm Grading , Netherlands , ROC Curve
2.
Insights Imaging ; 15(1): 124, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38825600

ABSTRACT

OBJECTIVES: Achieving a consensus on a definition for different aspects of radiomics workflows to support their translation into clinical usage. Furthermore, to assess the perspective of experts on important challenges for a successful clinical workflow implementation. MATERIALS AND METHODS: The consensus was achieved by a multi-stage process. Stage 1 comprised a definition screening, a retrospective analysis with semantic mapping of terms found in 22 workflow definitions, and the compilation of an initial baseline definition. Stages 2 and 3 consisted of a Delphi process with over 45 experts hailing from sites participating in the German Research Foundation (DFG) Priority Program 2177. Stage 2 aimed to achieve a broad consensus for a definition proposal, while stage 3 identified the importance of translational challenges. RESULTS: Workflow definitions from 22 publications (published 2012-2020) were analyzed. Sixty-nine definition terms were extracted, mapped, and semantic ambiguities (e.g., homonymous and synonymous terms) were identified and resolved. The consensus definition was developed via a Delphi process. The final definition comprising seven phases and 37 aspects reached a high overall consensus (> 89% of experts "agree" or "strongly agree"). Two aspects reached no strong consensus. In addition, the Delphi process identified and characterized from the participating experts' perspective the ten most important challenges in radiomics workflows. CONCLUSION: To overcome semantic inconsistencies between existing definitions and offer a well-defined, broad, referenceable terminology, a consensus workflow definition for radiomics-based setups and a terms mapping to existing literature was compiled. Moreover, the most relevant challenges towards clinical application were characterized. CRITICAL RELEVANCE STATEMENT: Lack of standardization represents one major obstacle to successful clinical translation of radiomics. 
Here, we report a consensus workflow definition on different aspects of radiomics studies and highlight important challenges to advance the clinical adoption of radiomics. KEY POINTS: Published radiomics workflow terminologies are inconsistent, hindering standardization and translation. A consensus radiomics workflow definition proposal with high agreement was developed. Publicly available result resources for further exploitation by the scientific community.

3.
Diagnostics (Basel) ; 14(12)2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38928716

ABSTRACT

PURPOSE: To assess the feasibility and diagnostic accuracy of MRI-derived 3D volumetry of lower lumbar vertebrae and dural sac segments using shape-based machine learning for the detection of Marfan syndrome (MFS) compared with dural sac diameter ratios (the current clinical standard). MATERIALS AND METHODS: The final study sample was 144 patients being evaluated for MFS from 01/2012 to 12/2016, of whom 81 were non-MFS patients (46 [67%] female, 36 ± 16 years) and 63 were MFS patients (36 [57%] female, 35 ± 11 years) according to the 2010 Revised Ghent Nosology. All patients underwent 1.5T MRI with isotropic 1 × 1 × 1 mm3 3D T2-weighted acquisition of the lumbosacral spine. Segmentation and quantification of vertebral bodies L3-L5 and dural sac segments L3-S1 were performed using a shape-based machine learning algorithm. For comparison with the current clinical standard, anteroposterior diameters of vertebral bodies and dural sac were measured. Ratios between dural sac volume/diameter at the respective level and vertebral body volume/diameter were calculated. RESULTS: Three-dimensional volumetry revealed larger dural sac volumes (p < 0.001) and volume ratios (p < 0.001) at L3-S1 levels in MFS patients compared with non-MFS patients. For the detection of MFS, 3D volumetry achieved higher AUCs at L3-S1 levels (0.743, 0.752, 0.808, and 0.824) compared with dural sac diameter ratios (0.673, 0.707, 0.791, and 0.848); a significant difference was observed only for L3 (p < 0.001). CONCLUSION: MRI-derived 3D volumetry of the lumbosacral dural sac and vertebral bodies is a feasible method for quantifying dural ectasia using shape-based machine learning. Non-inferior diagnostic accuracy was observed compared with dural sac diameter ratio (the current clinical standard for MFS detection).

4.
Med Image Anal ; 95: 103206, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38776844

ABSTRACT

The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density; however, due to differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to other systems. Though federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants were able to submit Docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.
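
The linear kappa score reported here is a chance-corrected agreement measure that penalizes disagreements in proportion to how far apart the two ordinal categories are. Below is a minimal sketch of linearly weighted Cohen's kappa, assuming the density categories are encoded as integers 0..k-1. The function name and the encoding are illustrative assumptions, and the challenge's own scoring code may differ.

```python
def linear_weighted_kappa(y_true, y_pred, n_classes):
    """Linearly weighted Cohen's kappa for ordinal labels encoded 0..n_classes-1."""
    n = len(y_true)
    # Observed confusion counts: rows are reference labels, columns are predictions.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    row_totals = [sum(observed[i]) for i in range(n_classes)]
    col_totals = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = abs(i - j) / (n_classes - 1)  # linear disagreement weight
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two label distributions.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    # Undefined (division by zero) if all labels fall into a single category.
    return 1.0 - weighted_observed / weighted_expected


print(linear_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_classes=4))
```

With only two categories the linear weights reduce to 0 and 1, so the function returns ordinary Cohen's kappa.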


Subject(s)
Algorithms , Breast Density , Breast Neoplasms , Mammography , Humans , Female , Mammography/methods , Breast Neoplasms/diagnostic imaging , Machine Learning
5.
J Magn Reson Imaging ; 2024 May 11.
Article in English | MEDLINE | ID: mdl-38733369

ABSTRACT

BACKGROUND: Radiomics models trained on data from one center typically show a decline in performance when applied to data from external centers, hindering their introduction into large-scale clinical practice. Current expert recommendations suggest using only reproducible radiomics features isolated by multiscanner test-retest experiments, which might help to overcome the problem of limited generalizability to external data. PURPOSE: To evaluate the influence of using only a subset of robust radiomics features, defined in a prior in vivo multi-MRI-scanner test-retest study, on the performance and generalizability of radiomics models. STUDY TYPE: Retrospective. POPULATION: Patients with monoclonal plasma cell disorders. Training set (117 MRIs from center 1); internal test set (42 MRIs from center 1); external test set (143 MRIs from centers 2-8). FIELD STRENGTH/SEQUENCE: 1.5T and 3.0T; T1-weighted turbo spin echo. ASSESSMENT: The task for the radiomics models was to predict plasma cell infiltration, determined by bone marrow biopsy, noninvasively from MRI. Radiomics machine learning models, including linear regressor, support vector regressor (SVR), and random forest regressor (RFR), were trained on data from center 1, using either all radiomics features or only reproducible radiomics features. Models were tested on an internal (center 1) and a multicentric external data set (centers 2-8). STATISTICAL TESTS: Pearson correlation coefficient r and mean absolute error (MAE) between predicted and actual plasma cell infiltration. Fisher's z-transformation, Wilcoxon signed-rank test, Wilcoxon rank-sum test; significance level P < 0.05. RESULTS: When using only reproducible features compared with all features, the performance of the SVR on the external test set significantly improved (r = 0.43 vs. r = 0.18 and MAE = 22.6 vs. MAE = 28.2). For the RFR, the performance on the external test set deteriorated when using only reproducible instead of all radiomics features (r = 0.33 vs. r = 0.44, P = 0.29 and MAE = 21.9 vs. MAE = 20.5, P = 0.10). CONCLUSION: Using only reproducible radiomics features improved the external performance of some, but not all, machine learning models, and did not automatically lead to an improvement in the external performance of the overall best radiomics model. TECHNICAL EFFICACY: Stage 2.
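
The two performance measures named in this abstract, Pearson's r and the mean absolute error, together with Fisher's z-transformation (used when comparing correlation coefficients), can be computed as in the minimal sketch below. The function names and the toy values are illustrative assumptions and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient (atanh)."""
    return math.atanh(r)


# Toy values only (hypothetical infiltration percentages).
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 20.0, 26.0]
print(pearson_r(actual, predicted), mean_absolute_error(actual, predicted))
print(fisher_z(0.43), fisher_z(0.18))  # z-values for the two SVR correlations quoted above
```

The transformed values are approximately normally distributed, which is what allows two correlation coefficients to be compared with a z-test.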

6.
Front Med (Lausanne) ; 11: 1360706, 2024.
Article in English | MEDLINE | ID: mdl-38495118

ABSTRACT

Background: Chronic obstructive pulmonary disease (COPD) poses a substantial global health burden, demanding advanced diagnostic tools for early detection and accurate phenotyping. In this line, this study seeks to enhance COPD characterization on chest computed tomography (CT) by comparing the spatial and quantitative relationships between traditional parametric response mapping (PRM) and a novel self-supervised anomaly detection approach, and to unveil potential additional insights into the dynamic transitional stages of COPD. Methods: Non-contrast inspiratory and expiratory CT of 1,310 never-smoker and GOLD 0 individuals and COPD patients (GOLD 1-4) from the COPDGene dataset were retrospectively evaluated. A novel self-supervised anomaly detection approach was applied to quantify lung abnormalities associated with COPD, as regional deviations. These regional anomaly scores were qualitatively and quantitatively compared, per GOLD class, to PRM volumes (emphysema: PRMEmph, functional small-airway disease: PRMfSAD) and to a Principal Component Analysis (PCA) and Clustering, applied on the self-supervised latent space. Its relationships to pulmonary function tests (PFTs) were also evaluated. Results: Initial t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization of the self-supervised latent space highlighted distinct spatial patterns, revealing clear separations between regions with and without emphysema and air trapping. Four stable clusters were identified among this latent space by the PCA and Cluster Analysis. As the GOLD stage increased, PRMEmph, PRMfSAD, anomaly score, and Cluster 3 volumes exhibited escalating trends, contrasting with a decline in Cluster 2. The patient-wise anomaly scores significantly differed across GOLD stages (p < 0.01), except for never-smokers and GOLD 0 patients. In contrast, PRMEmph, PRMfSAD, and cluster classes showed fewer significant differences. 
Pearson correlation coefficients revealed moderate anomaly score correlations to PFTs (0.41-0.68), except for the functional residual capacity and smoking duration. The anomaly score was correlated with PRMEmph (r = 0.66, p < 0.01) and PRMfSAD (r = 0.61, p < 0.01). Anomaly scores significantly improved fitting of PRM-adjusted multivariate models for predicting clinical parameters (p < 0.001). Bland-Altman plots revealed that volume agreement between PRM-derived volumes and clusters was not constant across the range of measurements. Conclusion: Our study highlights the synergistic utility of the anomaly detection approach and traditional PRM in capturing the nuanced heterogeneity of COPD. The observed disparities in spatial patterns, cluster dynamics, and correlations with PFTs underscore the distinct - yet complementary - strengths of these methods. Integrating anomaly detection and PRM offers a promising avenue for understanding of COPD pathophysiology, potentially informing more tailored diagnostic and intervention approaches to improve patient outcomes.

7.
iScience ; 27(2): 109023, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38352223

ABSTRACT

The preoperative distinction between glioblastoma (GBM) and primary central nervous system lymphoma (PCNSL) can be difficult, even for experts, but is highly relevant. We aimed to develop an easy-to-use algorithm based on a convolutional neural network (CNN) to preoperatively discern PCNSL from GBM and to systematically compare its performance to that of experienced neurosurgeons and radiologists. To this end, a CNN based on DenseNet169 was trained with the magnetic resonance (MR) imaging data of 68 PCNSL and 69 GBM patients, and its performance was compared to that of six trained experts on an external test set of 10 PCNSL and 10 GBM. Our neural network predicted PCNSL with an accuracy of 80% and a negative predictive value (NPV) of 0.8, exceeding the accuracy achieved by clinicians (73%, NPV 0.77). Combining expert rating with automated diagnosis in those cases where experts dissented yielded an accuracy of 95%. Our approach has the potential to significantly augment the preoperative radiological diagnosis of PCNSL.
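
Accuracy and negative predictive value both follow directly from the counts of a 2 × 2 confusion matrix. The sketch below shows the arithmetic, treating PCNSL as the positive class. The abstract does not report the confusion matrix, so the counts used here are hypothetical, chosen only because they are one split of a 10 + 10 test set that gives 80% accuracy and an NPV of 0.8.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def negative_predictive_value(tn, fn):
    """Fraction of negative predictions that are truly negative."""
    return tn / (tn + fn)


# Hypothetical counts (NOT from the paper): 10 PCNSL (positive) and 10 GBM (negative).
tp, fn = 8, 2   # PCNSL cases called PCNSL / called GBM
tn, fp = 8, 2   # GBM cases called GBM / called PCNSL

print(accuracy(tp, fp, tn, fn))
print(negative_predictive_value(tn, fn))
```

With a test set this small, each misclassified case shifts accuracy by five percentage points, so differences between readers and the network should be interpreted with that granularity in mind.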

8.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subject(s)
Artificial Intelligence
9.
Lancet Oncol ; 25(3): 400-410, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38423052

ABSTRACT

BACKGROUND: The extended acquisition times required for MRI limit its availability in resource-constrained settings. Consequently, accelerating MRI by undersampling k-space data, which is necessary to reconstruct an image, has been a long-standing but important challenge. We aimed to develop a deep convolutional neural network (dCNN) optimisation method for MRI reconstruction and to reduce scan times and evaluate its effect on image quality and accuracy of oncological imaging biomarkers. METHODS: In this multicentre, retrospective, cohort study, MRI data from patients with glioblastoma treated at Heidelberg University Hospital (775 patients and 775 examinations) and from the phase 2 CORE trial (260 patients, 1083 examinations, and 58 institutions) and the phase 3 CENTRIC trial (505 patients, 3147 examinations, and 139 institutions) were used to develop, train, and test dCNN for reconstructing MRI from highly undersampled single-coil k-space data with various acceleration rates (R=2, 4, 6, 8, 10, and 15). Independent testing was performed with MRIs from the phase 2/3 EORTC-26101 trial (528 patients with glioblastoma, 1974 examinations, and 32 institutions). The similarity between undersampled dCNN-reconstructed and original MRIs was quantified with various image quality metrics, including structural similarity index measure (SSIM) and the accuracy of undersampled dCNN-reconstructed MRI on downstream radiological assessment of imaging biomarkers in oncology (automated artificial intelligence-based quantification of tumour burden and treatment response) was performed in the EORTC-26101 test dataset. The public NYU Langone Health fastMRI brain test dataset (558 patients and 558 examinations) was used to validate the generalisability and robustness of the dCNN for reconstructing MRIs from available multi-coil (parallel imaging) k-space data. 
FINDINGS: In the EORTC-26101 test dataset, the median SSIM of undersampled dCNN-reconstructed MRI ranged from 0·88 to 0·99 across different acceleration rates, with 0·92 (95% CI 0·92-0·93) for 10-times acceleration (R=10). The 10-times undersampled dCNN-reconstructed MRI yielded excellent agreement with original MRI when assessing volumes of contrast-enhancing tumour (median DICE for spatial agreement of 0·89 [95% CI 0·88 to 0·89]; median volume difference of 0·01 cm3 [95% CI 0·00 to 0·03] equalling 0·21%; p=0·0036 for equivalence) or non-enhancing tumour or oedema (median DICE of 0·94 [95% CI 0·94 to 0·95]; median volume difference of -0·79 cm3 [95% CI -0·87 to -0·72] equalling -1·77%; p=0·023 for equivalence) in the EORTC-26101 test dataset. Automated volumetric tumour response assessment in the EORTC-26101 test dataset yielded an identical median time to progression of 4·27 months (95% CI 4·14 to 4·57) when using 10-times-undersampled dCNN-reconstructed or original MRI (log-rank p=0·80) and agreement in the time to progression in 374 (95·2%) of 393 patients with data. The dCNN generalised well to the fastMRI brain dataset, with significant improvements in the median SSIM when using multi-coil compared with single-coil k-space data (p<0·0001). INTERPRETATION: Deep-learning-based reconstruction of undersampled MRI allows for a substantial reduction of scan times, with a 10-times acceleration demonstrating excellent image quality while preserving the accuracy of derived imaging biomarkers for the assessment of oncological treatment response. Our developments are available as open source software and hold considerable promise for increasing the accessibility to MRI, pending further prospective validation. FUNDING: Deutsche Forschungsgemeinschaft (German Research Foundation) and an Else Kröner Clinician Scientist Endowed Professorship by the Else Kröner Fresenius Foundation.
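
The DICE values and relative volume differences quoted in these findings compare a tumour segmentation from the reconstructed image with one from the original image. The following is a minimal sketch of both quantities on flattened binary masks. The function names, the toy masks, and the convention of returning 1.0 for two empty masks are illustrative assumptions rather than the study's pipeline.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap of two equally long binary masks (flattened voxel lists)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_volume_difference(volume_test, volume_reference):
    """Signed volume difference as a percentage of the reference volume."""
    return 100.0 * (volume_test - volume_reference) / volume_reference


reference = [1, 1, 0, 0]      # toy segmentation on the original image
reconstructed = [1, 0, 0, 0]  # toy segmentation on the reconstructed image
print(dice_coefficient(reference, reconstructed))
print(relative_volume_difference(sum(reconstructed), sum(reference)))
```

Dice measures spatial overlap while the volume difference ignores location, which is why the abstract reports both.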


Subject(s)
Deep Learning , Glioblastoma , Humans , Artificial Intelligence , Biomarkers , Cohort Studies , Glioblastoma/diagnostic imaging , Magnetic Resonance Imaging , Retrospective Studies
10.
Nat Commun ; 15(1): 303, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38182594

ABSTRACT

Tract-specific microstructural analysis of the brain's white matter (WM) using diffusion MRI has been a driver for neuroscientific discovery with a wide range of applications. Tractometry enables localized tissue analysis along tracts but relies on bare summary statistics and reduces complex image information along a tract to few scalar values, and so may miss valuable information. This hampers the applicability of tractometry for predictive modelling. Radiomics is a promising method based on the analysis of numerous quantitative image features beyond what can be visually perceived, but has not yet been used for tract-specific analysis of white matter. Here we introduce radiomic tractometry (RadTract) and show that introducing rich radiomics-based feature sets into the world of tractometry enables improved predictive modelling while retaining the localization capability of tractometry. We demonstrate its value in a series of clinical populations, showcasing its performance in diagnosing disease subgroups in different datasets, as well as estimation of demographic and clinical parameters. We propose that RadTract could spark the establishment of a new generation of tract-specific imaging biomarkers with benefits for a range of applications from basic neuroscience to medical research.


Subject(s)
Biomedical Research , White Matter , Radiomics , White Matter/diagnostic imaging , Biomarkers , Diffusion Magnetic Resonance Imaging
11.
Radiol Imaging Cancer ; 6(1): e230033, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38180338

ABSTRACT

Purpose To describe the design, conduct, and results of the Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response (BMMR2) challenge. Materials and Methods The BMMR2 computational challenge opened on May 28, 2021, and closed on December 21, 2021. The goal of the challenge was to identify image-based markers derived from multiparametric breast MRI, including diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) MRI, along with clinical data for predicting pathologic complete response (pCR) following neoadjuvant treatment. Data included 573 breast MRI studies from 191 women (mean age [±SD], 48.9 years ± 10.56) in the I-SPY 2/American College of Radiology Imaging Network (ACRIN) 6698 trial (ClinicalTrials.gov: NCT01042379). The challenge cohort was split into training (60%) and test (40%) sets, with teams blinded to test set pCR outcomes. Prediction performance was evaluated by area under the receiver operating characteristic curve (AUC) and compared with the benchmark established from the ACRIN 6698 primary analysis. Results Eight teams submitted final predictions. Entries from three teams had point estimators of AUC that were higher than the benchmark performance (AUC, 0.782 [95% CI: 0.670, 0.893], with AUCs of 0.803 [95% CI: 0.702, 0.904], 0.838 [95% CI: 0.748, 0.928], and 0.840 [95% CI: 0.748, 0.932]). A variety of approaches were used, ranging from extraction of individual features to deep learning and artificial intelligence methods, incorporating DCE and DWI alone or in combination. Conclusion The BMMR2 challenge identified several models with high predictive performance, which may further expand the value of multiparametric breast MRI as an early marker of treatment response. Clinical trial registration no. NCT01042379 Keywords: MRI, Breast, Tumor Response Supplemental material is available for this article. © RSNA, 2024.


Subject(s)
Breast Neoplasms , Multiparametric Magnetic Resonance Imaging , Female , Humans , Middle Aged , Artificial Intelligence , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/drug therapy , Magnetic Resonance Imaging , Neoadjuvant Therapy , Pathologic Complete Response , Adult
12.
J Magn Reson Imaging ; 59(4): 1409-1422, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37504495

ABSTRACT

BACKGROUND: Weakly supervised learning promises reduced annotation effort while maintaining performance. PURPOSE: To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC). STUDY TYPE: Retrospective. SUBJECTS: One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion for PC (65 ± 8 years) between January 2015 and November 2020 were split into training (N = 794, enriched with 204 PROSTATEx examinations) and test set (N = 695). FIELD STRENGTH/SEQUENCE: 1.5 and 3T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. ASSESSMENT: Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivity of 0.97 [254/262] emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262] emulating PI-RADS ≥ 4 decisions. STATISTICAL TESTS: Receiver operating characteristic (ROC) and area under the curve (AUC) was compared using DeLong and Obuchowski test. Sensitivity and specificity were compared using McNemar test. Statistical significance threshold was P = 0.05. RESULTS: Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at same training set sizes, however the ROC-AUC difference decreased significantly from 200 to 998 training exams. 
Emulating PI-RADS ≥ 3 decisions, difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity at 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). DATA CONCLUSION: Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. EVIDENCE LEVEL: 3 TECHNICAL EFFICACY: Stage 2.
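
Comparing specificity at a fixed sensitivity, as done in this abstract to emulate PI-RADS decision thresholds, means choosing the score cut-off that calls at least the target fraction of true positives and then counting how many negatives fall below it. The sketch below shows that procedure on toy scores. The function name, the tie handling (scores at or above the threshold are called positive), and the data are illustrative assumptions, not the study's evaluation code.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) at the highest threshold
    whose sensitivity is at least target_sensitivity.

    scores: model outputs, higher meaning more suspicious.
    labels: 1 for histopathology-positive, 0 for negative.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Minimum number of positives that must be detected to reach the target.
    needed = max(1, math.ceil(target_sensitivity * len(positives)))
    threshold = positives[needed - 1]
    detected = sum(1 for s in positives if s >= threshold)
    true_negatives = sum(1 for s in negatives if s < threshold)
    return threshold, detected / len(positives), true_negatives / len(negatives)


# Toy data: four positive and four negative cases.
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.75]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(scores, labels, 0.75))
```

Fixing sensitivity first makes the specificities of the two training schemes directly comparable, since both models are then evaluated at the same miss rate.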


Subject(s)
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies , Polyesters
13.
Schizophr Res ; 263: 160-168, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37236889

ABSTRACT

The number of magnetic resonance imaging (MRI) studies on neuronal correlates of catatonia has dramatically increased in the last 10 years, but conclusive findings on white matter (WM) tracts alterations underlying catatonic symptoms are still lacking. Therefore, we conduct an interdisciplinary longitudinal MRI study (whiteCAT) with two main objectives: First, we aim to enroll 100 psychiatric patients with and 50 psychiatric patients without catatonia according to ICD-11 who will undergo a deep phenotyping approach with an extensive battery of demographic, psychopathological, psychometric, neuropsychological, instrumental and diffusion MRI assessments at baseline and 12 weeks follow-up. So far, 28 catatonia patients and 40 patients with schizophrenia or other primary psychotic disorders or mood disorders without catatonia have been studied cross-sectionally. 49 out of 68 patients have completed longitudinal assessment, so far. Second, we seek to develop and implement a new method for semi-automatic fiber tract delineation using active learning. By training supportive machine learning algorithms on the fly that are custom tailored to the respective analysis pipeline used to obtain the tractogram as well as the WM tract of interest, we plan to streamline and speed up this tedious and error-prone task while at the same time increasing reproducibility and robustness of the extraction process. The goal is to develop robust neuroimaging biomarkers of symptom severity and therapy outcome based on WM tracts underlying catatonia. If our MRI study is successful, it will be the largest longitudinal study to date that has investigated WM tracts in catatonia patients.


Subject(s)
Catatonia , White Matter , Humans , Catatonia/diagnosis , White Matter/diagnostic imaging , White Matter/pathology , Longitudinal Studies , Reproducibility of Results , Biomarkers
14.
ArXiv ; 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-36945687

ABSTRACT

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

15.
Adv Mater ; 36(7): e2307160, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37904613

ABSTRACT

Large-area processing of perovskite semiconductor thin-films is complex and evokes unexplained variance in quality, posing a major hurdle for the commercialization of perovskite photovoltaics. Advances in scalable fabrication processes are currently limited to gradual and arbitrary trial-and-error procedures. While the in situ acquisition of photoluminescence (PL) videos has the potential to reveal important variations in the thin-film formation process, the high dimensionality of the data quickly surpasses the limits of human analysis. In response, this study leverages deep learning (DL) and explainable artificial intelligence (XAI) to discover relationships between sensor information acquired during the perovskite thin-film formation process and the resulting solar cell performance indicators, while rendering these relationships humanly understandable. The study further shows how gained insights can be distilled into actionable recommendations for perovskite thin-film processing, advancing toward industrial-scale solar cell manufacturing. This study demonstrates that XAI methods will play a critical role in accelerating energy materials science.

16.
Eur Radiol ; 34(7): 4379-4392, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38150075

ABSTRACT

OBJECTIVES: To quantify regional manifestations related to COPD as anomalies from a modeled distribution of normal-appearing lung on chest CT using a deep learning (DL) approach, and to assess its potential to predict disease severity. MATERIALS AND METHODS: Paired inspiratory/expiratory CT and clinical data from COPDGene and COSYCONET cohort studies were included. COPDGene data served as training/validation/test data sets (N = 3144/786/1310) and COSYCONET as external test set (N = 446). To differentiate low-risk (healthy/minimal disease, [GOLD 0]) from COPD patients (GOLD 1-4), the self-supervised DL model learned semantic information from 50 × 50 × 50 voxel samples from segmented intact lungs. An anomaly detection approach was trained to quantify lung abnormalities related to COPD as regional deviations. Four supervised DL models were run for comparison. The clinical and radiological predictive power of the proposed anomaly score was assessed using linear mixed effects models (LMM). RESULTS: The proposed approach achieved an area under the curve of 84.3 ± 0.3 (p < 0.001) for COPDGene and 76.3 ± 0.6 (p < 0.001) for COSYCONET, outperforming supervised models even when including only inspiratory CT. Anomaly scores significantly improved fitting of LMM for predicting lung function, health status, and quantitative CT features (emphysema/air trapping; p < 0.001). Higher anomaly scores were significantly associated with exacerbations for both cohorts (p < 0.001) and greater dyspnea scores for COPDGene (p < 0.001). CONCLUSION: Quantifying heterogeneous COPD manifestations as anomalies offers advantages over supervised methods and was found to be predictive for lung function impairment and morphology deterioration. CLINICAL RELEVANCE STATEMENT: Using deep learning, lung manifestations of COPD can be identified as deviations from normal-appearing chest CT and assigned an anomaly score that is consistent with decreased pulmonary function, emphysema, and air trapping.
KEY POINTS: • A self-supervised DL anomaly detection method discriminated low-risk individuals and COPD subjects, outperforming classic DL methods on two datasets (COPDGene AUC = 84.3%, COSYCONET AUC = 76.3%). • Our contrastive task exhibits robust performance even without the inclusion of expiratory images, while voxel-based methods demonstrate significant performance enhancement when incorporating expiratory images, in the COPDGene dataset. • Anomaly scores improved the fitting of linear mixed effects models in predicting clinical parameters and imaging alterations (p < 0.001) and were directly associated with clinical outcomes (p < 0.001).


Subject(s)
Deep Learning , Pulmonary Disease, Chronic Obstructive , Severity of Illness Index , Tomography, X-Ray Computed , Humans , Pulmonary Disease, Chronic Obstructive/diagnostic imaging , Pulmonary Disease, Chronic Obstructive/physiopathology , Male , Female , Tomography, X-Ray Computed/methods , Middle Aged , Aged , Predictive Value of Tests , Lung/diagnostic imaging , Cohort Studies
17.
Sci Rep ; 13(1): 19805, 2023 Nov 13.
Article in English | MEDLINE | ID: mdl-37957250

ABSTRACT

Prostate cancer (PCa) diagnosis on multi-parametric magnetic resonance images (MRI) requires radiologists with a high level of expertise. Misalignments between the MRI sequences can be caused by patient movement, elastic soft-tissue deformations, and imaging artifacts. Such misalignments further increase the complexity of the radiologists' task of interpreting the images. Recently, computer-aided diagnosis (CAD) tools have demonstrated potential for PCa diagnosis, typically relying on complex co-registration of the input modalities. However, there is no consensus among research groups on whether CAD systems profit from using registration. Furthermore, alternative strategies to handle multi-modal misalignments have not been explored so far. Our study introduces and compares different strategies to cope with image misalignments and evaluates them with regard to their direct effect on diagnostic accuracy for PCa. In addition to established registration algorithms, we propose 'misalignment augmentation' as a concept to increase CAD robustness. As the results demonstrate, misalignment augmentations can not only compensate for a complete lack of registration but, if used in conjunction with registration, also improve the overall performance on an independent test set.


Subject(s)
Prostate , Prostatic Neoplasms , Male , Humans , Prostate/diagnostic imaging , Prostate/pathology , Magnetic Resonance Imaging/methods , Diagnosis, Computer-Assisted/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Computers
18.
Nat Commun ; 14(1): 4938, 2023 Aug 15.
Article in English | MEDLINE | ID: mdl-37582829

ABSTRACT

Swift diagnosis and treatment play a decisive role in the clinical outcome of patients with acute ischemic stroke (AIS), and computer-aided diagnosis (CAD) systems can accelerate the underlying diagnostic processes. Here, we developed an artificial neural network (ANN) which allows automated detection of abnormal vessel findings without any a priori restrictions and in <2 minutes. Pseudo-prospective external validation was performed in consecutive patients with suspected AIS from 4 different hospitals during a 6-month timeframe and demonstrated high sensitivity (≥87%) and negative predictive value (NPV; ≥93%). Benchmarking against two CE- and FDA-approved software solutions showed significantly higher performance for our ANN, with improvements of 25-45% for sensitivity and 4-11% for NPV (p ≤ 0.003 each). We provide an imaging platform (https://stroke.neuroAI-HD.org) for online processing of medical imaging data with the developed ANN, including provisions for data crowdsourcing, which will allow continuous refinements and serve as a blueprint to build robust and generalizable AI algorithms.


Subject(s)
Deep Learning , Ischemic Stroke , Stroke , Humans , Ischemic Stroke/diagnostic imaging , Prospective Studies , Computed Tomography Angiography/methods , Stroke/diagnostic imaging , Angiography , Retrospective Studies
19.
Eur Radiol ; 33(11): 7463-7476, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37507610

ABSTRACT

OBJECTIVES: To evaluate a fully automatic deep learning system to detect and segment clinically significant prostate cancer (csPCa) on same-vendor prostate MRI from two different institutions not contributing to training of the system. MATERIALS AND METHODS: In this retrospective study, a previously bi-institutionally validated deep learning system (UNETM) was applied to bi-parametric prostate MRI data from one external institution (A), a PI-RADS distribution-matched internal cohort (B), and a csPCa stratified subset of single-institution external public challenge data (C). csPCa was defined as ISUP Grade Group ≥ 2 determined from combined targeted and extended systematic MRI/transrectal US-fusion biopsy. Performance of UNETM was evaluated by comparing ROC AUC and specificity at typical PI-RADS sensitivity levels. Lesion-level analysis between UNETM segmentations and radiologist-delineated segmentations was performed using Dice coefficient, free-response operating characteristic (FROC), and weighted alternative (waFROC). The influence of using different diffusion sequences was analyzed in cohort A. RESULTS: In 250/250/140 exams in cohorts A/B/C, differences in ROC AUC were insignificant with 0.80 (95% CI: 0.74-0.85)/0.87 (95% CI: 0.83-0.92)/0.82 (95% CI: 0.75-0.89). At sensitivities of 95% and 90%, UNETM achieved specificity of 30%/50% in A, 44%/71% in B, and 43%/49% in C, respectively. Dice coefficient of UNETM and radiologist-delineated lesions was 0.36 in A and 0.49 in B. The waFROC AUC was 0.67 (95% CI: 0.60-0.83) in A and 0.7 (95% CI: 0.64-0.78) in B. UNETM performed marginally better on readout-segmented than on single-shot echo-planar-imaging. CONCLUSION: For same-vendor examinations, deep learning provided comparable discrimination of csPCa and non-csPCa lesions and examinations between local and two independent external data sets, demonstrating the applicability of the system to institutions not participating in model training. 
CLINICAL RELEVANCE STATEMENT: A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets, indicating the potential of deploying AI models without retraining or fine-tuning, and corroborating evidence that AI models extract a substantial amount of transferable domain knowledge about MRI-based prostate cancer assessment. KEY POINTS: • A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets. • Lesion detection performance and segmentation congruence were similar on the institutional and an external data set, as measured by the weighted alternative FROC AUC and Dice coefficient. • Although the system generalized to two external institutions without retraining, achieving expected sensitivity and specificity levels using the deep learning system requires probability thresholds to be adjusted, underlining the importance of institution-specific calibration and quality control.


Subject(s)
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging , Prostate/diagnostic imaging , Prostate/pathology , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies
20.
Bone ; 175: 116857, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37487861

ABSTRACT

PURPOSE: The presence of bone marrow focal lesions and osteolytic lesions in patients with multiple myeloma (MM) is of high prognostic significance for their individual outcome. It is not yet known why some focal lesions seen in MRI, reflecting localized bone marrow infiltration of myeloma cells, remain non-lytic, whereas others are associated with destruction of mineralized bone. In this study, we analyzed MRI characteristics of manually segmented focal lesions in MM patients to identify possible features that might discriminate lytic and non-lytic lesions. METHOD: The initial cohort included a total of 140 patients with different stages of MM who had undergone both whole-body MRI and whole-body low-dose CT within 30 days, of whom 29 satisfied the inclusion criteria for this study. Focal lesions in MRI and corresponding osteolytic areas in CT were segmented manually. Analysis of the lesions included volume, location and first-order texture feature analysis. RESULTS: There were significantly more lytic lesions in the axial skeleton than in the appendicular skeleton (p = 0.037). Out of 926 focal lesions in the axial skeleton seen on MRI, 544 (59.3%) were osteolytic. Analysis of volume and first-order texture features showed differences in texture and volume between focal lesions in MRI with and without local bone destruction in CT, but these findings were not statistically significant. CONCLUSIONS: Neither morphological imaging characteristics like size and location nor first-order texture features could predict whether focal lesions seen in MRI would exhibit corresponding bone destruction in CT. Studies performing biopsies of such lesions are ongoing.


Subject(s)
Multiple Myeloma , Humans , Multiple Myeloma/diagnostic imaging , Multiple Myeloma/pathology , Bone Marrow/diagnostic imaging , Bone Marrow/pathology , Tomography, X-Ray Computed , Magnetic Resonance Imaging , Prognosis