Pesquisa | Biblioteca Virtual em Saúde

1.

Deep learning-based defacing tool for CT angiography: CTA-DEFACE.

Mahmutoglu, Mustafa Ahmed; Rastogi, Aditya; Schell, Marianne; Foltyn-Dumitru, Martha; Baumgartner, Michael; Maier-Hein, Klaus Hermann; Deike-Hofmann, Katerina; Radbruch, Alexander; Bendszus, Martin; Brugnara, Gianluca; Vollmuth, Philipp.

Eur Radiol Exp ; 8(1): 111, 2024 Oct 09.

Artigo em Inglês | MEDLINE | ID: mdl-39382818

RESUMO

The growing use of artificial neural network (ANN) tools for computed tomography angiography (CTA) data analysis underscores the necessity for elevated data protection measures. We aimed to establish an automated defacing pipeline for CTA data. In this retrospective study, CTA data from multi-institutional cohorts were utilized to annotate facemasks (n = 100) and train an ANN model, subsequently tested on an external institution's dataset (n = 50) and compared to a publicly available defacing algorithm. Face detection (MTCNN) and verification (FaceNet) networks were applied to measure the similarity between the original and defaced CTA images. Dice similarity coefficient (DSC), face detection probability, and face similarity measures were calculated to evaluate model performance. The CTA-DEFACE model effectively segmented soft face tissue in CTA data achieving a DSC of 0.94 ± 0.02 (mean ± standard deviation) on the test set. Our model was benchmarked against a publicly available defacing algorithm. After applying face detection and verification networks, our model showed substantially reduced face detection probability (p < 0.001) and similarity to the original CTA image (p < 0.001). The CTA-DEFACE model enabled robust and precise defacing of CTA data. The trained network is publicly accessible at www.github.com/neuroAI-HD/CTA-DEFACE . RELEVANCE STATEMENT: The ANN model CTA-DEFACE, developed for automatic defacing of CT angiography images, achieves significantly lower face detection probabilities and greater dissimilarity from the original images compared to a publicly available model. The algorithm has been externally validated and is publicly accessible. KEY POINTS: The developed ANN model (CTA-DEFACE) automatically generates facemasks for CT angiography images. CTA-DEFACE offers superior deidentification capabilities compared to a publicly available model. By means of graphics processing unit optimization, our model ensures rapid processing of medical images. Our model underwent external validation, underscoring its reliability for real-world application.

Assuntos

Angiografia por Tomografia Computadorizada , Aprendizado Profundo , Angiografia por Tomografia Computadorizada/métodos , Humanos , Estudos Retrospectivos , Redes Neurais de Computação , Masculino , Feminino , Algoritmos

2.

The potential of federated learning for self-configuring medical object detection in heterogeneous data distributions.

Rashidi, Gabriel; Bounias, Dimitrios; Bujotzek, Markus; Mora, Andrés Martínez; Neher, Peter; Maier-Hein, Klaus H.

Sci Rep ; 14(1): 23844, 2024 10 11.

Artigo em Inglês | MEDLINE | ID: mdl-39394440

RESUMO

Medical Object Detection (MOD) is a clinically relevant image processing method that locates structures of interest in radiological image data at object-level using bounding boxes. High-performing MOD models necessitate large datasets accurately reflecting the feature distribution of the corresponding problem domain. However, strict privacy regulations protecting patient data often hinder data consolidation, negatively affecting the performance and generalization of MOD models. Federated Learning (FL) offers a solution by enabling model training while the data remain at its original source institution. While existing FL solutions for medical image classification and segmentation demonstrate promising performance, FL for MOD remains largely unexplored. Motivated by this lack of technical solutions, we present an open-source, self-configuring and task-agnostic federated MOD framework. It integrates the FL framework Flower with nnDetection, a state-of-the-art MOD framework and provides several FL aggregation strategies. Furthermore, we evaluate model performance by creating simulated Independent Identically Distributed (IID) and non-IID scenarios, utilizing the publicly available datasets. Additionally, a detailed analysis of the distributions and characteristics of these datasets offers insights into how they can impact performance. Our framework's implementation demonstrates the feasibility of federated self-configuring MOD in non-IID scenarios and facilitates the development of MOD models trained on large distributed databases.

Assuntos

Processamento de Imagem Assistida por Computador , Humanos , Processamento de Imagem Assistida por Computador/métodos , Algoritmos , Aprendizado de Máquina

3.

PSFHS challenge report: Pubic symphysis and fetal head segmentation from intrapartum ultrasound images.

Bai, Jieyun; Zhou, Zihao; Ou, Zhanhong; Koehler, Gregor; Stock, Raphael; Maier-Hein, Klaus; Elbatel, Marawan; Martí, Robert; Li, Xiaomeng; Qiu, Yaoyang; Gou, Panjie; Chen, Gongping; Zhao, Lei; Zhang, Jianxun; Dai, Yu; Wang, Fangyijie; Silvestre, Guénolé; Curran, Kathleen; Sun, Hongkun; Xu, Jing; Cai, Pengzhou; Jiang, Lu; Lan, Libin; Ni, Dong; Zhong, Mei; Chen, Gaowen; Campello, Víctor M; Lu, Yaosheng; Lekadir, Karim.

Med Image Anal ; 99: 103353, 2024 Sep 21.

Artigo em Inglês | MEDLINE | ID: mdl-39340971

RESUMO

Segmentation of the fetal and maternal structures, particularly intrapartum ultrasound imaging as advocated by the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) for monitoring labor progression, is a crucial first step for quantitative diagnosis and clinical decision-making. This requires specialized analysis by obstetrics professionals, in a task that i) is highly time- and cost-consuming and ii) often yields inconsistent results. The utility of automatic segmentation algorithms for biometry has been proven, though existing results remain suboptimal. To push forward advancements in this area, the Grand Challenge on Pubic Symphysis-Fetal Head Segmentation (PSFHS) was held alongside the 26th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2023). This challenge aimed to enhance the development of automatic segmentation algorithms at an international scale, providing the largest dataset to date with 5,101 intrapartum ultrasound images collected from two ultrasound machines across three hospitals from two institutions. The scientific community's enthusiastic participation led to the selection of the top 8 out of 179 entries from 193 registrants in the initial phase to proceed to the competition's second stage. These algorithms have elevated the state-of-the-art in automatic PSFHS from intrapartum ultrasound images. A thorough analysis of the results pinpointed ongoing challenges in the field and outlined recommendations for future work. The top solutions and the complete dataset remain publicly available, fostering further advancements in automatic segmentation and biometry for intrapartum ultrasound imaging.

4.

How do deep-learning models generalize across populations? Cross-ethnicity generalization of COPD detection.

D Almeida, Silvia; Norajitra, Tobias; Lüth, Carsten T; Wald, Tassilo; Weru, Vivienn; Nolden, Marco; Jäger, Paul F; von Stackelberg, Oyunbileg; Heußel, Claus Peter; Weinheimer, Oliver; Biederer, Jürgen; Kauczor, Hans-Ulrich; Maier-Hein, Klaus.

Insights Imaging ; 15(1): 198, 2024 Aug 07.

Artigo em Inglês | MEDLINE | ID: mdl-39112910

RESUMO

OBJECTIVES: To evaluate the performance and potential biases of deep-learning models in detecting chronic obstructive pulmonary disease (COPD) on chest CT scans across different ethnic groups, specifically non-Hispanic White (NHW) and African American (AA) populations. MATERIALS AND METHODS: Inspiratory chest CT and clinical data from 7549 Genetic epidemiology of COPD individuals (mean age 62 years old, 56-69 interquartile range), including 5240 NHW and 2309 AA individuals, were retrospectively analyzed. Several factors influencing COPD binary classification performance on different ethnic populations were examined: (1) effects of training population: NHW-only, AA-only, balanced set (half NHW, half AA) and the entire set (NHW + AA all); (2) learning strategy: three supervised learning (SL) vs. three self-supervised learning (SSL) methods. Distribution shifts across ethnicity were further assessed for the top-performing methods. RESULTS: The learning strategy significantly influenced model performance, with SSL methods achieving higher performances compared to SL methods (p < 0.001), across all training configurations. Training on balanced datasets containing NHW and AA individuals resulted in improved model performance compared to population-specific datasets. Distribution shifts were found between ethnicities for the same health status, particularly when models were trained on nearest-neighbor contrastive SSL. Training on a balanced dataset resulted in fewer distribution shifts across ethnicity and health status, highlighting its efficacy in reducing biases. CONCLUSION: Our findings demonstrate that utilizing SSL methods and training on large and balanced datasets can enhance COPD detection model performance and reduce biases across diverse ethnic populations. These findings emphasize the importance of equitable AI-driven healthcare solutions for COPD diagnosis. CRITICAL RELEVANCE STATEMENT: Self-supervised learning coupled with balanced datasets significantly improves COPD detection model performance, addressing biases across diverse ethnic populations and emphasizing the crucial role of equitable AI-driven healthcare solutions. KEY POINTS: Self-supervised learning methods outperform supervised learning methods, showing higher AUC values (p < 0.001). Balanced datasets with non-Hispanic White and African American individuals improve model performance. Training on diverse datasets enhances COPD detection accuracy. Ethnically diverse datasets reduce bias in COPD detection models. SimCLR models mitigate biases in COPD detection across ethnicities.

5.

Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI): an international, paired, non-inferiority, confirmatory study.

Saha, Anindo; Bosma, Joeran S; Twilt, Jasper J; van Ginneken, Bram; Bjartell, Anders; Padhani, Anwar R; Bonekamp, David; Villeirs, Geert; Salomon, Georg; Giannarini, Gianluca; Kalpathy-Cramer, Jayashree; Barentsz, Jelle; Maier-Hein, Klaus H; Rusu, Mirabela; Rouvière, Olivier; van den Bergh, Roderick; Panebianco, Valeria; Kasivisvanathan, Veeru; Obuchowski, Nancy A; Yakar, Derya; Elschot, Mattijs; Veltman, Jeroen; Fütterer, Jurgen J; de Rooij, Maarten; Huisman, Henkjan.

Lancet Oncol ; 25(7): 879-887, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-38876123

RESUMO

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.

Assuntos

Inteligência Artificial , Imageamento por Ressonância Magnética , Neoplasias da Próstata , Radiologistas , Humanos , Masculino , Neoplasias da Próstata/diagnóstico por imagem , Neoplasias da Próstata/patologia , Idoso , Estudos Retrospectivos , Pessoa de Meia-Idade , Gradação de Tumores , Países Baixos , Curva ROC

6.

MRI-Derived Dural Sac and Lumbar Vertebrae 3D Volumetry Has Potential for Detection of Marfan Syndrome.

Naas, Omar; Norajitra, Tobias; Lückerath, Christian; Fink, Matthias A; Maier-Hein, Klaus; Kauczor, Hans-Ulrich; Rengier, Fabian.

Diagnostics (Basel) ; 14(12)2024 Jun 19.

Artigo em Inglês | MEDLINE | ID: mdl-38928716

RESUMO

PURPOSE: To assess the feasibility and diagnostic accuracy of MRI-derived 3D volumetry of lower lumbar vertebrae and dural sac segments using shape-based machine learning for the detection of Marfan syndrome (MFS) compared with dural sac diameter ratios (the current clinical standard). MATERIALS AND METHODS: The final study sample was 144 patients being evaluated for MFS from 01/2012 to 12/2016, of whom 81 were non-MFS patients (46 [67%] female, 36 ± 16 years) and 63 were MFS patients (36 [57%] female, 35 ± 11 years) according to the 2010 Revised Ghent Nosology. All patients underwent 1.5T MRI with isotropic 1 × 1 × 1 mm3 3D T2-weighted acquisition of the lumbosacral spine. Segmentation and quantification of vertebral bodies L3-L5 and dural sac segments L3-S1 were performed using a shape-based machine learning algorithm. For comparison with the current clinical standard, anteroposterior diameters of vertebral bodies and dural sac were measured. Ratios between dural sac volume/diameter at the respective level and vertebral body volume/diameter were calculated. RESULTS: Three-dimensional volumetry revealed larger dural sac volumes (p < 0.001) and volume ratios (p < 0.001) at L3-S1 levels in MFS patients compared with non-MFS patients. For the detection of MFS, 3D volumetry achieved higher AUCs at L3-S1 levels (0.743, 0.752, 0.808, and 0.824) compared with dural sac diameter ratios (0.673, 0.707, 0.791, and 0.848); a significant difference was observed only for L3 (p < 0.001). CONCLUSION: MRI-derived 3D volumetry of the lumbosacral dural sac and vertebral bodies is a feasible method for quantifying dural ectasia using shape-based machine learning. Non-inferior diagnostic accuracy was observed compared with dural sac diameter ratio (the current clinical standard for MFS detection).

7.

Radiomics workflow definition & challenges - German priority program 2177 consensus statement on clinically applied radiomics.

Floca, Ralf; Bohn, Jonas; Haux, Christian; Wiestler, Benedikt; Zöllner, Frank G; Reinke, Annika; Weiß, Jakob; Nolden, Marco; Albert, Steffen; Persigehl, Thorsten; Norajitra, Tobias; Baeßler, Bettina; Dewey, Marc; Braren, Rickmer; Büchert, Martin; Fallenberg, Eva Maria; Galldiks, Norbert; Gerken, Annika; Götz, Michael; Hahn, Horst K; Haubold, Johannes; Haueise, Tobias; Große Hokamp, Nils; Ingrisch, Michael; Iuga, Andra-Iza; Janoschke, Marco; Jung, Matthias; Kiefer, Lena Sophie; Lohmann, Philipp; Machann, Jürgen; Moltz, Jan Hendrik; Nattenmüller, Johanna; Nonnenmacher, Tobias; Oerther, Benedict; Othman, Ahmed E; Peisen, Felix; Schick, Fritz; Umutlu, Lale; Wichtmann, Barbara D; Zhao, Wenzhao; Caspers, Svenja; Schlemmer, Heinz-Peter; Schlett, Christopher L; Maier-Hein, Klaus; Bamberg, Fabian.

Insights Imaging ; 15(1): 124, 2024 Jun 03.

Artigo em Inglês | MEDLINE | ID: mdl-38825600

RESUMO

OBJECTIVES: Achieving a consensus on a definition for different aspects of radiomics workflows to support their translation into clinical usage. Furthermore, to assess the perspective of experts on important challenges for a successful clinical workflow implementation. MATERIALS AND METHODS: The consensus was achieved by a multi-stage process. Stage 1 comprised a definition screening, a retrospective analysis with semantic mapping of terms found in 22 workflow definitions, and the compilation of an initial baseline definition. Stages 2 and 3 consisted of a Delphi process with over 45 experts hailing from sites participating in the German Research Foundation (DFG) Priority Program 2177. Stage 2 aimed to achieve a broad consensus for a definition proposal, while stage 3 identified the importance of translational challenges. RESULTS: Workflow definitions from 22 publications (published 2012-2020) were analyzed. Sixty-nine definition terms were extracted, mapped, and semantic ambiguities (e.g., homonymous and synonymous terms) were identified and resolved. The consensus definition was developed via a Delphi process. The final definition comprising seven phases and 37 aspects reached a high overall consensus (> 89% of experts "agree" or "strongly agree"). Two aspects reached no strong consensus. In addition, the Delphi process identified and characterized from the participating experts' perspective the ten most important challenges in radiomics workflows. CONCLUSION: To overcome semantic inconsistencies between existing definitions and offer a well-defined, broad, referenceable terminology, a consensus workflow definition for radiomics-based setups and a terms mapping to existing literature was compiled. Moreover, the most relevant challenges towards clinical application were characterized. CRITICAL RELEVANCE STATEMENT: Lack of standardization represents one major obstacle to successful clinical translation of radiomics. Here, we report a consensus workflow definition on different aspects of radiomics studies and highlight important challenges to advance the clinical adoption of radiomics. KEY POINTS: Published radiomics workflow terminologies are inconsistent, hindering standardization and translation. A consensus radiomics workflow definition proposal with high agreement was developed. Publicly available result resources for further exploitation by the scientific community.

8.

Fair evaluation of federated learning algorithms for automated breast density classification: The results of the 2022 ACR-NCI-NVIDIA federated learning challenge.

Schmidt, Kendall; Bearce, Benjamin; Chang, Ken; Coombs, Laura; Farahani, Keyvan; Elbatel, Marawan; Mouheb, Kaouther; Marti, Robert; Zhang, Ruipeng; Zhang, Yao; Wang, Yanfeng; Hu, Yaojun; Ying, Haochao; Xu, Yuyang; Testagrose, Conrad; Demirer, Mutlu; Gupta, Vikash; Akünal, Ünal; Bujotzek, Markus; Maier-Hein, Klaus H; Qin, Yi; Li, Xiaomeng; Kalpathy-Cramer, Jayashree; Roth, Holger R.

Med Image Anal ; 95: 103206, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-38776844

RESUMO

The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density, however, due to the differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to other systems. Though federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical Schools' Mass General Brigham, University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants were able to submit docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.

Assuntos

Algoritmos , Densidade da Mama , Neoplasias da Mama , Mamografia , Humanos , Feminino , Mamografia/métodos , Neoplasias da Mama/diagnóstico por imagem , Aprendizado de Máquina

9.

Reproducible Radiomics Features from Multi-MRI-Scanner Test-Retest-Study: Influence on Performance and Generalizability of Models.

Wennmann, Markus; Rotkopf, Lukas T; Bauer, Fabian; Hielscher, Thomas; Kächele, Jessica; Mai, Elias K; Weinhold, Niels; Raab, Marc-Steffen; Goldschmidt, Hartmut; Weber, Tim F; Schlemmer, Heinz-Peter; Delorme, Stefan; Maier-Hein, Klaus; Neher, Peter.

J Magn Reson Imaging ; 2024 May 11.

Artigo em Inglês | MEDLINE | ID: mdl-38733369

RESUMO

BACKGROUND: Radiomics models trained on data from one center typically show a decline of performance when applied to data from external centers, hindering their introduction into large-scale clinical practice. Current expert recommendations suggest to use only reproducible radiomics features isolated by multiscanner test-retest experiments, which might help to overcome the problem of limited generalizability to external data. PURPOSE: To evaluate the influence of using only a subset of robust radiomics features, defined in a prior in vivo multi-MRI-scanner test-retest-study, on the performance and generalizability of radiomics models. STUDY TYPE: Retrospective. POPULATION: Patients with monoclonal plasma cell disorders. Training set (117 MRIs from center 1); internal test set (42 MRIs from center 1); external test set (143 MRIs from center 2-8). FIELD STRENGTH/SEQUENCE: 1.5T and 3.0T; T1-weighted turbo spin echo. ASSESSMENT: The task for the radiomics models was to predict plasma cell infiltration, determined by bone marrow biopsy, noninvasively from MRI. Radiomics machine learning models, including linear regressor, support vector regressor (SVR), and random forest regressor (RFR), were trained on data from center 1, using either all radiomics features, or using only reproducible radiomics features. Models were tested on an internal (center 1) and a multicentric external data set (center 2-8). STATISTICAL TESTS: Pearson correlation coefficient r and mean absolute error (MAE) between predicted and actual plasma cell infiltration. Fisher's z-transformation, Wilcoxon signed-rank test, Wilcoxon rank-sum test; significance level P < 0.05. RESULTS: When using only reproducible features compared with all features, the performance of the SVR on the external test set significantly improved (r = 0.43 vs. r = 0.18 and MAE = 22.6 vs. MAE = 28.2). For the RFR, the performance on the external test set deteriorated when using only reproducible instead of all radiomics features (r = 0.33 vs. r = 0.44, P = 0.29 and MAE = 21.9 vs. MAE = 20.5, P = 0.10). CONCLUSION: Using only reproducible radiomics features improves the external performance of some, but not all machine learning models, and did not automatically lead to an improvement of the external performance of the overall best radiomics model. TECHNICAL EFFICACY: Stage 2.

10.

Capturing COPD heterogeneity: anomaly detection and parametric response mapping comparison for phenotyping on chest computed tomography.

Almeida, Silvia D; Norajitra, Tobias; Lüth, Carsten T; Wald, Tassilo; Weru, Vivienn; Nolden, Marco; Jäger, Paul F; von Stackelberg, Oyunbileg; Heußel, Claus Peter; Weinheimer, Oliver; Biederer, Jürgen; Kauczor, Hans-Ulrich; Maier-Hein, Klaus.

Front Med (Lausanne) ; 11: 1360706, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-38495118

RESUMO

Background: Chronic obstructive pulmonary disease (COPD) poses a substantial global health burden, demanding advanced diagnostic tools for early detection and accurate phenotyping. In this line, this study seeks to enhance COPD characterization on chest computed tomography (CT) by comparing the spatial and quantitative relationships between traditional parametric response mapping (PRM) and a novel self-supervised anomaly detection approach, and to unveil potential additional insights into the dynamic transitional stages of COPD. Methods: Non-contrast inspiratory and expiratory CT of 1,310 never-smoker and GOLD 0 individuals and COPD patients (GOLD 1-4) from the COPDGene dataset were retrospectively evaluated. A novel self-supervised anomaly detection approach was applied to quantify lung abnormalities associated with COPD, as regional deviations. These regional anomaly scores were qualitatively and quantitatively compared, per GOLD class, to PRM volumes (emphysema: PRMEmph, functional small-airway disease: PRMfSAD) and to a Principal Component Analysis (PCA) and Clustering, applied on the self-supervised latent space. Its relationships to pulmonary function tests (PFTs) were also evaluated. Results: Initial t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization of the self-supervised latent space highlighted distinct spatial patterns, revealing clear separations between regions with and without emphysema and air trapping. Four stable clusters were identified among this latent space by the PCA and Cluster Analysis. As the GOLD stage increased, PRMEmph, PRMfSAD, anomaly score, and Cluster 3 volumes exhibited escalating trends, contrasting with a decline in Cluster 2. The patient-wise anomaly scores significantly differed across GOLD stages (p < 0.01), except for never-smokers and GOLD 0 patients. In contrast, PRMEmph, PRMfSAD, and cluster classes showed fewer significant differences. Pearson correlation coefficients revealed moderate anomaly score correlations to PFTs (0.41-0.68), except for the functional residual capacity and smoking duration. The anomaly score was correlated with PRMEmph (r = 0.66, p < 0.01) and PRMfSAD (r = 0.61, p < 0.01). Anomaly scores significantly improved fitting of PRM-adjusted multivariate models for predicting clinical parameters (p < 0.001). Bland-Altman plots revealed that volume agreement between PRM-derived volumes and clusters was not constant across the range of measurements. Conclusion: Our study highlights the synergistic utility of the anomaly detection approach and traditional PRM in capturing the nuanced heterogeneity of COPD. The observed disparities in spatial patterns, cluster dynamics, and correlations with PFTs underscore the distinct - yet complementary - strengths of these methods. Integrating anomaly detection and PRM offers a promising avenue for understanding of COPD pathophysiology, potentially informing more tailored diagnostic and intervention approaches to improve patient outcomes.

11.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus; Martel, Anne L; Meijering, Erik; Menze, Bjoern; Moons, Karel G M; Müller, Henning.

Nat Methods ; 21(2): 182-194, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38347140

RESUMO

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.

Assuntos

Inteligência Artificial

12.

Deep learning aided preoperative diagnosis of primary central nervous system lymphoma.

Naser, Paul Vincent; Maurer, Miriam Cindy; Fischer, Maximilian; Karimian-Jazi, Kianush; Ben-Salah, Chiraz; Bajwa, Awais Akbar; Jakobs, Martin; Jungk, Christine; Jesser, Jessica; Bendszus, Martin; Maier-Hein, Klaus; Krieg, Sandro M; Neher, Peter; Neumann, Jan-Oliver.

iScience ; 27(2): 109023, 2024 Feb 16.

Artigo em Inglês | MEDLINE | ID: mdl-38352223

RESUMO

The preoperative distinction between glioblastoma (GBM) and primary central nervous system lymphoma (PCNSL) can be difficult, even for experts, but is highly relevant. We aimed to develop an easy-to-use algorithm, based on a convolutional neural network (CNN) to preoperatively discern PCNSL from GBM and systematically compare its performance to experienced neurosurgeons and radiologists. To this end, a CNN-based on DenseNet169 was trained with the magnetic resonance (MR)-imaging data of 68 PCNSL and 69 GBM patients and its performance compared to six trained experts on an external test set of 10 PCNSL and 10 GBM. Our neural network predicted PCNSL with an accuracy of 80% and a negative predictive value (NPV) of 0.8, exceeding the accuracy achieved by clinicians (73%, NPV 0.77). Combining expert rating with automated diagnosis in those cases where experts dissented yielded an accuracy of 95%. Our approach has the potential to significantly augment the preoperative radiological diagnosis of PCNSL.

13.

Deep-learning-based reconstruction of undersampled MRI to reduce scan times: a multicentre, retrospective, cohort study.

Rastogi, Aditya; Brugnara, Gianluca; Foltyn-Dumitru, Martha; Mahmutoglu, Mustafa Ahmed; Preetha, Chandrakanth J; Kobler, Erich; Pflüger, Irada; Schell, Marianne; Deike-Hofmann, Katerina; Kessler, Tobias; van den Bent, Martin J; Idbaih, Ahmed; Platten, Michael; Brandes, Alba A; Nabors, Burt; Stupp, Roger; Bernhardt, Denise; Debus, Jürgen; Abdollahi, Amir; Gorlia, Thierry; Tonn, Jörg-Christian; Weller, Michael; Maier-Hein, Klaus H; Radbruch, Alexander; Wick, Wolfgang; Bendszus, Martin; Meredig, Hagen; Kurz, Felix T; Vollmuth, Philipp.

Lancet Oncol ; 25(3): 400-410, 2024 Mar.

Artigo em Inglês | MEDLINE | ID: mdl-38423052

RESUMO

BACKGROUND: The extended acquisition times required for MRI limit its availability in resource-constrained settings. Consequently, accelerating MRI by undersampling k-space data, which is necessary to reconstruct an image, has been a long-standing but important challenge. We aimed to develop a deep convolutional neural network (dCNN) optimisation method for MRI reconstruction and to reduce scan times and evaluate its effect on image quality and accuracy of oncological imaging biomarkers. METHODS: In this multicentre, retrospective, cohort study, MRI data from patients with glioblastoma treated at Heidelberg University Hospital (775 patients and 775 examinations) and from the phase 2 CORE trial (260 patients, 1083 examinations, and 58 institutions) and the phase 3 CENTRIC trial (505 patients, 3147 examinations, and 139 institutions) were used to develop, train, and test dCNN for reconstructing MRI from highly undersampled single-coil k-space data with various acceleration rates (R=2, 4, 6, 8, 10, and 15). Independent testing was performed with MRIs from the phase 2/3 EORTC-26101 trial (528 patients with glioblastoma, 1974 examinations, and 32 institutions). The similarity between undersampled dCNN-reconstructed and original MRIs was quantified with various image quality metrics, including structural similarity index measure (SSIM) and the accuracy of undersampled dCNN-reconstructed MRI on downstream radiological assessment of imaging biomarkers in oncology (automated artificial intelligence-based quantification of tumour burden and treatment response) was performed in the EORTC-26101 test dataset. The public NYU Langone Health fastMRI brain test dataset (558 patients and 558 examinations) was used to validate the generalisability and robustness of the dCNN for reconstructing MRIs from available multi-coil (parallel imaging) k-space data. FINDINGS: In the EORTC-26101 test dataset, the median SSIM of undersampled dCNN-reconstructed MRI ranged from 0·88 to 0·99 across different acceleration rates, with 0·92 (95% CI 0·92-0·93) for 10-times acceleration (R=10). The 10-times undersampled dCNN-reconstructed MRI yielded excellent agreement with original MRI when assessing volumes of contrast-enhancing tumour (median DICE for spatial agreement of 0·89 [95% CI 0·88 to 0·89]; median volume difference of 0·01 cm3 [95% CI 0·00 to 0·03] equalling 0·21%; p=0·0036 for equivalence) or non-enhancing tumour or oedema (median DICE of 0·94 [95% CI 0·94 to 0·95]; median volume difference of -0·79 cm3 [95% CI -0·87 to -0·72] equalling -1·77%; p=0·023 for equivalence) in the EORTC-26101 test dataset. Automated volumetric tumour response assessment in the EORTC-26101 test dataset yielded an identical median time to progression of 4·27 months (95% CI 4·14 to 4·57) when using 10-times-undersampled dCNN-reconstructed or original MRI (log-rank p=0·80) and agreement in the time to progression in 374 (95·2%) of 393 patients with data. The dCNN generalised well to the fastMRI brain dataset, with significant improvements in the median SSIM when using multi-coil compared with single-coil k-space data (p<0·0001). INTERPRETATION: Deep-learning-based reconstruction of undersampled MRI allows for a substantial reduction of scan times, with a 10-times acceleration demonstrating excellent image quality while preserving the accuracy of derived imaging biomarkers for the assessment of oncological treatment response. Our developments are available as open source software and hold considerable promise for increasing the accessibility to MRI, pending further prospective validation. FUNDING: Deutsche Forschungsgemeinschaft (German Research Foundation) and an Else Kröner Clinician Scientist Endowed Professorship by the Else Kröner Fresenius Foundation.

Assuntos

Aprendizado Profundo , Glioblastoma , Humanos , Inteligência Artificial , Biomarcadores , Estudos de Coortes , Glioblastoma/diagnóstico por imagem , Imageamento por Ressonância Magnética , Estudos Retrospectivos

14.

Breast Multiparametric MRI for Prediction of Neoadjuvant Chemotherapy Response in Breast Cancer: The BMMR2 Challenge.

Li, Wen; Partridge, Savannah C; Newitt, David C; Steingrimsson, Jon; Marques, Helga S; Bolan, Patrick J; Hirano, Michael; Bearce, Benjamin Aaron; Kalpathy-Cramer, Jayashree; Boss, Michael A; Teng, Xinzhi; Zhang, Jiang; Cai, Jing; Kontos, Despina; Cohen, Eric A; Mankowski, Walter C; Liu, Michael; Ha, Richard; Pellicer-Valero, Oscar J; Maier-Hein, Klaus; Rabinovici-Cohen, Simona; Tlusty, Tal; Ozery-Flato, Michal; Parekh, Vishwa S; Jacobs, Michael A; Yan, Ran; Sung, Kyunghyun; Kazerouni, Anum S; DiCarlo, Julie C; Yankeelov, Thomas E; Chenevert, Thomas L; Hylton, Nola M.

Radiol Imaging Cancer ; 6(1): e230033, 2024 01.

Artigo em Inglês | MEDLINE | ID: mdl-38180338

RESUMO

Purpose To describe the design, conduct, and results of the Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response (BMMR2) challenge. Materials and Methods The BMMR2 computational challenge opened on May 28, 2021, and closed on December 21, 2021. The goal of the challenge was to identify image-based markers derived from multiparametric breast MRI, including diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) MRI, along with clinical data for predicting pathologic complete response (pCR) following neoadjuvant treatment. Data included 573 breast MRI studies from 191 women (mean age [±SD], 48.9 years ± 10.56) in the I-SPY 2/American College of Radiology Imaging Network (ACRIN) 6698 trial (ClinicalTrials.gov: NCT01042379). The challenge cohort was split into training (60%) and test (40%) sets, with teams blinded to test set pCR outcomes. Prediction performance was evaluated by area under the receiver operating characteristic curve (AUC) and compared with the benchmark established from the ACRIN 6698 primary analysis. Results Eight teams submitted final predictions. Entries from three teams had point estimators of AUC that were higher than the benchmark performance (AUC, 0.782 [95% CI: 0.670, 0.893], with AUCs of 0.803 [95% CI: 0.702, 0.904], 0.838 [95% CI: 0.748, 0.928], and 0.840 [95% CI: 0.748, 0.932]). A variety of approaches were used, ranging from extraction of individual features to deep learning and artificial intelligence methods, incorporating DCE and DWI alone or in combination. Conclusion The BMMR2 challenge identified several models with high predictive performance, which may further expand the value of multiparametric breast MRI as an early marker of treatment response. Clinical trial registration no. NCT01042379 Keywords: MRI, Breast, Tumor Response Supplemental material is available for this article. © RSNA, 2024.

Assuntos

Neoplasias da Mama , Imageamento por Ressonância Magnética Multiparamétrica , Feminino , Humanos , Pessoa de Meia-Idade , Inteligência Artificial , Neoplasias da Mama/diagnóstico por imagem , Neoplasias da Mama/tratamento farmacológico , Imageamento por Ressonância Magnética , Terapia Neoadjuvante , Resposta Patológica Completa , Adulto

15.

Radiomic tractometry reveals tract-specific imaging biomarkers in white matter.

Neher, Peter; Hirjak, Dusan; Maier-Hein, Klaus.

Nat Commun ; 15(1): 303, 2024 Jan 05.

Artigo em Inglês | MEDLINE | ID: mdl-38182594

RESUMO

Tract-specific microstructural analysis of the brain's white matter (WM) using diffusion MRI has been a driver for neuroscientific discovery with a wide range of applications. Tractometry enables localized tissue analysis along tracts but relies on bare summary statistics and reduces complex image information along a tract to few scalar values, and so may miss valuable information. This hampers the applicability of tractometry for predictive modelling. Radiomics is a promising method based on the analysis of numerous quantitative image features beyond what can be visually perceived, but has not yet been used for tract-specific analysis of white matter. Here we introduce radiomic tractometry (RadTract) and show that introducing rich radiomics-based feature sets into the world of tractometry enables improved predictive modelling while retaining the localization capability of tractometry. We demonstrate its value in a series of clinical populations, showcasing its performance in diagnosing disease subgroups in different datasets, as well as estimation of demographic and clinical parameters. We propose that RadTract could spark the establishment of a new generation of tract-specific imaging biomarkers with benefits for a range of applications from basic neuroscience to medical research.

Assuntos

Pesquisa Biomédica , Substância Branca , Radiômica , Substância Branca/diagnóstico por imagem , Biomarcadores , Imagem de Difusão por Ressonância Magnética

16.

Discovering Process Dynamics for Scalable Perovskite Solar Cell Manufacturing with Explainable AI.

Klein, Lukas; Ziegler, Sebastian; Laufer, Felix; Debus, Charlotte; Götz, Markus; Maier-Hein, Klaus; Paetzold, Ulrich W; Isensee, Fabian; Jäger, Paul F.

Adv Mater ; 36(7): e2307160, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-37904613

RESUMO

Large-area processing of perovskite semiconductor thin-films is complex and evokes unexplained variance in quality, posing a major hurdle for the commercialization of perovskite photovoltaics. Advances in scalable fabrication processes are currently limited to gradual and arbitrary trial-and-error procedures. While the in situ acquisition of photoluminescence (PL) videos has the potential to reveal important variations in the thin-film formation process, the high dimensionality of the data quickly surpasses the limits of human analysis. In response, this study leverages deep learning (DL) and explainable artificial intelligence (XAI) to discover relationships between sensor information acquired during the perovskite thin-film formation process and the resulting solar cell performance indicators, while rendering these relationships humanly understandable. The study further shows how gained insights can be distilled into actionable recommendations for perovskite thin-film processing, advancing toward industrial-scale solar cell manufacturing. This study demonstrates that XAI methods will play a critical role in accelerating energy materials science.

17.

Weakly Supervised MRI Slice-Level Deep Learning Classification of Prostate Cancer Approximates Full Voxel- and Slice-Level Annotation: Effect of Increasing Training Set Size.

Weißer, Cedric; Netzer, Nils; Görtz, Magdalena; Schütz, Viktoria; Hielscher, Thomas; Schwab, Constantin; Hohenfellner, Markus; Schlemmer, Heinz-Peter; Maier-Hein, Klaus H; Bonekamp, David.

J Magn Reson Imaging ; 59(4): 1409-1422, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-37504495

RESUMO

BACKGROUND: Weakly supervised learning promises reduced annotation effort while maintaining performance. PURPOSE: To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC). STUDY TYPE: Retrospective. SUBJECTS: One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion for PC (65 ± 8 years) between January 2015 and November 2020 were split into training (N = 794, enriched with 204 PROSTATEx examinations) and test set (N = 695). FIELD STRENGTH/SEQUENCE: 1.5 and 3T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. ASSESSMENT: Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivity of 0.97 [254/262] emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262] emulating PI-RADS ≥ 4 decisions. STATISTICAL TESTS: Receiver operating characteristic (ROC) and area under the curve (AUC) was compared using DeLong and Obuchowski test. Sensitivity and specificity were compared using McNemar test. Statistical significance threshold was P = 0.05. RESULTS: Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at same training set sizes, however the ROC-AUC difference decreased significantly from 200 to 998 training exams. Emulating PI-RADS ≥ 3 decisions, difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity at 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). DATA CONCLUSION: Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. EVIDENCE LEVEL: 3 TECHNICAL EFFICACY: Stage 2.

Assuntos

Aprendizado Profundo , Neoplasias da Próstata , Masculino , Humanos , Imageamento por Ressonância Magnética/métodos , Neoplasias da Próstata/diagnóstico por imagem , Neoplasias da Próstata/patologia , Estudos Retrospectivos , Poliésteres

18.

Microstructural white matter biomarkers of symptom severity and therapy outcome in catatonia: Rationale, study design and preliminary clinical data of the whiteCAT study.

Hirjak, Dusan; Brandt, Geva A; Peretzke, Robin; Fritze, Stefan; Meyer-Lindenberg, Andreas; Maier-Hein, Klaus H; Neher, Peter F.

Schizophr Res ; 263: 160-168, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-37236889

RESUMO

The number of magnetic resonance imaging (MRI) studies on neuronal correlates of catatonia has dramatically increased in the last 10 years, but conclusive findings on white matter (WM) tracts alterations underlying catatonic symptoms are still lacking. Therefore, we conduct an interdisciplinary longitudinal MRI study (whiteCAT) with two main objectives: First, we aim to enroll 100 psychiatric patients with and 50 psychiatric patients without catatonia according to ICD-11 who will undergo a deep phenotyping approach with an extensive battery of demographic, psychopathological, psychometric, neuropsychological, instrumental and diffusion MRI assessments at baseline and 12 weeks follow-up. So far, 28 catatonia patients and 40 patients with schizophrenia or other primary psychotic disorders or mood disorders without catatonia have been studied cross-sectionally. 49 out of 68 patients have completed longitudinal assessment, so far. Second, we seek to develop and implement a new method for semi-automatic fiber tract delineation using active learning. By training supportive machine learning algorithms on the fly that are custom tailored to the respective analysis pipeline used to obtain the tractogram as well as the WM tract of interest, we plan to streamline and speed up this tedious and error-prone task while at the same time increasing reproducibility and robustness of the extraction process. The goal is to develop robust neuroimaging biomarkers of symptom severity and therapy outcome based on WM tracts underlying catatonia. If our MRI study is successful, it will be the largest longitudinal study to date that has investigated WM tracts in catatonia patients.

Assuntos

Catatonia , Substância Branca , Humanos , Catatonia/diagnóstico , Substância Branca/diagnóstico por imagem , Substância Branca/patologia , Estudos Longitudinais , Reprodutibilidade dos Testes , Biomarcadores

19.

Understanding metric-related pitfalls in image analysis validation.

Reinke, Annika; Tizabi, Minu D; Baumgartner, Michael; Eisenmann, Matthias; Heckmann-Nötzel, Doreen; Kavur, A Emre; Rädsch, Tim; Sudre, Carole H; Acion, Laura; Antonelli, Michela; Arbel, Tal; Bakas, Spyridon; Benis, Arriel; Blaschko, Matthew; Buettner, Florian; Cardoso, M Jorge; Cheplygina, Veronika; Chen, Jianxu; Christodoulou, Evangelia; Cimini, Beth A; Collins, Gary S; Farahani, Keyvan; Ferrer, Luciana; Galdran, Adrian; van Ginneken, Bram; Glocker, Ben; Godau, Patrick; Haase, Robert; Hashimoto, Daniel A; Hoffman, Michael M; Huisman, Merel; Isensee, Fabian; Jannin, Pierre; Kahn, Charles E; Kainmueller, Dagmar; Kainz, Bernhard; Karargyris, Alexandros; Karthikesalingam, Alan; Kenngott, Hannes; Kleesiek, Jens; Kofler, Florian; Kooi, Thijs; Kopp-Schneider, Annette; Kozubek, Michal; Kreshuk, Anna; Kurc, Tahsin; Landman, Bennett A; Litjens, Geert; Madani, Amin; Maier-Hein, Klaus.

ArXiv ; 2024 Feb 23.

Artigo em Inglês | MEDLINE | ID: mdl-36945687

RESUMO

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.

20.

Prediction of disease severity in COPD: a deep learning approach for anomaly-based quantitative assessment of chest CT.

Almeida, Silvia D; Norajitra, Tobias; Lüth, Carsten T; Wald, Tassilo; Weru, Vivienn; Nolden, Marco; Jäger, Paul F; von Stackelberg, Oyunbileg; Heußel, Claus Peter; Weinheimer, Oliver; Biederer, Jürgen; Kauczor, Hans-Ulrich; Maier-Hein, Klaus.

Eur Radiol ; 34(7): 4379-4392, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-38150075

RESUMO

OBJECTIVES: To quantify regional manifestations related to COPD as anomalies from a modeled distribution of normal-appearing lung on chest CT using a deep learning (DL) approach, and to assess its potential to predict disease severity. MATERIALS AND METHODS: Paired inspiratory/expiratory CT and clinical data from COPDGene and COSYCONET cohort studies were included. COPDGene data served as training/validation/test data sets (N = 3144/786/1310) and COSYCONET as external test set (N = 446). To differentiate low-risk (healthy/minimal disease, [GOLD 0]) from COPD patients (GOLD 1-4), the self-supervised DL model learned semantic information from 50 × 50 × 50 voxel samples from segmented intact lungs. An anomaly detection approach was trained to quantify lung abnormalities related to COPD, as regional deviations. Four supervised DL models were run for comparison. The clinical and radiological predictive power of the proposed anomaly score was assessed using linear mixed effects models (LMM). RESULTS: The proposed approach achieved an area under the curve of 84.3 ± 0.3 (p < 0.001) for COPDGene and 76.3 ± 0.6 (p < 0.001) for COSYCONET, outperforming supervised models even when including only inspiratory CT. Anomaly scores significantly improved fitting of LMM for predicting lung function, health status, and quantitative CT features (emphysema/air trapping; p < 0.001). Higher anomaly scores were significantly associated with exacerbations for both cohorts (p < 0.001) and greater dyspnea scores for COPDGene (p < 0.001). CONCLUSION: Quantifying heterogeneous COPD manifestations as anomaly offers advantages over supervised methods and was found to be predictive for lung function impairment and morphology deterioration. CLINICAL RELEVANCE STATEMENT: Using deep learning, lung manifestations of COPD can be identified as deviations from normal-appearing chest CT and attributed an anomaly score which is consistent with decreased pulmonary function, emphysema, and air trapping. KEY POINTS: â¢ A self-supervised DL anomaly detection method discriminated low-risk individuals and COPD subjects, outperforming classic DL methods on two datasets (COPDGene AUC = 84.3%, COSYCONET AUC = 76.3%). â¢ Our contrastive task exhibits robust performance even without the inclusion of expiratory images, while voxel-based methods demonstrate significant performance enhancement when incorporating expiratory images, in the COPDGene dataset. â¢ Anomaly scores improved the fitting of linear mixed effects models in predicting clinical parameters and imaging alterations (p < 0.001) and were directly associated with clinical outcomes (p < 0.001).

Assuntos

Aprendizado Profundo , Doença Pulmonar Obstrutiva Crônica , Índice de Gravidade de Doença , Tomografia Computadorizada por Raios X , Humanos , Doença Pulmonar Obstrutiva Crônica/diagnóstico por imagem , Doença Pulmonar Obstrutiva Crônica/fisiopatologia , Masculino , Feminino , Tomografia Computadorizada por Raios X/métodos , Pessoa de Meia-Idade , Idoso , Valor Preditivo dos Testes , Pulmão/diagnóstico por imagem , Estudos de Coortes

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA