ABSTRACT
The Cell Tracking Challenge is an ongoing benchmarking initiative that has become a reference in cell segmentation and tracking algorithm development. Here, we present a significant number of improvements introduced in the challenge since our 2017 report. These include the creation of a new segmentation-only benchmark, the enrichment of the dataset repository with new datasets that increase its diversity and complexity, and the creation of a silver standard reference corpus based on the most competitive results, which will be of particular interest for data-hungry deep learning-based strategies. Furthermore, we present the up-to-date cell segmentation and tracking leaderboards, an in-depth analysis of the relationship between the performance of the state-of-the-art methods and the properties of the datasets and annotations, and two novel, insightful studies about the generalizability and the reusability of top-performing methods. These studies provide critical practical conclusions for both developers and users of traditional and machine learning-based cell segmentation and tracking algorithms.
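The abstract does not say how the silver standard corpus is derived from the most competitive results. One simple way to fuse several binary segmentations of the same image is per-pixel majority voting. The sketch below assumes equally shaped binary masks and a strict-majority rule. `majority_vote_fusion` is a hypothetical helper, not the challenge's own fusion procedure. A real instance-level fusion would also have to match object labels across methods, which this sketch ignores.

```python
import numpy as np

def majority_vote_fusion(masks, min_votes=None):
    """Fuse binary masks from several methods by per-pixel voting.

    masks: sequence of equally shaped binary arrays (one per method).
    min_votes: votes needed for foreground; defaults to a strict majority.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    if min_votes is None:
        min_votes = stack.shape[0] // 2 + 1
    votes = stack.sum(axis=0)  # number of methods marking each pixel
    return votes >= min_votes


# Three hypothetical method outputs for a tiny 2 x 3 image.
a = [[1, 1, 0], [0, 0, 0]]
b = [[1, 0, 0], [0, 1, 0]]
c = [[1, 1, 0], [0, 1, 1]]
fused = majority_vote_fusion([a, b, c])
print(fused.astype(int))
```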
Subject(s)
Benchmarking; Cell Tracking; Cell Tracking/methods; Machine Learning; Algorithms
ABSTRACT
BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority of the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a non-inferior and statistically superior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), compared with the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care.
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
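The prespecified decision rule (non-inferiority if the lower bound of the two-sided 95% Wald CI for the difference exceeds the -0·05 margin, with superiority considered only afterwards) can be sketched as below. The difference and its standard error are treated as given inputs. The abstract does not describe how the paired standard error was estimated for AUROC or specificity. The CI-based superiority check is a simplification of a p-value-based test, and the function name is hypothetical.

```python
from statistics import NormalDist

def wald_noninferiority(diff, se, margin=0.05, alpha=0.05):
    """Decision rule on a difference (AI minus comparator) with standard error se.

    Builds the two-sided (1 - alpha) Wald CI; non-inferiority holds if the lower
    bound exceeds -margin, superiority (checked only afterwards) if it exceeds 0.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    lower, upper = diff - z * se, diff + z * se
    non_inferior = lower > -margin
    superior = non_inferior and lower > 0
    return {"ci": (lower, upper), "non_inferior": non_inferior, "superior": superior}


# Illustrative numbers only, not taken from the study.
print(wald_noninferiority(diff=0.05, se=0.015))
```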
Subject(s)
Artificial Intelligence; Magnetic Resonance Imaging; Prostatic Neoplasms; Radiologists; Humans; Male; Prostatic Neoplasms/diagnostic imaging; Prostatic Neoplasms/pathology; Aged; Retrospective Studies; Middle Aged; Neoplasm Grading; Netherlands; ROC Curve
ABSTRACT
BACKGROUND: The extended acquisition times required for MRI limit its availability in resource-constrained settings. Consequently, accelerating MRI by undersampling the k-space data from which an image is reconstructed has been a long-standing and important challenge. We aimed to develop a deep convolutional neural network (dCNN) optimisation method for MRI reconstruction that reduces scan times, and to evaluate its effect on image quality and on the accuracy of oncological imaging biomarkers. METHODS: In this multicentre, retrospective, cohort study, MRI data from patients with glioblastoma treated at Heidelberg University Hospital (775 patients and 775 examinations) and from the phase 2 CORE trial (260 patients, 1083 examinations, and 58 institutions) and the phase 3 CENTRIC trial (505 patients, 3147 examinations, and 139 institutions) were used to develop, train, and test a dCNN for reconstructing MRI from highly undersampled single-coil k-space data with various acceleration rates (R=2, 4, 6, 8, 10, and 15). Independent testing was performed with MRIs from the phase 2/3 EORTC-26101 trial (528 patients with glioblastoma, 1974 examinations, and 32 institutions). The similarity between undersampled dCNN-reconstructed and original MRIs was quantified with various image quality metrics, including the structural similarity index measure (SSIM), and the accuracy of undersampled dCNN-reconstructed MRI for downstream radiological assessment of imaging biomarkers in oncology (automated artificial intelligence-based quantification of tumour burden and treatment response) was evaluated in the EORTC-26101 test dataset. The public NYU Langone Health fastMRI brain test dataset (558 patients and 558 examinations) was used to validate the generalisability and robustness of the dCNN for reconstructing MRIs from available multi-coil (parallel imaging) k-space data.
FINDINGS: In the EORTC-26101 test dataset, the median SSIM of undersampled dCNN-reconstructed MRI ranged from 0·88 to 0·99 across different acceleration rates, with 0·92 (95% CI 0·92-0·93) for 10-times acceleration (R=10). The 10-times undersampled dCNN-reconstructed MRI yielded excellent agreement with original MRI when assessing volumes of contrast-enhancing tumour (median DICE for spatial agreement of 0·89 [95% CI 0·88 to 0·89]; median volume difference of 0·01 cm3 [95% CI 0·00 to 0·03] equalling 0·21%; p=0·0036 for equivalence) or non-enhancing tumour or oedema (median DICE of 0·94 [95% CI 0·94 to 0·95]; median volume difference of -0·79 cm3 [95% CI -0·87 to -0·72] equalling -1·77%; p=0·023 for equivalence) in the EORTC-26101 test dataset. Automated volumetric tumour response assessment in the EORTC-26101 test dataset yielded an identical median time to progression of 4·27 months (95% CI 4·14 to 4·57) when using 10-times-undersampled dCNN-reconstructed or original MRI (log-rank p=0·80) and agreement in the time to progression in 374 (95·2%) of 393 patients with data. The dCNN generalised well to the fastMRI brain dataset, with significant improvements in the median SSIM when using multi-coil compared with single-coil k-space data (p<0·0001). INTERPRETATION: Deep-learning-based reconstruction of undersampled MRI allows for a substantial reduction of scan times, with a 10-times acceleration demonstrating excellent image quality while preserving the accuracy of derived imaging biomarkers for the assessment of oncological treatment response. Our developments are available as open source software and hold considerable promise for increasing the accessibility to MRI, pending further prospective validation. FUNDING: Deutsche Forschungsgemeinschaft (German Research Foundation) and an Else Kröner Clinician Scientist Endowed Professorship by the Else Kröner Fresenius Foundation.
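The abstract does not describe the dCNN architecture, but the input it works on can be illustrated. The sketch retrospectively undersamples single-coil Cartesian k-space at acceleration R and then applies the naive zero-filled inverse FFT that a reconstruction network is meant to improve on. The mask design (every R-th phase-encoding line plus a fully sampled centre covering 8% of lines) is an assumption. Because of the extra centre lines, the effective acceleration is somewhat below the nominal R. Function names are hypothetical.

```python
import numpy as np

def cartesian_mask(n_lines, acceleration, center_fraction=0.08):
    """Boolean mask over phase-encoding lines: every `acceleration`-th line
    plus a fully sampled block of low-frequency lines around the centre."""
    mask = np.zeros(n_lines, dtype=bool)
    mask[::acceleration] = True
    n_center = int(round(n_lines * center_fraction))
    start = (n_lines - n_center) // 2
    mask[start:start + n_center] = True
    return mask

def zero_filled_recon(image, mask):
    """Simulate single-coil undersampling of `image` along its first axis and
    return the magnitude of the zero-filled inverse FFT."""
    kspace = np.fft.fftshift(np.fft.fft2(image))   # centred k-space
    kspace_us = kspace * mask[:, None]             # zero the unsampled lines
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_us)))


image = np.random.default_rng(0).random((256, 256))  # stand-in for an MR slice
mask = cartesian_mask(256, acceleration=4)
recon = zero_filled_recon(image, mask)
print(mask.sum(), 256 / mask.sum())  # sampled lines, effective acceleration
```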
Subject(s)
Deep Learning; Glioblastoma; Humans; Artificial Intelligence; Biomarkers; Cohort Studies; Glioblastoma/diagnostic imaging; Magnetic Resonance Imaging; Retrospective Studies
ABSTRACT
Biomedical imaging is a driver of scientific discovery and a core component of medical care, and it is being stimulated by the field of deep learning. While semantic segmentation algorithms enable image analysis and quantification in many applications, the design of respective specialized solutions is non-trivial and highly dependent on dataset properties and hardware conditions. We developed nnU-Net, a deep learning-based segmentation method that automatically configures itself, including preprocessing, network architecture, training and post-processing, for any new task. The key design choices in this process are modeled as a set of fixed parameters, interdependent rules and empirical decisions. Without manual intervention, nnU-Net surpasses most existing approaches, including highly specialized solutions, on 23 public datasets used in international biomedical segmentation competitions. We make nnU-Net publicly available as an out-of-the-box tool, rendering state-of-the-art segmentation accessible to a broad audience by requiring neither expert knowledge nor computing resources beyond standard network training.
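The abstract only says that nnU-Net's configuration is expressed as fixed parameters, interdependent rules and empirical decisions. As an illustration of what such a rule can look like, the sketch derives a resampling target spacing from the training set's voxel spacings. It takes the per-axis median and, when the data are strongly anisotropic, replaces the coarsest axis with its 10th percentile. This is a simplified sketch loosely modeled on the heuristic described for nnU-Net. The anisotropy threshold of 3 and the name `target_spacing` are assumptions, and the actual tool applies further conditions.

```python
import numpy as np

def target_spacing(spacings, anisotropy_ratio=3.0, percentile=10):
    """Choose a resampling target spacing (mm) from per-case voxel spacings.

    spacings: array of shape (n_cases, n_axes).
    Default is the per-axis median; if the coarsest median axis is more than
    `anisotropy_ratio` times coarser than every other axis, that axis instead
    uses a lower percentile of the observed spacings (a finer target).
    """
    spacings = np.asarray(spacings, dtype=float)
    target = np.median(spacings, axis=0)
    coarse = int(np.argmax(target))
    others = np.delete(target, coarse)
    if target[coarse] > anisotropy_ratio * others.max():
        target[coarse] = np.percentile(spacings[:, coarse], percentile)
    return target


# Hypothetical dataset: in-plane ~0.5 mm, slice thickness 2-6 mm.
spacings = [[0.5, 0.5, 5.0], [0.5, 0.5, 3.0], [0.6, 0.6, 4.0],
            [0.4, 0.4, 6.0], [0.5, 0.5, 2.0]]
print(target_spacing(spacings))
```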
Subject(s)
Deep Learning; Algorithms; Image Processing, Computer-Assisted/methods; Neural Networks, Computer
ABSTRACT
BACKGROUND: Weakly supervised learning promises reduced annotation effort while maintaining performance. PURPOSE: To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC). STUDY TYPE: Retrospective. SUBJECTS: One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion of PC (65 ± 8 years) between January 2015 and November 2020 were split into a training set (N = 794, enriched with 204 PROSTATEx examinations) and a test set (N = 695). FIELD STRENGTH/SEQUENCE: 1.5 and 3T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. ASSESSMENT: Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at a fixed sensitivity of 0.97 [254/262], emulating PI-RADS ≥ 3 decisions, and of 0.88-0.90 [231-236/262], emulating PI-RADS ≥ 4 decisions. STATISTICAL TESTS: Receiver operating characteristic (ROC) curves and areas under the curve (AUC) were compared using the DeLong and Obuchowski tests. Sensitivity and specificity were compared using the McNemar test. The statistical significance threshold was P = 0.05. RESULTS: Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUCs of 0.64/0.72/0.78. Both improved significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at the same training set sizes; however, the ROC-AUC difference decreased significantly from 200 to 998 training exams.
Emulating PI-RADS ≥ 3 decisions, the difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity of 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). DATA CONCLUSION: Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. EVIDENCE LEVEL: 3. TECHNICAL EFFICACY: Stage 2.
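The abstract compares paired sensitivities and specificities of the SLA and PLA models with the McNemar test, which uses only the discordant cases. It does not say whether the exact or the chi-squared form was used. The sketch implements the exact (binomial) form, and the counts in the example are invented for illustration.

```python
from math import comb

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test on paired binary outcomes.

    b: cases classified correctly by method A only.
    c: cases classified correctly by method B only.
    Under H0 the b + c discordant cases split 50/50, so the p-value is a
    two-sided binomial tail probability.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented counts: 9 negatives correct only with SLA, 1 correct only with PLA.
print(mcnemar_exact(9, 1))
```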
Subject(s)
Deep Learning; Prostatic Neoplasms; Male; Humans; Magnetic Resonance Imaging/methods; Prostatic Neoplasms/diagnostic imaging; Prostatic Neoplasms/pathology; Retrospective Studies; Polyesters
ABSTRACT
OBJECTIVES: To evaluate a fully automatic deep learning system to detect and segment clinically significant prostate cancer (csPCa) on same-vendor prostate MRI from two different institutions that did not contribute to training of the system. MATERIALS AND METHODS: In this retrospective study, a previously bi-institutionally validated deep learning system (UNETM) was applied to bi-parametric prostate MRI data from one external institution (A), a PI-RADS distribution-matched internal cohort (B), and a csPCa-stratified subset of single-institution external public challenge data (C). csPCa was defined as ISUP Grade Group ≥ 2 determined from combined targeted and extended systematic MRI/transrectal US-fusion biopsy. Performance of UNETM was evaluated by comparing ROC AUC and specificity at typical PI-RADS sensitivity levels. Lesion-level analysis between UNETM segmentations and radiologist-delineated segmentations was performed using the Dice coefficient, free-response operating characteristic (FROC), and weighted alternative FROC (waFROC). The influence of using different diffusion sequences was analyzed in cohort A. RESULTS: In 250/250/140 exams in cohorts A/B/C, differences in ROC AUC were not significant, with values of 0.80 (95% CI: 0.74-0.85)/0.87 (95% CI: 0.83-0.92)/0.82 (95% CI: 0.75-0.89). At sensitivities of 95% and 90%, UNETM achieved specificities of 30%/50% in A, 44%/71% in B, and 43%/49% in C, respectively. The Dice coefficient of UNETM and radiologist-delineated lesions was 0.36 in A and 0.49 in B. The waFROC AUC was 0.67 (95% CI: 0.60-0.83) in A and 0.7 (95% CI: 0.64-0.78) in B. UNETM performed marginally better on readout-segmented than on single-shot echo-planar imaging. CONCLUSION: For same-vendor examinations, deep learning provided comparable discrimination of csPCa and non-csPCa lesions and examinations between the local and two independent external data sets, demonstrating the applicability of the system to institutions not participating in model training.
CLINICAL RELEVANCE STATEMENT: A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets, indicating the potential of deploying AI models without retraining or fine-tuning and corroborating evidence that AI models extract a substantial amount of transferable domain knowledge about MRI-based prostate cancer assessment. KEY POINTS:
• A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets.
• Lesion detection performance and segmentation congruence were similar on the institutional and an external data set, as measured by the weighted alternative FROC AUC and Dice coefficient.
• Although the system generalized to two external institutions without retraining, achieving the expected sensitivity and specificity levels with the deep learning system requires adjusting probability thresholds, underlining the importance of institution-specific calibration and quality control.
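The Dice coefficient used in the lesion-level analysis measures the overlap between two segmentations as 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal version on binary masks. Returning 1.0 for two empty masks is a convention assumed here, not something the abstract specifies.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)


pred = [[0, 1, 1], [0, 1, 0]]  # hypothetical model segmentation
ref = [[0, 1, 0], [0, 1, 1]]   # hypothetical radiologist delineation
print(dice(pred, ref))
```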
Subject(s)
Deep Learning; Prostatic Neoplasms; Male; Humans; Magnetic Resonance Imaging; Prostate/diagnostic imaging; Prostate/pathology; Prostatic Neoplasms/diagnostic imaging; Prostatic Neoplasms/pathology; Retrospective Studies
ABSTRACT
Accurate detection and quantification of unruptured intracranial aneurysms (UIAs) is important for rupture risk assessment and for making an informed treatment decision. Currently, the 2D manual measures used to assess UIAs on Time-of-Flight magnetic resonance angiographies (TOF-MRAs) lack 3D information, and there is substantial inter-observer variability for both aneurysm detection and assessment of aneurysm size and growth. 3D measures could help improve aneurysm detection and quantification but are time-consuming and would therefore benefit from a reliable automatic UIA detection and segmentation method. The Aneurysm Detection and segMentation (ADAM) challenge was organised, in which methods for automatic UIA detection and segmentation were developed and submitted for evaluation on a diverse clinical TOF-MRA dataset. A training set (113 cases with a total of 129 UIAs) was released, each case including a TOF-MRA, a structural MR image (T1, T2 or FLAIR), annotation of any present UIA(s) and the centre voxel of the UIA(s). A test set of 141 cases (with 153 UIAs) was used for evaluation. Two tasks were proposed: (1) detection and (2) segmentation of UIAs on TOF-MRAs. Teams developed and submitted containerised methods to be evaluated on the test set. Task 1 was evaluated using sensitivity and false positive count. Task 2 was evaluated using the Dice similarity coefficient, modified Hausdorff distance (95th percentile) and volumetric similarity. For each task, a ranking was made based on the average of the metrics. In total, eleven teams participated in task 1, and nine of those teams participated in task 2. Task 1 was won by a method specifically designed for the detection task (i.e. not participating in task 2). Based on segmentation metrics, the top two methods for task 2 performed statistically significantly better than all other methods.
The detection performance of the top-ranking methods was comparable to visual inspection for larger aneurysms. Segmentation performance of the top-ranking method, after selection of true UIAs, was similar to interobserver performance. The ADAM challenge remains open for new and improved submissions, with a live leaderboard to provide benchmarking for method developments at https://adam.isi.uu.nl/.
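Task 2 was scored with, among other metrics, volumetric similarity and a 95th-percentile Hausdorff distance. The abstract gives no formulas, so the sketch uses common definitions as assumptions. Volumetric similarity is VS = 1 - |V_A - V_B| / (V_A + V_B). HD95 is taken as the larger of the two directed 95th-percentile nearest-neighbour distances; some implementations pool both directions instead. Point coordinates are assumed to be surface points already converted to millimetres.

```python
import numpy as np

def volumetric_similarity(a, b):
    """VS = 1 - |V_a - V_b| / (V_a + V_b), with volumes as voxel counts."""
    va, vb = int(np.count_nonzero(a)), int(np.count_nonzero(b))
    if va + vb == 0:
        return 1.0
    return 1.0 - abs(va - vb) / (va + vb)

def hausdorff_95(points_a, points_b):
    """95th-percentile Hausdorff distance between two point sets (e.g. surface
    voxel coordinates in mm). Brute force: O(|A| x |B|) memory."""
    pa = np.asarray(points_a, dtype=float)
    pb = np.asarray(points_b, dtype=float)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    d_ab = np.percentile(d.min(axis=1), 95)  # each A point to its nearest B point
    d_ba = np.percentile(d.min(axis=0), 95)  # each B point to its nearest A point
    return float(max(d_ab, d_ba))


print(volumetric_similarity([1, 1, 1, 0], [1, 0, 0, 0]))
print(hausdorff_95([[0, 0]], [[3, 4]]))
```

For full 3D masks, the pairwise-distance matrix grows quickly, so a KD-tree or distance transform is the usual choice at scale.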
Subject(s)
Cerebral Angiography/methods; Intracranial Aneurysm/diagnostic imaging; Magnetic Resonance Angiography/methods; Datasets as Topic; Educational Measurement; Humans; Magnetic Resonance Imaging; Random Allocation; Risk Assessment
ABSTRACT
OBJECTIVES: To simulate clinical deployment, evaluate performance, and establish quality assurance of a deep learning algorithm (U-Net) for detection, localization, and segmentation of clinically significant prostate cancer (sPC), ISUP grade group ≥ 2, using bi-parametric MRI. METHODS: In 2017, 284 consecutive men in active surveillance, biopsy-naïve or pre-biopsied, received targeted and extended systematic MRI/transrectal US-fusion biopsy after examination on a single MRI scanner (3 T). A prospective adjustment scheme was evaluated, comparing the performance of the Prostate Imaging Reporting and Data System (PI-RADS) and U-Net using sensitivity, specificity, predictive values, and the Dice coefficient. RESULTS: In the 259 eligible men (median 64 [IQR 61-72] years), PI-RADS had a sensitivity of 98% [106/108]/84% [91/108] with a specificity of 17% [25/151]/58% [88/151], for thresholds at ≥ 3/≥ 4 respectively. U-Net using dynamic threshold adjustment had a sensitivity of 99% [107/108]/83% [90/108] (p > 0.99/> 0.99) with a specificity of 24% [36/151]/55% [83/151] (p > 0.99/> 0.99) for probability thresholds d3 and d4 emulating PI-RADS ≥ 3 and ≥ 4 decisions respectively, not statistically different from PI-RADS. Co-occurrence of a radiological PI-RADS ≥ 4 examination and U-Net ≥ d3 assessment significantly improved the positive predictive value from 59 to 63% (p = 0.03) on a per-patient basis. CONCLUSIONS: U-Net has similar performance to PI-RADS in simulated continued clinical use. Regular quality assurance should be implemented to ensure desired performance. KEY POINTS:
• U-Net maintained similar diagnostic performance compared to radiological assessment of PI-RADS ≥ 4 when applied in a simulated clinical deployment.
• Application of our proposed prospective dynamic calibration method successfully adjusted U-Net performance within acceptable limits of the PI-RADS reference over time, while not being limited to PI-RADS as a reference.
• Simultaneous detection by U-Net and radiological assessment significantly improved the positive predictive value on a per-patient and per-lesion basis, while the negative predictive value remained unchanged.
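The abstract describes a prospective dynamic calibration that adjusts the U-Net probability thresholds (d3, d4) so its decisions emulate the PI-RADS ≥ 3 and ≥ 4 operating points, but it does not give the exact rule. The sketch assumes the threshold is re-derived on a window of recent, biopsy-verified cases. It picks the highest threshold that still reaches a target sensitivity. All function names and the window size are hypothetical.

```python
import numpy as np

def threshold_for_sensitivity(scores, labels, target_sens):
    """Largest threshold t such that calling `score >= t` positive reaches at
    least `target_sens` sensitivity on the given (already verified) cases."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = np.sort(scores[labels])[::-1]  # positive-case scores, high to low
    if pos.size == 0:
        raise ValueError("calibration window contains no positive cases")
    k = max(1, int(np.ceil(target_sens * pos.size - 1e-9)))  # positives to catch
    return pos[k - 1]

def sens_spec(scores, labels, thr):
    """Sensitivity and specificity when `score >= thr` is called positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pred = scores >= thr
    return (pred & labels).sum() / labels.sum(), (~pred & ~labels).sum() / (~labels).sum()

def rolling_thresholds(scores, labels, target_sens, window):
    """Threshold for case i, calibrated only on the `window` preceding cases."""
    return [threshold_for_sensitivity(scores[i - window:i], labels[i - window:i], target_sens)
            for i in range(window, len(scores))]


# Invented calibration data (1 = clinically significant cancer on biopsy).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
thr = threshold_for_sensitivity(scores, labels, 0.75)
print(thr, sens_spec(scores, labels, thr))
```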
Subject(s)
Deep Learning; Prostatic Neoplasms; Humans; Image-Guided Biopsy; Magnetic Resonance Imaging; Male; Prospective Studies; Prostatic Neoplasms/diagnostic imaging
ABSTRACT
OBJECTIVE: Evaluation of software tools for segmentation, quantification, and characterization of fibrotic pulmonary parenchyma changes will strengthen the role of CT as a biomarker of disease extent, evolution, and response to therapy in patients with idiopathic pulmonary fibrosis (IPF). METHODS: 418 nonenhanced thin-section MDCTs of 127 IPF patients and 78 MDCTs of 78 healthy individuals were analyzed with three fully automated, completely different software tools: YACTA, LUFIT, and IMBIO. The agreement between YACTA and LUFIT on segmented lung volume and on the 80th (reflecting fibrosis) and 40th (reflecting ground-glass opacity) percentiles of the lung density histogram was analyzed using Bland-Altman plots. The fibrosis and ground-glass opacity segmented by IMBIO (a lung texture analysis software tool) were included in specific regression analyses. RESULTS: In the IPF group, LUFIT outperformed YACTA by segmenting more lung volume (mean difference 242 mL, 95% limits of agreement -54 to 539 mL), as well as by quantifying higher 80th (76 HU, -6 to 158 HU) and 40th percentiles (9 HU, -73 to 90 HU). No relevant differences were found in the control group. The 80th/40th percentile as quantified by LUFIT correlated positively with the percentage of fibrosis/ground-glass opacity calculated by IMBIO (r = 0.78/r = 0.92). CONCLUSIONS: For segmentation of pulmonary fibrosis, LUFIT, a shape model-based segmentation software tool, is superior to the threshold-based YACTA tool, since the density of (severe) fibrosis is similar to that of the surrounding soft tissues. Therefore, shape modeling as used in LUFIT may serve as a valid tool in the quantification of IPF, since the disease mainly affects the subpleural space.
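The YACTA-LUFIT agreement is reported as a Bland-Altman bias with 95% limits of agreement, together with percentiles of the lung density histogram. The sketch below is a minimal version that assumes the usual limits, bias ± 1.96 times the SD of the paired differences, and percentiles taken directly from lung voxel HU values. The function names are illustrative, and the example numbers are invented.

```python
import numpy as np

def bland_altman(a, b):
    """Bias (mean of a - b) and 95% limits of agreement (bias +/- 1.96 SD)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def density_percentile(hu_values, q):
    """q-th percentile of the lung density histogram, from lung voxel HU values."""
    return float(np.percentile(hu_values, q))


# Invented paired lung volumes (mL) from two tools.
print(bland_altman([4100, 3900, 4400], [3850, 3700, 4100]))
```

As a consistency check on the reported lung-volume result, limits of -54 to 539 mL around a bias of 242 mL imply an SD of differences of roughly (539 + 54) / (2 × 1.96) ≈ 151 mL.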
Subject(s)
Algorithms; Idiopathic Pulmonary Fibrosis/pathology; Lung/pathology; Software; Aged; Case-Control Studies; Diagnosis, Computer-Assisted; Female; Humans; Idiopathic Pulmonary Fibrosis/diagnostic imaging; Linear Models; Lung/diagnostic imaging; Lung Volume Measurements; Male; Middle Aged; Models, Biological; Tomography, X-Ray Computed
ABSTRACT
Background The relevance of antiangiogenic treatment with bevacizumab in patients with glioblastoma is controversial because the progression-free survival benefit did not translate into an overall survival (OS) benefit in randomized phase III trials. Purpose To perform longitudinal characterization of intratumoral angiogenesis and oxygenation by using dynamic susceptibility contrast agent-enhanced (DSC) MRI and to evaluate its potential for predicting outcome from administration of bevacizumab. Materials and Methods In this secondary analysis of the prospective randomized phase II/III European Organization for Research and Treatment of Cancer 26101 trial, conducted between October 2011 and December 2015 in 596 patients with first recurrence of glioblastoma, the subset of patients with available anatomic MRI and DSC MRI at baseline and first follow-up was analyzed. Patients were allocated into those administered bevacizumab (hereafter, the BEV group; either bevacizumab monotherapy or bevacizumab with lomustine) and those not administered bevacizumab (hereafter, the non-BEV group; lomustine monotherapy). Contrast-enhanced tumor volume, noncontrast-enhanced T2 fluid-attenuated inversion recovery (FLAIR) signal abnormality volume, Gaussian-normalized relative cerebral blood volume (nrCBV), Gaussian-normalized relative cerebral blood flow (nrCBF), and tumor metabolic rate of oxygen (nTMRO2) were quantified. The predictive ability of these imaging parameters was assessed with multivariable Cox regression and formal interaction testing. Results A total of 254 of 596 patients were evaluated (mean age, 57 years ± 11; 155 men; 161 in the BEV group and 93 in the non-BEV group). Progression-free survival was longer in the BEV group (3.7 months; 95% confidence interval [CI]: 3.0, 4.2) than in the non-BEV group (2.5 months; 95% CI: 1.5, 2.9; P = .01), whereas OS was not different (P = .15).
The nrCBV decreased in the BEV group (-16.3%; interquartile range [IQR], -39.5% to 12.0%; P = .01), but not in the non-BEV group (1.2%; IQR, -17.9% to 23.3%; P = .19) between baseline and first follow-up. An identical pattern was observed for both nrCBF and nTMRO2 values. Contrast-enhanced tumor and noncontrast-enhanced T2 FLAIR signal abnormality volumes decreased in the BEV group (-66% [IQR, -83% to -35%] and -33% [IQR, -71% to -5%], respectively; P < .001 for both), whereas they increased in the non-BEV group (30% [IQR, -17% to 98%], P = .001; and 10% [IQR, -13% to 82%], P = .02, respectively) between baseline and first follow-up. None of the assessed MRI parameters were predictive of OS in the BEV group. Conclusion Bevacizumab treatment decreased tumor volumes, angiogenesis, and oxygenation, thereby reflecting its effectiveness in extending progression-free survival; however, these parameters were not predictive of overall survival (OS), highlighting the challenge of identifying patients who derive an OS benefit from bevacizumab. © RSNA, 2020. Online supplemental material is available for this article. See also the editorial by Dillon in this issue.
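Changes between baseline and first follow-up are reported as medians with interquartile ranges of per-patient percent change. The sketch below assumes percent change is computed as 100 × (follow-up - baseline) / baseline and that baselines are non-zero. The abstract does not specify the test behind the P values, so none is implemented here, and the example values are invented.

```python
import numpy as np

def percent_change_summary(baseline, follow_up):
    """Per-patient percent change from baseline, summarised as median and IQR."""
    base = np.asarray(baseline, dtype=float)
    fu = np.asarray(follow_up, dtype=float)
    change = 100.0 * (fu - base) / base  # assumes no zero baselines
    q1, med, q3 = np.percentile(change, [25, 50, 75])
    return float(med), (float(q1), float(q3))


# Invented tumor volumes (cm3) for five patients.
print(percent_change_summary([10, 10, 10, 10, 10], [5, 8, 10, 12, 20]))
```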
Subject(s)
Angiogenesis Inhibitors/therapeutic use; Bevacizumab/therapeutic use; Brain Neoplasms/drug therapy; Glioblastoma/drug therapy; Magnetic Resonance Imaging/methods; Neovascularization, Pathologic/drug therapy; Antineoplastic Agents, Alkylating/therapeutic use; Brain Neoplasms/pathology; Contrast Media; Europe; Female; Glioblastoma/pathology; Humans; Lomustine/therapeutic use; Male; Middle Aged; Neoplasm Recurrence, Local; Prospective Studies; Survival Analysis
ABSTRACT
Background Radiomic features may quantify characteristics present in medical imaging. However, the lack of standardized definitions and validated reference values has hampered clinical use. Purpose To standardize a set of 174 radiomic features. Materials and Methods Radiomic features were assessed in three phases. In phase I, 487 features were derived from the basic set of 174 features. Twenty-five research teams with unique radiomics software implementations computed feature values directly from a digital phantom, without any additional image processing. In phase II, 15 teams computed values for 1347 derived features using a CT image of a patient with lung cancer and predefined image processing configurations. In both phases, consensus among the teams on the validity of tentative reference values was measured through the frequency of the modal value and classified as follows: fewer than three matches, weak; three to five matches, moderate; six to nine matches, strong; 10 or more matches, very strong. In the final phase (phase III), a public data set of multimodality images (CT, fluorine 18 fluorodeoxyglucose PET, and T1-weighted MRI) from 51 patients with soft-tissue sarcoma was used to prospectively assess reproducibility of standardized features. Results Consensus on reference values was initially weak for 232 of 302 features (76.8%) at phase I and 703 of 1075 features (65.4%) at phase II. At the final iteration, weak consensus remained for only two of 487 features (0.4%) at phase I and 19 of 1347 features (1.4%) at phase II. Strong or better consensus was achieved for 463 of 487 features (95.1%) at phase I and 1220 of 1347 features (90.6%) at phase II. Overall, 169 of 174 features were standardized in the first two phases. In the final validation phase (phase III), most of the 169 standardized features could be excellently reproduced (166 with CT; 164 with PET; and 164 with MRI).
Conclusion A set of 169 radiomic features was standardized, which enabled verification and calibration of different radiomics software. © RSNA, 2020. Online supplemental material is available for this article. See also the editorial by Kuhl and Truhn in this issue.
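The consensus scale for tentative reference values maps the number of teams matching the modal value to a category (weak, moderate, strong, very strong). The sketch below encodes that mapping. The abstract does not state how two teams' values are judged to "match", so rounding to a fixed number of decimals before counting is an assumption, and `consensus_level` is a hypothetical helper.

```python
from collections import Counter

def consensus_level(values, decimals=3):
    """Classify agreement among teams by the frequency of the modal value.

    Values are rounded before counting (an assumed matching rule)."""
    rounded = [round(v, decimals) for v in values]
    modal_value, matches = Counter(rounded).most_common(1)[0]
    if matches < 3:
        level = "weak"
    elif matches <= 5:
        level = "moderate"
    elif matches <= 9:
        level = "strong"
    else:
        level = "very strong"
    return modal_value, matches, level


# Invented feature values submitted by seven teams.
print(consensus_level([0.5] * 6 + [0.7]))
```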
Subject(s)
Biomarkers/analysis; Image Processing, Computer-Assisted/standards; Software; Calibration; Fluorodeoxyglucose F18; Humans; Lung Neoplasms/diagnostic imaging; Magnetic Resonance Imaging; Phantoms, Imaging; Phenotype; Positron-Emission Tomography; Radiopharmaceuticals; Reproducibility of Results; Sarcoma/diagnostic imaging; Tomography, X-Ray Computed
ABSTRACT
Electroconvulsive therapy (ECT) is a rapid and highly effective treatment option for treatment-resistant major depressive disorder (TRD). The neural mechanisms underlying its beneficial effects are poorly understood. Exploring associations between changes in brain structure and clinical response is crucial for understanding the mechanisms of action of ECT and relevant for the validation of potential biomarkers that could facilitate the prediction of ECT efficacy. The aim of this exploratory study was to identify cortical predictors of clinical response in TRD patients treated with ECT. We longitudinally investigated 12 TRD patients before and after ECT. Twelve matched healthy controls were studied cross-sectionally. Demographic, clinical, and structural magnetic resonance imaging data at 3 T and multiple cortical markers derived from surface-based morphometry (SBM) analyses were considered. Multiple regression models were computed to identify predictors of clinical response to ECT, as reflected by changes in Hamilton Depression Rating Scale (HAMD) scores. Differences in symptom severity from pre- to post-ECT were predicted by models including demographic data, clinical data, and SBM measures of frontal, cingulate, and entorhinal structures. Using all-subsets regression, a model comprising the baseline HAMD score and cortical thickness of the left rostral anterior cingulate gyrus explained most variance in the data (multiple R2 = 0.82). The data suggest that SBM provides powerful measures for identifying biomarkers of ECT response in TRD. Rostral anterior cingulate thickness and baseline HAMD score showed the greatest predictive power for clinical response, in contrast to cortical complexity, cortical gyrification, or demographic data.
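All-subsets regression fits every combination of candidate predictors and compares their fit. The sketch below does this with ordinary least squares (with intercept) and ranks models by R². The candidate predictors, `max_size`, and ranking by plain R² are assumptions; the abstract does not say which selection criterion was used. With only 12 patients, a criterion that penalizes model size, such as adjusted R² or cross-validation, guards better against overfitting.

```python
from itertools import combinations
import numpy as np

def all_subsets_regression(X, y, max_size=2):
    """OLS with intercept for every subset of predictor columns (up to
    `max_size` predictors); returns (R^2, column tuple) pairs, best first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    ss_tot = ((y - y.mean()) ** 2).sum()
    results = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(p), size):
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r2 = 1.0 - ((y - A @ coef) ** 2).sum() / ss_tot
            results.append((float(r2), cols))
    results.sort(key=lambda t: t[0], reverse=True)
    return results


# Synthetic example: 12 "patients", 4 candidate predictors; the outcome
# depends on columns 0 and 2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.5 + 0.1 * rng.normal(size=12)
print(all_subsets_regression(X, y)[:3])
```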
Subject(s)
Cerebral Cortex/pathology , Depressive Disorder, Major , Depressive Disorder, Treatment-Resistant , Electroconvulsive Therapy , Adult , Cerebral Cortex/diagnostic imaging , Cross-Sectional Studies , Depressive Disorder, Major/pathology , Depressive Disorder, Major/physiopathology , Depressive Disorder, Major/therapy , Depressive Disorder, Treatment-Resistant/diagnostic imaging , Depressive Disorder, Treatment-Resistant/pathology , Depressive Disorder, Treatment-Resistant/physiopathology , Female , Gyrus Cinguli/diagnostic imaging , Gyrus Cinguli/pathology , Humans , Longitudinal Studies , Magnetic Resonance Imaging , Male , Middle Aged , Outcome Assessment, Health Care , Prognosis
ABSTRACT
BACKGROUND: The Response Assessment in Neuro-Oncology (RANO) criteria and requirements for a uniform protocol have been introduced to standardise assessment of MRI scans in both clinical trials and clinical practice. However, these criteria mainly rely on manual two-dimensional measurements of contrast-enhancing (CE) target lesions and are thus limited in both the reliability and the accuracy with which they assess tumour burden and treatment response. We aimed to develop a framework relying on artificial neural networks (ANNs) for fully automated quantitative analysis of MRI in neuro-oncology to overcome the inherent limitations of manual assessment of tumour burden. METHODS: In this retrospective study, we compiled a single-institution dataset of MRI data from patients with brain tumours being treated at Heidelberg University Hospital (Heidelberg, Germany; Heidelberg training dataset) to develop and train an ANN for automated identification and volumetric segmentation of CE tumours and non-enhancing T2-signal abnormalities (NEs) on MRI. Independent testing and large-scale application of the ANN for tumour segmentation were done in a single-institution longitudinal testing dataset from the Heidelberg University Hospital and in a multi-institutional longitudinal testing dataset from the prospective randomised phase 2 and 3 European Organisation for Research and Treatment of Cancer (EORTC)-26101 trial (NCT01290939), acquired at 38 institutions across Europe. In both longitudinal datasets, spatial and temporal tumour volume dynamics were automatically quantified to calculate time to progression, which was compared with time to progression determined by RANO, both in terms of reliability and as a surrogate endpoint for predicting overall survival. 
We integrated this approach for fully automated quantitative analysis of MRI in neuro-oncology within an application-ready software infrastructure and applied it in a simulated clinical environment of patients with brain tumours from the Heidelberg University Hospital (Heidelberg simulation dataset). FINDINGS: For training of the ANN, MRI data were collected from 455 patients with brain tumours (one MRI per patient) being treated at Heidelberg University Hospital between July 29, 2009, and March 17, 2017 (Heidelberg training dataset). For independent testing of the ANN, a longitudinal dataset of 40 patients, with data from 239 MRI scans, was collected at Heidelberg University Hospital in parallel with the training dataset (Heidelberg test dataset), and 2034 MRI scans from 532 patients at 34 institutions collected between Oct 26, 2011, and Dec 3, 2015, in the EORTC-26101 study were of sufficient quality to be included in the EORTC-26101 test dataset. The ANN yielded excellent performance for detection and segmentation of CE tumours and NE volumes in both longitudinal test datasets (median Dice coefficient for CE tumours 0·89 [95% CI 0·86-0·90], and for NEs 0·93 [0·92-0·94] in the Heidelberg test dataset; CE tumours 0·91 [0·90-0·92], NEs 0·93 [0·93-0·94] in the EORTC-26101 test dataset). Time to progression from quantitative ANN-based assessment of tumour response was a significantly better surrogate endpoint than central RANO assessment for predicting overall survival in the EORTC-26101 test dataset (hazard ratios ANN 2·59 [95% CI 1·86-3·60] vs central RANO 2·07 [1·46-2·92]; p<0·0001) and also yielded a 36-percentage-point margin over RANO (p<0·0001) when comparing reliability values (ie, agreement in the quantitative volumetrically defined time to progression [based on radiologist ground truth vs automated assessment with ANN] of 87% [266 of 306 with sufficient data] compared with 51% [155 of 306] with local vs independent central RANO assessment). 
In the Heidelberg simulation dataset, which comprised 466 patients with brain tumours, with 595 MRI scans obtained between April 27 and Sept 17, 2018, automated on-demand processing of MRI scans and quantitative tumour response assessment within the simulated clinical environment required an average of 10 min of computation time per scan. INTERPRETATION: Overall, we found that the ANN enabled objective and automated assessment of tumour response in neuro-oncology at high throughput and could ultimately serve as a blueprint for the application of ANNs in radiology to improve clinical decision making. Future research should focus on prospective validation within clinical trials, on application for automated high-throughput imaging biomarker discovery, and on extension to other diseases. FUNDING: Medical Faculty Heidelberg Postdoc-Program, Else Kröner-Fresenius Foundation.
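The reliability margin over RANO is a difference between two agreement rates, both taken over the same 306 patients with sufficient data. Working from the counts reported in the findings:

```latex
\text{agreement}_{\text{ANN}} = \frac{266}{306} \approx 0.869, \qquad
\text{agreement}_{\text{RANO}} = \frac{155}{306} \approx 0.507, \qquad
\frac{266 - 155}{306} = \frac{111}{306} \approx 0.363
```

These round to the reported 87% and 51%. The margin is therefore about 36 percentage points, not a 36% relative improvement.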
Subject(s)
Brain Neoplasms/diagnostic imaging , Brain Neoplasms/therapy , Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted , Magnetic Resonance Imaging , Neural Networks, Computer , Automation , Brain Neoplasms/pathology , Clinical Trials, Phase II as Topic , Clinical Trials, Phase III as Topic , Databases, Factual , Disease Progression , Female , Germany , Humans , Male , Multicenter Studies as Topic , Predictive Value of Tests , Randomized Controlled Trials as Topic , Reproducibility of Results , Retrospective Studies , Time Factors , Treatment Outcome , Tumor Burden , Workflow
ABSTRACT
Brain extraction is a critical preprocessing step in the analysis of neuroimaging studies conducted with magnetic resonance imaging (MRI) and influences the accuracy of downstream analyses. The majority of brain extraction algorithms are, however, optimized for processing healthy brains and thus frequently fail in the presence of pathologically altered brain or when applied to heterogeneous MRI datasets. Here we introduce a new, rigorously validated algorithm (termed HD-BET) relying on artificial neural networks that aims to overcome these limitations. We demonstrate that HD-BET outperforms six popular, publicly available brain extraction algorithms in several large-scale neuroimaging datasets, including one from a prospective multicentric trial in neuro-oncology, yielding state-of-the-art performance with median improvements of +1.16 to +2.50 points for the Dice coefficient and -0.66 to -2.51 mm for the Hausdorff distance. Importantly, the HD-BET algorithm, which shows robust performance in the presence of pathology or treatment-induced tissue alterations, is applicable to a broad range of MRI sequence types and is not influenced by variations in MRI hardware and acquisition parameters encountered in both research and clinical practice. For broader accessibility, the HD-BET prediction algorithm is made freely available (www.neuroAI-HD.org) and may become an essential component for robust, automated, high-throughput processing of MRI neuroimaging data.
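The two metrics reported for HD-BET compare a predicted brain mask with a reference mask. The sketch below is minimal and makes two assumptions: masks are given as sets of integer voxel coordinates, and voxel spacing is in millimetres. The Dice coefficient is 2|A∩B|/(|A|+|B|). The Hausdorff distance is the larger of the two directed "farthest nearest-neighbour" distances.

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two masks given as voxel sets."""
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a: Set[Voxel], b: Set[Voxel], spacing=(1.0, 1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance (brute force), in the units of `spacing`."""
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    pa = [tuple(c * s for c, s in zip(v, spacing)) for v in a]
    pb = [tuple(c * s for c, s in zip(v, spacing)) for v in b]

    def directed(p, q):
        # Largest distance from a point of p to its nearest point of q.
        return max(min(math.dist(x, y) for y in q) for x in p)

    return max(directed(pa, pb), directed(pb, pa))


# Hypothetical toy masks: two 4-voxel rows shifted by one voxel along x.
mask_pred = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
mask_ref = {(1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)}
```

The brute-force search is quadratic in mask size, so it only suits toy inputs. Practical implementations typically use distance transforms on boundary voxels, and some studies report a percentile (for example, 95th) variant instead of the maximum. This abstract does not state which variant HD-BET reports.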
Subject(s)
Brain/diagnostic imaging , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging , Neural Networks, Computer , Algorithms , Humans , Neuroimaging/methods
ABSTRACT
Background Men suspected of having clinically significant prostate cancer (sPC) increasingly undergo prostate MRI. The potential of deep learning to provide diagnostic support for human interpretation requires further evaluation. Purpose To compare the performance of clinical assessment with that of a deep learning system optimized for segmentation and trained with T2-weighted and diffusion MRI in the detection and segmentation of lesions suspicious for sPC. Materials and Methods In this retrospective study, T2-weighted and diffusion prostate MRI sequences from consecutive men examined with a single 3.0-T MRI system between 2015 and 2016 were manually segmented. Ground truth was provided by combined targeted and extended systematic MRI-transrectal US fusion biopsy, with sPC defined as International Society of Urological Pathology Gleason grade group greater than or equal to 2. By using split-sample validation, U-Net was internally validated on the training set (80% of the data) through cross validation and subsequently externally validated on the test set (20% of the data). U-Net-derived sPC probability maps were calibrated by matching sextant-based cross-validation performance to the clinical performance of Prostate Imaging Reporting and Data System (PI-RADS). The performance of PI-RADS and U-Net was compared by using sensitivities, specificities, predictive values, and the Dice coefficient. Results A total of 312 men (median age, 64 years; interquartile range [IQR], 58-71 years) were evaluated. The training set consisted of 250 men (median age, 64 years; IQR, 58-71 years) and the test set of 62 men (median age, 64 years; IQR, 60-69 years). In the test set, PI-RADS cutoffs greater than or equal to 3 versus cutoffs greater than or equal to 4 on a per-patient basis had sensitivity of 96% (25 of 26) versus 88% (23 of 26) at specificity of 22% (eight of 36) versus 50% (18 of 36). 
U-Net at probability thresholds of greater than or equal to 0.22 versus greater than or equal to 0.33 had sensitivity of 96% (25 of 26) versus 92% (24 of 26) (both P > .99) with specificity of 31% (11 of 36) versus 47% (17 of 36) (both P > .99), not statistically different from PI-RADS. Dice coefficients were 0.89 for prostate and 0.35 for MRI lesion segmentation. In the test set, coincidence of PI-RADS greater than or equal to 4 with U-Net lesions improved the positive predictive value from 48% (28 of 58) to 67% (24 of 36) for U-Net probability thresholds greater than or equal to 0.33 (P = .01), while the negative predictive value remained unchanged (83% [25 of 30] vs 83% [43 of 52]; P > .99). Conclusion U-Net trained with T2-weighted and diffusion MRI achieves similar performance to clinical Prostate Imaging Reporting and Data System assessment. © RSNA, 2019 Online supplemental material is available for this article. See also the editorial by Padhani and Turkbey in this issue.
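The per-patient sensitivities and specificities quoted for PI-RADS follow directly from the reported counts. The sketch recomputes them. It also adds one simple, hypothetical reading of the calibration step: choose the highest probability threshold whose sensitivity still reaches a target, such as the sensitivity of a PI-RADS cut-off. The study matched sextant-based cross-validation performance, so the per-item scores, the target rule, and the function names are illustrative assumptions rather than the study's procedure.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Binary confusion-matrix counts."""
    tp: int
    fn: int
    tn: int
    fp: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def counts_at_threshold(scores, labels, threshold):
    """Count outcomes when score >= threshold is called positive."""
    c = Counts(0, 0, 0, 0)
    for score, positive in zip(scores, labels):
        called = score >= threshold
        if positive and called:
            c.tp += 1
        elif positive:
            c.fn += 1
        elif called:
            c.fp += 1
        else:
            c.tn += 1
    return c


def threshold_matching_sensitivity(scores, labels, target):
    """Highest observed score that, used as a threshold, keeps sensitivity >= target."""
    # Sensitivity can only grow as the threshold falls, so the first hit is the highest.
    for t in sorted(set(scores), reverse=True):
        if counts_at_threshold(scores, labels, t).sensitivity >= target:
            return t
    return min(scores)


# Per-patient test-set counts reported for PI-RADS cut-offs >= 3 and >= 4.
pirads_ge3 = Counts(tp=25, fn=1, tn=8, fp=28)
pirads_ge4 = Counts(tp=23, fn=3, tn=18, fp=18)
```

Lowering the threshold trades specificity for sensitivity. That trade-off is the same one visible between the two PI-RADS cut-offs and the two U-Net probability thresholds.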
Subject(s)
Deep Learning , Magnetic Resonance Imaging , Prostatic Neoplasms/pathology , Aged , Biopsy , Humans , Male , Middle Aged , Predictive Value of Tests , Prostatic Neoplasms/diagnostic imaging , Retrospective Studies , Sensitivity and Specificity
ABSTRACT
The individual course of white matter fiber tracts is an important factor for analysis of white matter characteristics in healthy and diseased brains. Diffusion-weighted MRI tractography in combination with region-based or clustering-based selection of streamlines is a unique combination of tools that enables the in vivo delineation and analysis of anatomically well-known tracts. This, however, currently requires complex, computationally intensive processing pipelines that are time-consuming to set up. TractSeg is a novel convolutional neural network-based approach that directly segments tracts in the field of fiber orientation distribution function (fODF) peaks without using tractography, image registration or parcellation. Using a population of 105 subjects from the Human Connectome Project, we demonstrate that the proposed approach is much faster than existing methods while providing unprecedented accuracy. We also show initial evidence that TractSeg is able to generalize to differently acquired data sets for most of the bundles. The code and data are openly available at https://github.com/MIC-DKFZ/TractSeg/ and https://doi.org/10.5281/zenodo.1088277, respectively.
Subject(s)
Deep Learning , Diffusion Tensor Imaging/methods , Nerve Net/diagnostic imaging , Neuroimaging/methods , White Matter/diagnostic imaging , Adult , Connectome , Humans , Nerve Net/anatomy & histology , White Matter/anatomy & histology
ABSTRACT
Purpose To compare biparametric contrast-free radiomic machine learning (RML), mean apparent diffusion coefficient (ADC), and radiologist assessment for characterization of prostate lesions detected during prospective MRI interpretation. Materials and Methods This single-institution study included 316 men (mean age ± standard deviation, 64.0 years ± 7.8) with an indication for MRI-transrectal US fusion biopsy between May 2015 and September 2016 (training cohort, 183 patients; test cohort, 133 patients). Lesions identified by prospective clinical readings were manually segmented for mean ADC and radiomics analysis. Global and zone-specific random forest RML and mean ADC models for classification of clinically significant prostate cancer (Gleason grade group ≥ 2) were developed on the training set, and the fixed models were tested on an independent test set. Clinical readings, mean ADC, and radiomics were compared by using the McNemar test and receiver operating characteristic (ROC) analysis. Results In the test set, radiologist interpretation had a per-lesion sensitivity of 88% (53 of 60) and specificity of 50% (79 of 159). Quantitative measurement of the mean ADC (cut-off 732 × 10⁻⁶ mm²/sec) significantly reduced false-positive (FP) lesions from 80 to 60 (specificity 62% [99 of 159]) and false-negative (FN) lesions from seven to six (sensitivity 90% [54 of 60]) (P = .048). Radiologist interpretation had a per-patient sensitivity of 89% (40 of 45) and specificity of 43% (38 of 88). Quantitative measurement of the mean ADC reduced the number of patients with FP lesions from 50 to 43 (specificity 51% [45 of 88]) and the number of patients with FN lesions from five to three (sensitivity 93% [42 of 45]) (P = .496). Comparison of the area under the ROC curve (AUC) for the mean ADC (AUCglobal = 0.84; AUCzone-specific ≤ 0.87) vs the RML (AUCglobal = 0.88, P = .176; AUCzone-specific ≤ 0.89, P ≥ .493) showed no significant difference in performance. 
Conclusion Quantitative measurement of the mean apparent diffusion coefficient (ADC) improved differentiation of benign versus malignant prostate lesions, compared with clinical assessment. Radiomic machine learning had comparable but not better performance than mean ADC assessment. © RSNA, 2018 Online supplemental material is available for this article.
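The McNemar test used above compares two paired classifiers using only the lesions on which they disagree. This is a minimal sketch under two assumptions. First, the mean-ADC rule calls a lesion positive when its mean ADC is at or below the cut-off, expressed in the conventional ADC units of 10⁻⁶ mm²/s. Second, the exact binomial form of the test is used. The lesion values are hypothetical; the study's per-lesion data are not reproduced here.

```python
from math import comb

ADC_CUTOFF = 732.0  # assumed units: 10^-6 mm^2/s


def adc_positive(mean_adc: float, cutoff: float = ADC_CUTOFF) -> bool:
    """Call a lesion suspicious when its mean ADC is at or below the cut-off (assumed direction)."""
    return mean_adc <= cutoff


def discordant_counts(truth, reader_calls, mean_adcs):
    """b: reader correct but ADC rule wrong; c: ADC rule correct but reader wrong."""
    b = c = 0
    for y, reader, adc in zip(truth, reader_calls, mean_adcs):
        reader_ok = reader == y
        adc_ok = adc_positive(adc) == y
        if reader_ok and not adc_ok:
            b += 1
        elif adc_ok and not reader_ok:
            c += 1
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: binomial test of b versus c with p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical lesions: ground truth, radiologist call, mean ADC (10^-6 mm^2/s).
truth = [True, False, False, True, False]
reader = [True, True, True, True, False]
adc = [650.0, 900.0, 700.0, 800.0, 1000.0]
b, c = discordant_counts(truth, reader, adc)
```

Lesions on which both methods are right, or both are wrong, do not enter the statistic. This is why the test suits paired readings of the same lesions.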
Subject(s)
Image Interpretation, Computer-Assisted/methods , Machine Learning , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Aged , Humans , Male , Middle Aged , Prostate/diagnostic imaging , Prostatic Neoplasms/classification , Prostatic Neoplasms/pathology , ROC Curve , Retrospective Studies
ABSTRACT
We present a fiber tractography approach based on a random forest classification and voting process, guiding each step of the streamline progression by directly processing raw diffusion-weighted signal intensities. For comparison with the state of the art, i.e., tractography pipelines that rely on mathematical modeling, we performed a quantitative and qualitative evaluation with multiple phantom and in vivo experiments, including a comparison with the 96 submissions to the ISMRM tractography challenge 2015. The results demonstrate the vast potential of machine learning for fiber tractography.
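The following generic sketch illustrates classification-guided, vote-based streamline progression. It is not the published method.

- A hypothetical `classify` callable stands in for a trained random forest. In the described approach, that classifier would see raw diffusion-weighted signal intensities.
- `classify` returns a probability for each candidate direction plus a termination probability.
- The next step follows the probability-weighted vote of the candidate directions, each flipped to agree with the previous step.
- Tracking stops when termination outweighs all directional votes.

The step size, the candidate directions, and the stopping rule are all assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> Vec:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def vote_direction(probs: Sequence[float], directions: Sequence[Vec], previous: Vec) -> Vec:
    """Probability-weighted vote over candidate directions."""
    acc = [0.0, 0.0, 0.0]
    for p, d in zip(probs, directions):
        # Candidate directions are sign-ambiguous axes: flip each to agree with the previous step.
        sign = 1.0 if dot(d, previous) >= 0 else -1.0
        for i in range(3):
            acc[i] += p * sign * d[i]
    if dot(acc, acc) == 0.0:
        return previous  # no directional evidence: keep the previous heading
    return normalize(acc)


def track(seed: Vec, first_dir: Vec,
          classify: Callable[[Vec], Tuple[Sequence[float], float]],
          directions: Sequence[Vec], step: float = 0.5, max_steps: int = 1000) -> List[Vec]:
    """Propagate one streamline; classify(pos) -> (direction probabilities, stop probability)."""
    points = [seed]
    pos, previous = seed, normalize(first_dir)
    for _ in range(max_steps):
        probs, stop_prob = classify(pos)
        if stop_prob >= sum(probs):
            break  # termination outweighs all directional votes
        previous = vote_direction(probs, directions, previous)
        pos = (pos[0] + step * previous[0],
               pos[1] + step * previous[1],
               pos[2] + step * previous[2])
        points.append(pos)
    return points


# Hypothetical toy classifier: favours +x until x reaches 2.0, then votes to stop.
DIRECTIONS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


def toy_classifier(pos: Vec):
    if pos[0] < 2.0:
        return [1.0, 0.0, 0.0], 0.0
    return [0.0, 0.0, 0.0], 1.0


streamline = track((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), toy_classifier, DIRECTIONS)
```

Averaging the votes rather than taking the single most probable direction smooths the step. Flipping each axis to match the previous heading keeps the streamline from reversing on itself.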
Subject(s)
Brain Mapping/methods , Diffusion Tensor Imaging/methods , Machine Learning , Humans
ABSTRACT
Autobiographical memory (AM) is part of declarative memory and includes both semantic and episodic aspects. AM deficits are among the major complaints of patients with Alzheimer's disease (AD), even in early or preclinical stages. Previous MRI studies in AD patients have shown that deficits in semantic and episodic AM are associated with hippocampal alterations. However, the question of which specific hippocampal subfields and adjacent extrahippocampal structures contribute to AM deficits in individuals with mild cognitive impairment (MCI) and in AD patients has not yet been investigated. One hundred and seven participants (38 AD patients, 38 MCI individuals, and 31 healthy controls [HC]) underwent MRI at 3 Tesla. AM was assessed with a semi-structured interview (E-AGI). FreeSurfer 5.3 was used for hippocampal parcellation. Semantic and episodic AM scores were related to the volumes of 5 hippocampal subfields and to cortical thickness in the parahippocampal and entorhinal cortex. Both semantic and episodic AM deficits were associated with bilateral hippocampal alterations. These associations referred mainly to CA1, CA2-3, presubiculum, and subiculum atrophy. Episodic, but not semantic, AM loss was associated with reduced cortical thickness of the bilateral parahippocampal and entorhinal cortex. In MCI individuals, episodic, but not semantic, AM deficits were associated with alterations of the CA1, presubiculum, and subiculum. Our findings support the crucial role of CA1, presubiculum, and subiculum in episodic memory. The present results suggest that in MCI individuals, semantic and episodic AM deficits are subserved by distinct neuronal systems.
Subject(s)
Alzheimer Disease/pathology , Cognitive Dysfunction/pathology , Hippocampus/pathology , Memory, Episodic , Aged , Female , Humans , Image Interpretation, Computer-Assisted , Magnetic Resonance Imaging , Male , Middle Aged
ABSTRACT
OBJECTIVE: Neurological soft signs (NSS) are core features of psychiatric disorders with a significant neurodevelopmental origin. However, it is unclear whether NSS correlates are associated with neuropathological processes underlying the disease or are confounded by medication. Given that NSS are also present in healthy persons (HP), investigating HP could reveal NSS correlates that are not biased by disease-specific processes or drug treatment. Therefore, we used a combination of diffusion MRI analysis tools to provide a framework of specific white matter (WM) microstructure variations underlying NSS in HP. METHOD: NSS of 59 HP were examined on the Heidelberg Scale and related to diffusion-associated metrics. Using tract-based spatial statistics (TBSS), we studied WM variations in fractional anisotropy (FA) as well as radial (RD), axial (AD), and mean diffusivity (MD). Using graph analytics (clustering coefficient, CC; local betweenness centrality, BC), we then explored DTI-derived structural network variations in regions identified by previous MRI studies on NSS. RESULTS: NSS scores were negatively associated with RD, AD, and MD in the corpus callosum, brainstem, and cerebellum (P < 0.05, corr.). NSS scores were negatively associated with CC and BC of the pallidum, the superior parietal gyrus, the precentral sulcus, the insula, and the cingulate gyrus (P < 0.05, uncorr.). CONCLUSION: The present study supports the notion that WM microstructure variations in subcortical and cortical sensorimotor regions contribute to NSS expression in young HP. Hum Brain Mapp 38:3552-3565, 2017. © 2017 Wiley Periodicals, Inc.
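The two graph measures used in this study, the clustering coefficient (CC) and betweenness centrality (BC), can be computed from an adjacency structure. The minimal sketch below works on an unweighted, undirected graph. It uses the standard local clustering coefficient and Brandes' algorithm for betweenness centrality. The study's DTI-derived networks may well be weighted, which this sketch does not handle, and the node names are hypothetical.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}


# Hypothetical toy network: a triangle (a, b, c) with a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
```

In the toy network, node a bridges d to the triangle, so it has the highest betweenness but only a partial clustering coefficient. Nodes b and c sit in a closed triangle with no bridging role.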