Results 1 - 20 of 152
1.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subject(s)
Artificial Intelligence
2.
Nat Methods ; 20(7): 1010-1020, 2023 07.
Article in English | MEDLINE | ID: mdl-37202537

ABSTRACT

The Cell Tracking Challenge is an ongoing benchmarking initiative that has become a reference in cell segmentation and tracking algorithm development. Here, we present a significant number of improvements introduced in the challenge since our 2017 report. These include the creation of a new segmentation-only benchmark, the enrichment of the dataset repository with new datasets that increase its diversity and complexity, and the creation of a silver standard reference corpus based on the most competitive results, which will be of particular interest for data-hungry deep learning-based strategies. Furthermore, we present the up-to-date cell segmentation and tracking leaderboards, an in-depth analysis of the relationship between the performance of the state-of-the-art methods and the properties of the datasets and annotations, and two novel, insightful studies about the generalizability and the reusability of top-performing methods. These studies provide critical practical conclusions for both developers and users of traditional and machine learning-based cell segmentation and tracking algorithms.


Subject(s)
Benchmarking , Cell Tracking , Cell Tracking/methods , Machine Learning , Algorithms
3.
Lancet Oncol ; 25(7): 879-887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
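The non-inferiority and superiority statements in this abstract rest on comparing the lower boundary of a confidence interval for a difference against zero and against the negative of the margin. A minimal sketch of that decision rule follows; the function name `classify_difference` is hypothetical, and the rule is a simplification that ignores the study's prespecified testing order.

```python
def classify_difference(ci_lower: float, margin: float) -> str:
    """Classify an (AI minus reference) difference from the lower CI boundary.

    ci_lower: lower boundary of the two-sided confidence interval.
    margin:   non-inferiority margin as a positive number (e.g. 0.05).
    Simplified illustration only, not the study's statistical analysis plan.
    """
    if ci_lower > 0:
        return "superior"        # the whole interval lies above zero
    if ci_lower > -margin:
        return "non-inferior"    # a loss larger than the margin is excluded
    return "inconclusive"        # non-inferiority is not shown


# Reader-study values quoted in the abstract: lower CI boundary of the
# AUROC difference 0.02, non-inferiority margin 0.05.
print(classify_difference(0.02, 0.05))
```

A lower boundary above zero also clears the margin, which is why a result can be reported as both superior and non-inferior.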


Subject(s)
Artificial Intelligence , Magnetic Resonance Imaging , Prostatic Neoplasms , Radiologists , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Aged , Retrospective Studies , Middle Aged , Neoplasm Grading , Netherlands , ROC Curve
4.
Lancet Oncol ; 25(3): 400-410, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38423052

ABSTRACT

BACKGROUND: The extended acquisition times required for MRI limit its availability in resource-constrained settings. Consequently, accelerating MRI by undersampling k-space data, which is necessary to reconstruct an image, has been a long-standing but important challenge. We aimed to develop a deep convolutional neural network (dCNN) optimisation method for MRI reconstruction and to reduce scan times and evaluate its effect on image quality and accuracy of oncological imaging biomarkers. METHODS: In this multicentre, retrospective, cohort study, MRI data from patients with glioblastoma treated at Heidelberg University Hospital (775 patients and 775 examinations) and from the phase 2 CORE trial (260 patients, 1083 examinations, and 58 institutions) and the phase 3 CENTRIC trial (505 patients, 3147 examinations, and 139 institutions) were used to develop, train, and test dCNN for reconstructing MRI from highly undersampled single-coil k-space data with various acceleration rates (R=2, 4, 6, 8, 10, and 15). Independent testing was performed with MRIs from the phase 2/3 EORTC-26101 trial (528 patients with glioblastoma, 1974 examinations, and 32 institutions). The similarity between undersampled dCNN-reconstructed and original MRIs was quantified with various image quality metrics, including structural similarity index measure (SSIM) and the accuracy of undersampled dCNN-reconstructed MRI on downstream radiological assessment of imaging biomarkers in oncology (automated artificial intelligence-based quantification of tumour burden and treatment response) was performed in the EORTC-26101 test dataset. The public NYU Langone Health fastMRI brain test dataset (558 patients and 558 examinations) was used to validate the generalisability and robustness of the dCNN for reconstructing MRIs from available multi-coil (parallel imaging) k-space data. 
FINDINGS: In the EORTC-26101 test dataset, the median SSIM of undersampled dCNN-reconstructed MRI ranged from 0·88 to 0·99 across different acceleration rates, with 0·92 (95% CI 0·92-0·93) for 10-times acceleration (R=10). The 10-times undersampled dCNN-reconstructed MRI yielded excellent agreement with original MRI when assessing volumes of contrast-enhancing tumour (median DICE for spatial agreement of 0·89 [95% CI 0·88 to 0·89]; median volume difference of 0·01 cm³ [95% CI 0·00 to 0·03] equalling 0·21%; p=0·0036 for equivalence) or non-enhancing tumour or oedema (median DICE of 0·94 [95% CI 0·94 to 0·95]; median volume difference of -0·79 cm³ [95% CI -0·87 to -0·72] equalling -1·77%; p=0·023 for equivalence) in the EORTC-26101 test dataset. Automated volumetric tumour response assessment in the EORTC-26101 test dataset yielded an identical median time to progression of 4·27 months (95% CI 4·14 to 4·57) when using 10-times-undersampled dCNN-reconstructed or original MRI (log-rank p=0·80) and agreement in the time to progression in 374 (95·2%) of 393 patients with data. The dCNN generalised well to the fastMRI brain dataset, with significant improvements in the median SSIM when using multi-coil compared with single-coil k-space data (p<0·0001). INTERPRETATION: Deep-learning-based reconstruction of undersampled MRI allows for a substantial reduction of scan times, with a 10-times acceleration demonstrating excellent image quality while preserving the accuracy of derived imaging biomarkers for the assessment of oncological treatment response. Our developments are available as open source software and hold considerable promise for increasing the accessibility to MRI, pending further prospective validation. FUNDING: Deutsche Forschungsgemeinschaft (German Research Foundation) and an Else Kröner Clinician Scientist Endowed Professorship by the Else Kröner Fresenius Foundation.


Subject(s)
Deep Learning , Glioblastoma , Humans , Artificial Intelligence , Biomarkers , Cohort Studies , Glioblastoma/diagnostic imaging , Magnetic Resonance Imaging , Retrospective Studies
5.
Nat Methods ; 18(2): 203-211, 2021 02.
Article in English | MEDLINE | ID: mdl-33288961

ABSTRACT

Biomedical imaging is a driver of scientific discovery and a core component of medical care and is being stimulated by the field of deep learning. While semantic segmentation algorithms enable image analysis and quantification in many applications, the design of respective specialized solutions is non-trivial and highly dependent on dataset properties and hardware conditions. We developed nnU-Net, a deep learning-based segmentation method that automatically configures itself, including preprocessing, network architecture, training and post-processing for any new task. The key design choices in this process are modeled as a set of fixed parameters, interdependent rules and empirical decisions. Without manual intervention, nnU-Net surpasses most existing approaches, including highly specialized solutions on 23 public datasets used in international biomedical segmentation competitions. We make nnU-Net publicly available as an out-of-the-box tool, rendering state-of-the-art segmentation accessible to a broad audience by requiring neither expert knowledge nor computing resources beyond standard network training.


Subject(s)
Deep Learning , Algorithms , Image Processing, Computer-Assisted/methods , Neural Networks, Computer
6.
J Magn Reson Imaging ; 59(4): 1409-1422, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37504495

ABSTRACT

BACKGROUND: Weakly supervised learning promises reduced annotation effort while maintaining performance. PURPOSE: To compare weakly supervised training with full slice-wise annotated training of a deep convolutional classification network (CNN) for prostate cancer (PC). STUDY TYPE: Retrospective. SUBJECTS: One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspicion for PC (65 ± 8 years) between January 2015 and November 2020 were split into training (N = 794, enriched with 204 PROSTATEx examinations) and test set (N = 695). FIELD STRENGTH/SEQUENCE: 1.5 and 3T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. ASSESSMENT: Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared to iterative training utilizing patient-level annotations (PLAs) with supervised feedback of CNN estimates into the next training iteration at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at fixed sensitivity of 0.97 [254/262] emulating PI-RADS ≥ 3, and 0.88-0.90 [231-236/262] emulating PI-RADS ≥ 4 decisions. STATISTICAL TESTS: Receiver operating characteristic (ROC) and area under the curve (AUC) was compared using DeLong and Obuchowski test. Sensitivity and specificity were compared using McNemar test. Statistical significance threshold was P = 0.05. RESULTS: Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at same training set sizes, however the ROC-AUC difference decreased significantly from 200 to 998 training exams. 
Emulating PI-RADS ≥ 3 decisions, difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity at 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). DATA CONCLUSION: Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. EVIDENCE LEVEL: 3 TECHNICAL EFFICACY: Stage 2.
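Comparing specificity at a fixed sensitivity, as this abstract does, means choosing the score threshold that reaches the target sensitivity and then reading off specificity at that threshold. The sketch below illustrates the idea on made-up scores; the function name and data are hypothetical and this is not the authors' evaluation code.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity reaches the target.

    A case is called positive when score >= threshold.
    labels: 1 = disease present, 0 = disease absent.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Walk candidate thresholds from strict to lenient.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in positives) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in negatives) / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity not reachable")


# Illustrative data only (not from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(specificity_at_sensitivity(scores, labels, 0.75))  # (0.6, 0.75, 0.75)
print(specificity_at_sensitivity(scores, labels, 1.0))   # (0.3, 1.0, 0.5)
```

Raising the sensitivity target pushes the threshold down and costs specificity, which is the trade-off behind the different PI-RADS ≥ 3 and ≥ 4 operating points.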


Subject(s)
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies , Polyesters
7.
J Magn Reson Imaging ; 2024 May 11.
Article in English | MEDLINE | ID: mdl-38733369

ABSTRACT

BACKGROUND: Radiomics models trained on data from one center typically show a decline of performance when applied to data from external centers, hindering their introduction into large-scale clinical practice. Current expert recommendations suggest to use only reproducible radiomics features isolated by multiscanner test-retest experiments, which might help to overcome the problem of limited generalizability to external data. PURPOSE: To evaluate the influence of using only a subset of robust radiomics features, defined in a prior in vivo multi-MRI-scanner test-retest-study, on the performance and generalizability of radiomics models. STUDY TYPE: Retrospective. POPULATION: Patients with monoclonal plasma cell disorders. Training set (117 MRIs from center 1); internal test set (42 MRIs from center 1); external test set (143 MRIs from center 2-8). FIELD STRENGTH/SEQUENCE: 1.5T and 3.0T; T1-weighted turbo spin echo. ASSESSMENT: The task for the radiomics models was to predict plasma cell infiltration, determined by bone marrow biopsy, noninvasively from MRI. Radiomics machine learning models, including linear regressor, support vector regressor (SVR), and random forest regressor (RFR), were trained on data from center 1, using either all radiomics features, or using only reproducible radiomics features. Models were tested on an internal (center 1) and a multicentric external data set (center 2-8). STATISTICAL TESTS: Pearson correlation coefficient r and mean absolute error (MAE) between predicted and actual plasma cell infiltration. Fisher's z-transformation, Wilcoxon signed-rank test, Wilcoxon rank-sum test; significance level P < 0.05. RESULTS: When using only reproducible features compared with all features, the performance of the SVR on the external test set significantly improved (r = 0.43 vs. r = 0.18 and MAE = 22.6 vs. MAE = 28.2). 
For the RFR, the performance on the external test set deteriorated when using only reproducible instead of all radiomics features (r = 0.33 vs. r = 0.44, P = 0.29 and MAE = 21.9 vs. MAE = 20.5, P = 0.10). CONCLUSION: Using only reproducible radiomics features improves the external performance of some, but not all machine learning models, and did not automatically lead to an improvement of the external performance of the overall best radiomics model. TECHNICAL EFFICACY: Stage 2.
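The two summary statistics reported here, Pearson's r and the mean absolute error (MAE), answer different questions: r measures linear association, MAE measures how far predictions are from the truth. A minimal standard-library sketch with made-up numbers (not study data) shows how they can disagree.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and reference values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Illustrative infiltration percentages: every prediction is 10 points too high.
actual = [10, 20, 30, 40]
predicted = [20, 30, 40, 50]
print(pearson_r(predicted, actual))            # perfectly correlated
print(mean_absolute_error(predicted, actual))  # yet off by 10 on average
```

This is why a model can rank well on one metric and poorly on the other, as with the two regressors compared in the abstract.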

8.
Eur Radiol ; 34(7): 4379-4392, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38150075

ABSTRACT

OBJECTIVES: To quantify regional manifestations related to COPD as anomalies from a modeled distribution of normal-appearing lung on chest CT using a deep learning (DL) approach, and to assess its potential to predict disease severity. MATERIALS AND METHODS: Paired inspiratory/expiratory CT and clinical data from COPDGene and COSYCONET cohort studies were included. COPDGene data served as training/validation/test data sets (N = 3144/786/1310) and COSYCONET as external test set (N = 446). To differentiate low-risk (healthy/minimal disease, [GOLD 0]) from COPD patients (GOLD 1-4), the self-supervised DL model learned semantic information from 50 × 50 × 50 voxel samples from segmented intact lungs. An anomaly detection approach was trained to quantify lung abnormalities related to COPD, as regional deviations. Four supervised DL models were run for comparison. The clinical and radiological predictive power of the proposed anomaly score was assessed using linear mixed effects models (LMM). RESULTS: The proposed approach achieved an area under the curve of 84.3 ± 0.3 (p < 0.001) for COPDGene and 76.3 ± 0.6 (p < 0.001) for COSYCONET, outperforming supervised models even when including only inspiratory CT. Anomaly scores significantly improved fitting of LMM for predicting lung function, health status, and quantitative CT features (emphysema/air trapping; p < 0.001). Higher anomaly scores were significantly associated with exacerbations for both cohorts (p < 0.001) and greater dyspnea scores for COPDGene (p < 0.001). CONCLUSION: Quantifying heterogeneous COPD manifestations as anomaly offers advantages over supervised methods and was found to be predictive for lung function impairment and morphology deterioration. CLINICAL RELEVANCE STATEMENT: Using deep learning, lung manifestations of COPD can be identified as deviations from normal-appearing chest CT and attributed an anomaly score which is consistent with decreased pulmonary function, emphysema, and air trapping. 
KEY POINTS: • A self-supervised DL anomaly detection method discriminated low-risk individuals and COPD subjects, outperforming classic DL methods on two datasets (COPDGene AUC = 84.3%, COSYCONET AUC = 76.3%). • Our contrastive task exhibits robust performance even without the inclusion of expiratory images, while voxel-based methods demonstrate significant performance enhancement when incorporating expiratory images, in the COPDGene dataset. • Anomaly scores improved the fitting of linear mixed effects models in predicting clinical parameters and imaging alterations (p < 0.001) and were directly associated with clinical outcomes (p < 0.001).
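The area under the ROC curve reported for the anomaly score can be read as the probability that a randomly chosen COPD case receives a higher score than a randomly chosen low-risk case. The sketch below computes it by direct pairwise comparison on made-up scores; the function name is hypothetical, and the result is a fraction, whereas the abstract quotes percentages.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as one half. labels: 1 = positive class, 0 = negative class.
    Quadratic in the number of cases, which is fine for an illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Illustrative anomaly scores only (not from either cohort).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))  # 0.8125
```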


Subject(s)
Deep Learning , Pulmonary Disease, Chronic Obstructive , Severity of Illness Index , Tomography, X-Ray Computed , Humans , Pulmonary Disease, Chronic Obstructive/diagnostic imaging , Pulmonary Disease, Chronic Obstructive/physiopathology , Male , Female , Tomography, X-Ray Computed/methods , Middle Aged , Aged , Predictive Value of Tests , Lung/diagnostic imaging , Cohort Studies
9.
Eur Radiol ; 33(11): 7463-7476, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37507610

ABSTRACT

OBJECTIVES: To evaluate a fully automatic deep learning system to detect and segment clinically significant prostate cancer (csPCa) on same-vendor prostate MRI from two different institutions not contributing to training of the system. MATERIALS AND METHODS: In this retrospective study, a previously bi-institutionally validated deep learning system (UNETM) was applied to bi-parametric prostate MRI data from one external institution (A), a PI-RADS distribution-matched internal cohort (B), and a csPCa stratified subset of single-institution external public challenge data (C). csPCa was defined as ISUP Grade Group ≥ 2 determined from combined targeted and extended systematic MRI/transrectal US-fusion biopsy. Performance of UNETM was evaluated by comparing ROC AUC and specificity at typical PI-RADS sensitivity levels. Lesion-level analysis between UNETM segmentations and radiologist-delineated segmentations was performed using Dice coefficient, free-response operating characteristic (FROC), and weighted alternative (waFROC). The influence of using different diffusion sequences was analyzed in cohort A. RESULTS: In 250/250/140 exams in cohorts A/B/C, differences in ROC AUC were insignificant with 0.80 (95% CI: 0.74-0.85)/0.87 (95% CI: 0.83-0.92)/0.82 (95% CI: 0.75-0.89). At sensitivities of 95% and 90%, UNETM achieved specificity of 30%/50% in A, 44%/71% in B, and 43%/49% in C, respectively. Dice coefficient of UNETM and radiologist-delineated lesions was 0.36 in A and 0.49 in B. The waFROC AUC was 0.67 (95% CI: 0.60-0.83) in A and 0.7 (95% CI: 0.64-0.78) in B. UNETM performed marginally better on readout-segmented than on single-shot echo-planar-imaging. CONCLUSION: For same-vendor examinations, deep learning provided comparable discrimination of csPCa and non-csPCa lesions and examinations between local and two independent external data sets, demonstrating the applicability of the system to institutions not participating in model training. 
CLINICAL RELEVANCE STATEMENT: A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets, indicating the potential of deploying AI models without retraining or fine-tuning, and corroborating evidence that AI models extract a substantial amount of transferable domain knowledge about MRI-based prostate cancer assessment. KEY POINTS: • A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets. • Lesion detection performance and segmentation congruence was similar on the institutional and an external data set, as measured by the weighted alternative FROC AUC and Dice coefficient. • Although the system generalized to two external institutions without re-training, achieving expected sensitivity and specificity levels using the deep learning system requires probability thresholds to be adjusted, underlining the importance of institution-specific calibration and quality control.


Subject(s)
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging , Prostate/diagnostic imaging , Prostate/pathology , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies
10.
Acta Neurochir (Wien) ; 165(4): 1041-1051, 2023 04.
Article in English | MEDLINE | ID: mdl-36862216

ABSTRACT

PURPOSE: Fiber tracking (FT) is used in neurosurgical planning for the resection of lesions in proximity to fiber pathways, as it contributes to a substantial amelioration of postoperative neurological impairments. Currently, diffusion-tensor imaging (DTI)-based FT is the most frequently used technique; however, sophisticated techniques such as Q-ball (QBI) for high-resolution FT (HRFT) have suggested favorable results. Little is known about the reproducibility of both techniques in the clinical setting. Therefore, this study aimed to examine the intra- and interrater agreement for the depiction of white matter pathways such as the corticospinal tract (CST) and the optic radiation (OR). METHODS: Nineteen patients with eloquent lesions in the proximity of the OR or CST were prospectively enrolled. Two different raters independently reconstructed the fiber bundles by applying probabilistic DTI- and QBI-FT. Interrater agreement was evaluated from the comparison between results obtained by the two raters on the same data set acquired in two independent iterations at different timepoints using the Dice Similarity Coefficient (DSC) and the Jaccard Coefficient (JC). Likewise, intrarater agreement was determined for each rater comparing individual results. RESULTS: DSC values showed substantial intrarater agreement based on DTI-FT (rater 1: mean 0.77 (0.68-0.85); rater 2: mean 0.75 (0.64-0.81); p = 0.673); while an excellent agreement was observed after the deployment of QBI-based FT (rater 1: mean 0.86 (0.78-0.98); rater 2: mean 0.80 (0.72-0.91); p = 0.693). In contrast, fair agreement was observed between both measures for the repeatability of the OR of each rater based on DTI-FT (rater 1: mean 0.36 (0.26-0.77); rater 2: mean 0.40 (0.27-0.79), p = 0.546). A substantial agreement between the measures was noted by applying QBI-FT (rater 1: mean 0.67 (0.44-0.78); rater 2: mean 0.62 (0.32-0.70), p = 0.665). 
The interrater agreement was moderate for the reproducibility of the CST and OR for both DSC and JC based on DTI-FT (DSC and JC ≥ 0.40); while a substantial interrater agreement was noted for DSC after applying QBI-based FT for the delineation of both fiber tracts (DSC > 0.6). CONCLUSIONS: Our findings suggest that QBI-based FT might be a more robust tool for the visualization of the OR and CST adjacent to intracerebral lesions compared with the common standard DTI-FT. For neurosurgical planning during the daily workflow, QBI appears to be feasible and less operator-dependent.
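The Dice Similarity Coefficient and Jaccard Coefficient used above both measure overlap between two segmentations and are tied by the identity JC = DSC / (2 - DSC). A minimal sketch follows, representing each reconstructed bundle as a set of voxel coordinates; the function names and the convention for two empty masks are assumptions made for illustration.

```python
def dice(a: set, b: set) -> float:
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard Coefficient: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # same assumed convention as above
    return len(a & b) / len(a | b)


# Illustrative voxel sets for two raters (not study data).
rater_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
rater_2 = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}

d = dice(rater_1, rater_2)     # 4/7
j = jaccard(rater_1, rater_2)  # 2/5
print(d, j, d / (2 - d))       # the last value equals the Jaccard coefficient
```

Because the Jaccard value never exceeds the Dice value for the same pair of masks, a shared cut-off such as 0.40 is a stricter requirement on JC than on DSC.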


Subject(s)
Pyramidal Tracts , White Matter , Humans , Pyramidal Tracts/diagnostic imaging , Pyramidal Tracts/pathology , Reproducibility of Results , Diffusion Tensor Imaging/methods , White Matter/pathology
11.
Medicina (Kaunas) ; 58(9)2022 Sep 01.
Article in English | MEDLINE | ID: mdl-36143877

ABSTRACT

Background and Objectives: In the literature, spinal navigation and robot-assisted surgery improved screw placement accuracy, but the majority of studies only qualitatively report on screw positioning within the vertebra. We sought to evaluate screw placement accuracy in relation to a preoperative trajectory plan by three-dimensional quantification to elucidate technical benefits of navigation for lumbar pedicle screws. Materials and Methods: In 27 CT-navigated instrumentations for degenerative disease, a dedicated intraoperative 3D-trajectory plan was created for all screws. Final screw positions were defined on postoperative CT. Trajectory plans and final screw positions were co-registered and quantitatively compared computing minimal absolute differences (MAD) of screw head and tip points (mm) and screw axis (degree) in 3D-space, respectively. Differences were evaluated with consideration of the navigation target registration error. Clinical acceptability of screws was evaluated using the Gertzbein-Robbins (GR) classification. Results: Data included 140 screws covering levels L1-S1. While screw placement was clinically acceptable in all cases (GR grade A and B in 112 (80%) and 28 (20%) cases, respectively), implanted screws showed considerable deviation compared to the trajectory plan: Mean axis deviation was 6.3° ± 3.6°, screw head and tip points showed mean MAD of 5.2 ± 2.4 mm and 5.5 ± 2.7 mm, respectively. Deviations significantly exceeded the mean navigation registration error of 0.87 ± 0.22 mm (p < 0.001). Conclusions: Screw placement was clinically acceptable in all screws after navigated placement but nevertheless, considerable deviation in implanted screws was noted compared to the initial trajectory plan. Our data provides a 3D-quantitative benchmark for screw accuracy achievable by CT-navigation in routine spine surgery and suggests a framework for objective comparison of screw outcome after navigated or robot-assisted procedures. 
Factors contributing to screw deviations should be considered to assure optimal surgical results when applying navigation for spinal instrumentation.
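Comparing a planned trajectory with an implanted screw in 3D comes down to two geometric quantities: the angle between the two axes and the distance between corresponding head or tip points. The sketch below assumes each screw is described by head and tip coordinates in millimetres in a shared, co-registered frame, and it uses plain Euclidean distance; the abstract does not spell out how its MAD values were computed, so this is an illustration rather than the authors' method.

```python
import math


def axis_angle_deg(head_a, tip_a, head_b, tip_b):
    """Angle in degrees between two screw axes, each pointing head -> tip."""
    u = [t - h for h, t in zip(head_a, tip_a)]
    v = [t - h for h, t in zip(head_b, tip_b)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def point_distance_mm(p, q):
    """Euclidean distance between two 3D points given in millimetres."""
    return math.dist(p, q)


# Hypothetical coordinates: the implanted screw is parallel to the plan
# but shifted 3 mm and 4 mm in the plane perpendicular to the axis.
plan_head, plan_tip = (0.0, 0.0, 0.0), (0.0, 0.0, 40.0)
real_head, real_tip = (3.0, 4.0, 0.0), (3.0, 4.0, 40.0)

print(axis_angle_deg(plan_head, plan_tip, real_head, real_tip))  # 0.0
print(point_distance_mm(plan_head, real_head))                   # 5.0
```

A screw can therefore deviate by several millimetres at the head or tip while showing no angular deviation, which is why the abstract reports both quantities.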


Subject(s)
Pedicle Screws , Spinal Fusion , Humans , Lumbar Vertebrae/surgery , Retrospective Studies , Spinal Fusion/methods , Spine/surgery , Tomography, X-Ray Computed/methods
12.
Neuroimage ; 238: 118216, 2021 09.
Article in English | MEDLINE | ID: mdl-34052465

ABSTRACT

Accurate detection and quantification of unruptured intracranial aneurysms (UIAs) is important for rupture risk assessment and to allow an informed treatment decision to be made. Currently, 2D manual measures used to assess UIAs on Time-of-Flight magnetic resonance angiographies (TOF-MRAs) lack 3D information and there is substantial inter-observer variability for both aneurysm detection and assessment of aneurysm size and growth. 3D measures could be helpful to improve aneurysm detection and quantification but are time-consuming and would therefore benefit from a reliable automatic UIA detection and segmentation method. The Aneurysm Detection and segMentation (ADAM) challenge was organised in which methods for automatic UIA detection and segmentation were developed and submitted to be evaluated on a diverse clinical TOF-MRA dataset. A training set (113 cases with a total of 129 UIAs) was released, each case including a TOF-MRA, a structural MR image (T1, T2 or FLAIR), annotation of any present UIA(s) and the centre voxel of the UIA(s). A test set of 141 cases (with 153 UIAs) was used for evaluation. Two tasks were proposed: (1) detection and (2) segmentation of UIAs on TOF-MRAs. Teams developed and submitted containerised methods to be evaluated on the test set. Task 1 was evaluated using metrics of sensitivity and false positive count. Task 2 was evaluated using dice similarity coefficient, modified hausdorff distance (95th percentile) and volumetric similarity. For each task, a ranking was made based on the average of the metrics. In total, eleven teams participated in task 1 and nine of those teams participated in task 2. Task 1 was won by a method specifically designed for the detection task (i.e. not participating in task 2). Based on segmentation metrics, the top two methods for task 2 performed statistically significantly better than all other methods. 
The detection performance of the top-ranking methods was comparable to visual inspection for larger aneurysms. Segmentation performance of the top ranking method, after selection of true UIAs, was similar to interobserver performance. The ADAM challenge remains open for future submissions and improved submissions, with a live leaderboard to provide benchmarking for method developments at https://adam.isi.uu.nl/.
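
Two of the segmentation metrics named in this abstract can be illustrated with a minimal sketch. It assumes binary masks flattened to 0/1 sequences and uses the common textbook definitions; the function names are hypothetical and this is not the challenge's own evaluation code.

```python
def dice_coefficient(pred, ref):
    """Dice similarity coefficient for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def volumetric_similarity(pred, ref):
    """1 - |V_pred - V_ref| / (V_pred + V_ref); depends on volumes only, not overlap."""
    v_pred = sum(1 for p in pred if p)
    v_ref = sum(1 for r in ref if r)
    total = v_pred + v_ref
    return 1.0 if total == 0 else 1.0 - abs(v_pred - v_ref) / total


# Hypothetical toy masks: same volume (3 voxels each), 2 voxels overlapping.
pred = [0, 1, 1, 1, 0, 0]
ref = [0, 0, 1, 1, 1, 0]
print(dice_coefficient(pred, ref))
print(volumetric_similarity(pred, ref))
```

The toy masks show why the two metrics are reported together: equal volumes give a perfect volumetric similarity even though the overlap, and therefore the Dice coefficient, is imperfect.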


Subject(s)
Cerebral Angiography/methods , Intracranial Aneurysm/diagnostic imaging , Magnetic Resonance Angiography/methods , Datasets as Topic , Educational Measurement , Humans , Magnetic Resonance Imaging , Random Allocation , Risk Assessment
13.
Eur Radiol ; 31(1): 302-313, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32767102

ABSTRACT

OBJECTIVES: To simulate clinical deployment, evaluate performance, and establish quality assurance of a deep learning algorithm (U-Net) for detection, localization, and segmentation of clinically significant prostate cancer (sPC), ISUP grade group ≥ 2, using bi-parametric MRI. METHODS: In 2017, 284 consecutive men in active surveillance, biopsy-naïve or pre-biopsied, received targeted and extended systematic MRI/transrectal US-fusion biopsy, after examination on a single MRI scanner (3 T). A prospective adjustment scheme was evaluated comparing the performance of the Prostate Imaging Reporting and Data System (PI-RADS) and U-Net using sensitivity, specificity, predictive values, and the Dice coefficient. RESULTS: In the 259 eligible men (median 64 [IQR 61-72] years), PI-RADS had a sensitivity of 98% [106/108]/84% [91/108] with a specificity of 17% [25/151]/58% [88/151], for thresholds at ≥ 3/≥ 4 respectively. U-Net using dynamic threshold adjustment had a sensitivity of 99% [107/108]/83% [90/108] (p > 0.99/> 0.99) with a specificity of 24% [36/151]/55% [83/151] (p > 0.99/> 0.99) for probability thresholds d3 and d4 emulating PI-RADS ≥ 3 and ≥ 4 decisions respectively, not statistically different from PI-RADS. Co-occurrence of a radiological PI-RADS ≥ 4 examination and U-Net ≥ d3 assessment significantly improved the positive predictive value from 59 to 63% (p = 0.03), on a per-patient basis. CONCLUSIONS: U-Net has similar performance to PI-RADS in simulated continued clinical use. Regular quality assurance should be implemented to ensure desired performance. KEY POINTS: • U-Net maintained similar diagnostic performance compared to radiological assessment of PI-RADS ≥ 4 when applied in a simulated clinical deployment. • Application of our proposed prospective dynamic calibration method successfully adjusted U-Net performance within acceptable limits of the PI-RADS reference over time, while not being limited to PI-RADS as a reference. 
• Simultaneous detection by U-Net and radiological assessment significantly improved the positive predictive value on a per-patient and per-lesion basis, while the negative predictive value remained unchanged.
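
The per-patient figures in this abstract can be rechecked from the reported counts. The sketch below assumes the standard confusion-matrix definitions and takes the PI-RADS ≥ 4 counts from the text (91 of 108 men with sPC detected, 88 of 151 men without sPC correctly negative); the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and positive predictive value from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Counts for the PI-RADS >= 4 threshold as reported in the abstract.
metrics = diagnostic_metrics(tp=91, fn=108 - 91, tn=88, fp=151 - 88)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these values match the 84% sensitivity and 58% specificity stated for PI-RADS ≥ 4, and the resulting positive predictive value appears consistent with the 59% baseline quoted before the combined assessment.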


Subject(s)
Deep Learning , Prostatic Neoplasms , Humans , Image-Guided Biopsy , Magnetic Resonance Imaging , Male , Prospective Studies , Prostatic Neoplasms/diagnostic imaging
14.
Respiration ; 100(7): 580-587, 2021.
Article in English | MEDLINE | ID: mdl-33857945

ABSTRACT

OBJECTIVE: Evaluation of software tools for segmentation, quantification, and characterization of fibrotic pulmonary parenchyma changes will strengthen the role of CT as a biomarker of disease extent, evolution, and response to therapy in idiopathic pulmonary fibrosis (IPF) patients. METHODS: 418 nonenhanced thin-section MDCTs of 127 IPF patients and 78 MDCTs of 78 healthy individuals were analyzed through 3 fully automated, completely different software tools: YACTA, LUFIT, and IMBIO. The agreement between YACTA and LUFIT on segmented lung volume and 80th (reflecting fibrosis) and 40th (reflecting ground-glass opacity) percentile of the lung density histogram was analyzed using Bland-Altman plots. The fibrosis and ground-glass opacity segmented by IMBIO (lung texture analysis software tool) were included in specific regression analyses. RESULTS: In the IPF group, LUFIT outperformed YACTA by segmenting more lung volume (mean difference 242 mL, 95% limits of agreement -54 to 539 mL), as well as quantifying higher 80th (76 HU, -6 to 158 HU) and 40th percentiles (9 HU, -73 to 90 HU). No relevant differences were revealed in the control group. The 80th/40th percentile as quantified by LUFIT correlated positively with the percentage of fibrosis/ground-glass opacity calculated by IMBIO (r = 0.78/r = 0.92). CONCLUSIONS: In terms of segmentation of pulmonary fibrosis, LUFIT as a shape model-based segmentation software tool is superior to the threshold-based YACTA tool, since the density of (severe) fibrosis is similar to that of the surrounding soft tissues. Therefore, shape modeling as used in LUFIT may serve as a valid tool in the quantification of IPF, since this mainly affects the subpleural space.
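
The mean difference and 95% limits of agreement quoted above are the usual Bland-Altman summary. A minimal sketch follows, assuming the conventional definition (bias ± 1.96 sample standard deviations of the paired differences); the paired values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman_limits(a, b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical lung volumes in mL from two tools measuring the same scans.
tool_a = [5210.0, 4380.0, 6025.0, 4900.0]
tool_b = [4950.0, 4200.0, 5700.0, 4720.0]
bias, lower, upper = bland_altman_limits(tool_a, tool_b)
print(f"bias {bias:.0f} mL, limits of agreement {lower:.0f} to {upper:.0f} mL")
```

A lower limit below zero, as in the reported -54 to 539 mL, means that for some scans the second tool segmented more volume even though the first did so on average.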


Subject(s)
Algorithms , Idiopathic Pulmonary Fibrosis/pathology , Lung/pathology , Software , Aged , Case-Control Studies , Diagnosis, Computer-Assisted , Female , Humans , Idiopathic Pulmonary Fibrosis/diagnostic imaging , Linear Models , Lung/diagnostic imaging , Lung Volume Measurements , Male , Middle Aged , Models, Biological , Tomography, X-Ray Computed
15.
Radiology ; 297(1): 164-175, 2020 10.
Article in English | MEDLINE | ID: mdl-32720870

ABSTRACT

Background Relevance of antiangiogenic treatment with bevacizumab in patients with glioblastoma is controversial because progression-free survival benefit did not translate into an overall survival (OS) benefit in randomized phase III trials. Purpose To perform longitudinal characterization of intratumoral angiogenesis and oxygenation by using dynamic susceptibility contrast agent-enhanced (DSC) MRI and evaluate its potential for predicting outcome from administration of bevacizumab. Materials and Methods In this secondary analysis of the prospective randomized phase II/III European Organization for Research and Treatment of Cancer 26101 trial conducted between October 2011 and December 2015 in 596 patients with first recurrence of glioblastoma, the subset of patients with availability of anatomic MRI and DSC MRI at baseline and first follow-up was analyzed. Patients were allocated into those administered bevacizumab (hereafter, the BEV group; either bevacizumab monotherapy or bevacizumab with lomustine) and those not administered bevacizumab (hereafter, the non-BEV group with lomustine monotherapy). Contrast-enhanced tumor volume, noncontrast-enhanced T2 fluid-attenuated inversion recovery (FLAIR) signal abnormality volume, Gaussian-normalized relative cerebral blood volume (nrCBV), Gaussian-normalized relative cerebral blood flow (nrCBF), and tumor metabolic rate of oxygen (nTMRO2) were quantified. The predictive ability of these imaging parameters was assessed with multivariable Cox regression and formal interaction testing. Results A total of 254 of 596 patients were evaluated (mean age, 57 years ± 11; 155 men; 161 in the BEV group and 93 in the non-BEV group). Progression-free survival was longer in the BEV group (3.7 months; 95% confidence interval [CI]: 3.0, 4.2) compared with the non-BEV group (2.5 months; 95% CI: 1.5, 2.9; P = .01), whereas OS was not different (P = .15).
The nrCBV decreased for the BEV group (-16.3%; interquartile range [IQR], -39.5% to 12.0%; P = .01), but not for the non-BEV group (1.2%; IQR, -17.9% to 23.3%; P = .19) between baseline and first follow-up. An identical pattern was observed for both nrCBF and nTMRO2 values. Contrast-enhanced tumor and noncontrast-enhanced T2 FLAIR signal abnormality volumes decreased for the BEV group (-66% [IQR, -83% to -35%] and -33% [IQR, -71% to -5%], respectively; P < .001 for both), whereas they increased for the non-BEV group (30% [IQR, -17% to 98%], P = .001; and 10% [IQR, -13% to 82%], P = .02, respectively) between baseline and first follow-up. None of the assessed MRI parameters were predictive for OS in the BEV group. Conclusion Bevacizumab treatment decreased tumor volumes, angiogenesis, and oxygenation, thereby reflecting its effectiveness for extending progression-free survival; however, these parameters were not predictive of overall survival (OS), which highlighted the challenges of identifying patients that derive an OS benefit from bevacizumab. © RSNA, 2020 Online supplemental material is available for this article. See also the editorial by Dillon in this issue.


Subject(s)
Angiogenesis Inhibitors/therapeutic use , Bevacizumab/therapeutic use , Brain Neoplasms/drug therapy , Glioblastoma/drug therapy , Magnetic Resonance Imaging/methods , Neovascularization, Pathologic/drug therapy , Antineoplastic Agents, Alkylating/therapeutic use , Brain Neoplasms/pathology , Contrast Media , Europe , Female , Glioblastoma/pathology , Humans , Lomustine/therapeutic use , Male , Middle Aged , Neoplasm Recurrence, Local , Prospective Studies , Survival Analysis
16.
Radiology ; 295(2): 328-338, 2020 05.
Article in English | MEDLINE | ID: mdl-32154773

ABSTRACT

Background Radiomic features may quantify characteristics present in medical imaging. However, the lack of standardized definitions and validated reference values has hampered clinical use. Purpose To standardize a set of 174 radiomic features. Materials and Methods Radiomic features were assessed in three phases. In phase I, 487 features were derived from the basic set of 174 features. Twenty-five research teams with unique radiomics software implementations computed feature values directly from a digital phantom, without any additional image processing. In phase II, 15 teams computed values for 1347 derived features using a CT image of a patient with lung cancer and predefined image processing configurations. In both phases, consensus among the teams on the validity of tentative reference values was measured through the frequency of the modal value and classified as follows: less than three matches, weak; three to five matches, moderate; six to nine matches, strong; 10 or more matches, very strong. In the final phase (phase III), a public data set of multimodality images (CT, fluorine 18 fluorodeoxyglucose PET, and T1-weighted MRI) from 51 patients with soft-tissue sarcoma was used to prospectively assess reproducibility of standardized features. Results Consensus on reference values was initially weak for 232 of 302 features (76.8%) at phase I and 703 of 1075 features (65.4%) at phase II. At the final iteration, weak consensus remained for only two of 487 features (0.4%) at phase I and 19 of 1347 features (1.4%) at phase II. Strong or better consensus was achieved for 463 of 487 features (95.1%) at phase I and 1220 of 1347 features (90.6%) at phase II. Overall, 169 of 174 features were standardized in the first two phases. In the final validation phase (phase III), most of the 169 standardized features could be excellently reproduced (166 with CT; 164 with PET; and 164 with MRI).
Conclusion A set of 169 radiomics features was standardized, which enabled verification and calibration of different radiomics software. © RSNA, 2020 Online supplemental material is available for this article. See also the editorial by Kuhl and Truhn in this issue.
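
The consensus rule described in this abstract (frequency of the modal value mapped to weak, moderate, strong or very strong) can be sketched as follows. Treating two submissions as a match when they agree after rounding to a fixed number of decimals is an assumption made for illustration; the abstract does not state the matching tolerance, and the function name is hypothetical.

```python
from collections import Counter


def consensus_level(values, ndigits=3):
    """Classify consensus on one feature from the values submitted by the teams."""
    if not values:
        raise ValueError("at least one submitted value is required")
    # Assumption: values that agree after rounding count as the same value.
    rounded = [round(v, ndigits) for v in values]
    modal_value, matches = Counter(rounded).most_common(1)[0]
    if matches < 3:
        level = "weak"
    elif matches <= 5:
        level = "moderate"
    elif matches <= 9:
        level = "strong"
    else:
        level = "very strong"
    return modal_value, matches, level


# Hypothetical submissions from eight teams for a single feature.
submitted = [1.25, 1.25, 1.25, 1.25, 1.25, 1.25, 1.30, 1.10]
print(consensus_level(submitted))
```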


Subject(s)
Biomarkers/analysis , Image Processing, Computer-Assisted/standards , Software , Calibration , Fluorodeoxyglucose F18 , Humans , Lung Neoplasms/diagnostic imaging , Magnetic Resonance Imaging , Phantoms, Imaging , Phenotype , Positron-Emission Tomography , Radiopharmaceuticals , Reproducibility of Results , Sarcoma/diagnostic imaging , Tomography, X-Ray Computed
17.
Oncology ; 98(6): 363-369, 2020.
Article in English | MEDLINE | ID: mdl-30439700

ABSTRACT

Information technology (IT) can enhance or change many scenarios in cancer research for the better. In this paper, we introduce several examples, starting with clinical data reuse and collaboration including data sharing in research networks. Key challenges are semantic interoperability and data access (including data privacy). We deal with gathering and analyzing genomic information, where cloud computing, uncertainties and reproducibility challenge researchers. Also, new sources for additional phenotypical data are shown in patient-reported outcome and machine learning in imaging. Last, we focus on therapy assistance, introducing tools used in molecular tumor boards and techniques for computer-assisted surgery. We discuss the need for metadata to aggregate and analyze data sets reliably. We conclude with an outlook towards a learning health care system in oncology, which connects bench and bedside by employing modern IT solutions.


Subject(s)
Medical Oncology/methods , Neoplasms/diagnosis , Neoplasms/therapy , Biomedical Research/methods , Humans , Information Technology , Machine Learning , Reproducibility of Results
18.
J Magn Reson Imaging ; 51(1): 234-249, 2020 01.
Article in English | MEDLINE | ID: mdl-31179595

ABSTRACT

BACKGROUND: Fiber tracking with diffusion-weighted MRI has become an essential tool for estimating in vivo brain white matter architecture. Fiber tracking results are sensitive to the choice of processing method and tracking criteria. PURPOSE: To assess the variability of an algorithm in group studies, reproducibility is of critical importance. However, reproducibility does not assess the validity of the brain connections. Phantom studies provide concrete quantitative comparisons of methods relative to absolute ground truths, yet do not capture variabilities due to in vivo physiological factors. The ISMRM 2017 TraCED challenge was created to fill this gap. STUDY TYPE: A systematic review of algorithms and tract reproducibility studies. SUBJECTS: Single healthy volunteers. FIELD STRENGTH/SEQUENCE: 3.0T, two different scanners by the same manufacturer. The multishell acquisition included b-values of 1000, 2000, and 3000 s/mm² with 20, 45, and 64 diffusion gradient directions per shell, respectively. ASSESSMENT: Nine international groups submitted 46 tractography algorithm entries, each consisting of 16 tracts per scan. The algorithms were assessed using intraclass correlation (ICC) and the Dice similarity measure. STATISTICAL TESTS: Containment analysis was performed to assess whether the submitted algorithms were contained within the tracts of larger-volume submissions. This also served to detect whether spurious submissions had been made. RESULTS: The top five submissions had high ICC and Dice >0.88. Reproducibility was high within the top five submissions when assessed across sessions or across scanners: 0.87-0.97. Containment analysis shows that the top five submissions are contained within larger-volume submissions. Of the 16 tracts, the numbers with high, moderate, and low reproducibility were 8, 4, and 4, respectively.
DATA CONCLUSION: The different methods clearly result in fundamentally different tract structures at the more conservative specificity choices. Data and challenge infrastructure remain available for continued analysis and provide a platform for comparison. LEVEL OF EVIDENCE: 5 Technical Efficacy Stage: 1 J. Magn. Reson. Imaging 2020;51:234-249.


Subject(s)
Brain/anatomy & histology , Diffusion Tensor Imaging/methods , Diffusion Magnetic Resonance Imaging , Humans , Reference Values , Reproducibility of Results
19.
Eur Radiol ; 30(4): 2356-2364, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31900702

ABSTRACT

OBJECTIVES: Patients with multiple sclerosis (MS) regularly undergo MRI for assessment of disease burden. However, interpretation may be time consuming and prone to intra- and interobserver variability. Here, we evaluate the potential of artificial neural networks (ANN) for automated volumetric assessment of MS disease burden and activity on MRI. METHODS: A single-institutional dataset with 334 MS patients (334 MRI exams) was used to develop and train an ANN for automated identification and volumetric segmentation of T2/FLAIR-hyperintense and contrast-enhancing (CE) lesions. Independent testing was performed in a single-institutional longitudinal dataset with 82 patients (266 MRI exams). We evaluated lesion detection performance (F1 scores), lesion segmentation agreement (Dice coefficients), and lesion volume agreement (concordance correlation coefficients [CCC]). Independent evaluation was performed on the public ISBI-2015 challenge dataset. RESULTS: The F1 score was maximized in the training set at a detection threshold of 7 mm³ for T2/FLAIR lesions and 14 mm³ for CE lesions. In the training set, mean F1 scores were 0.867 for T2/FLAIR lesions and 0.636 for CE lesions, as compared to 0.878 for T2/FLAIR lesions and 0.715 for CE lesions in the test set. Using these thresholds, the ANN yielded mean Dice coefficients of 0.834 and 0.878 for segmentation of T2/FLAIR and CE lesions in the training set (fivefold cross-validation). Corresponding Dice coefficients in the test set were 0.846 for T2/FLAIR lesions and 0.908 for CE lesions, and the CCC was ≥ 0.960 in each dataset. CONCLUSIONS: Our results highlight the capability of ANN for quantitative state-of-the-art assessment of volumetric lesion load on MRI and potentially enable a more accurate assessment of disease burden in patients with MS. KEY POINTS: • Artificial neural networks (ANN) can accurately detect and segment both T2/FLAIR and contrast-enhancing MS lesions in MRI data.
• Performance of the ANN was consistent in a clinically derived dataset, with patients presenting all possible disease stages in MRI scans acquired from standard clinical routine rather than with high-quality research sequences. • Computer-aided evaluation of MS with ANN could streamline both clinical and research procedures in the volumetric assessment of MS disease burden as well as in lesion detection.
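
The detection and volume-agreement metrics named in this abstract can be written out in a few lines. The sketch assumes the standard F1 definition and Lin's concordance correlation coefficient with population (divide-by-n) moments; the function names and the lesion volumes are hypothetical, not values from the study.

```python
from statistics import fmean


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from lesion-level counts."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant series")
    return 2 * cov / denom


# Hypothetical lesion volumes in mL: manual reference versus automated output.
manual = [1.2, 3.4, 0.8, 5.1]
automated = [1.1, 3.6, 0.9, 4.8]
print(f1_score(tp=8, fp=2, fn=2))
print(concordance_correlation(manual, automated))
```

Unlike the Pearson correlation, the CCC also penalises a systematic offset between the two series, which is why it suits volume agreement.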


Subject(s)
Brain/pathology , Magnetic Resonance Imaging/methods , Multiple Sclerosis/diagnosis , Neural Networks, Computer , Adult , Female , Humans , Male , Middle Aged , Reproducibility of Results
20.
Eur Arch Psychiatry Clin Neurosci ; 270(2): 253-261, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31278421

ABSTRACT

Electroconvulsive therapy (ECT) is a rapid and highly effective treatment option for treatment-resistant major depressive disorder (TRD). The neural mechanisms underlying such beneficial effects are poorly understood. Exploring associations between changes of brain structure and clinical response is crucial for understanding ECT mechanisms of action and relevant for the validation of potential biomarkers that can facilitate the prediction of ECT efficacy. The aim of this explorative study was to identify cortical predictors of clinical response in TRD patients treated with ECT. We longitudinally investigated 12 TRD patients before and after ECT. Twelve matched healthy controls were studied cross-sectionally. Demographical, clinical, and structural magnetic resonance imaging data at 3 T and multiple cortical markers derived from surface-based morphometry (SBM) analyses were considered. Multiple regression models were computed to identify predictors of clinical response to ECT, as reflected by Hamilton Depression Rating Scale (HAMD) score changes. Symptom severity differences pre-post-ECT were predicted by models including demographic data, clinical data and SBM of frontal, cingulate, and entorhinal structures. Using all-subsets regression, a model comprising HAMD score at baseline and cortical thickness of the left rostral anterior cingulate gyrus explained most variance in the data (multiple R² = 0.82). The data suggest that SBM provides powerful measures for identifying biomarkers for ECT response in TRD. Rostral anterior cingulate thickness and HAMD score at baseline showed the greatest predictive power of clinical response, in contrast to cortical complexity, cortical gyrification, or demographical data.


Subject(s)
Cerebral Cortex/pathology , Depressive Disorder, Major , Depressive Disorder, Treatment-Resistant , Electroconvulsive Therapy , Adult , Cerebral Cortex/diagnostic imaging , Cross-Sectional Studies , Depressive Disorder, Major/pathology , Depressive Disorder, Major/physiopathology , Depressive Disorder, Major/therapy , Depressive Disorder, Treatment-Resistant/diagnostic imaging , Depressive Disorder, Treatment-Resistant/pathology , Depressive Disorder, Treatment-Resistant/physiopathology , Female , Gyrus Cinguli/diagnostic imaging , Gyrus Cinguli/pathology , Humans , Longitudinal Studies , Magnetic Resonance Imaging , Male , Middle Aged , Outcome Assessment, Health Care , Prognosis