Results 1 - 20 of 89
1.
Lancet Oncol ; 25(7): 879-887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
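
The prespecified hierarchical hypothesis above (non-inferiority with a 0·05 margin, then superiority) can be illustrated with a minimal sketch. It assumes that the paired difference (AI minus readers) and its standard error have already been estimated. It does not reproduce the study's variance estimation, and the function names and example inputs are hypothetical.

```python
from statistics import NormalDist


def wald_ci(diff: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Two-sided Wald confidence interval for a difference, e.g. AUROC_AI - AUROC_readers."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff - z * se, diff + z * se


def noninferiority_decision(diff: float, se: float, margin: float = 0.05,
                            level: float = 0.95) -> dict:
    """Hierarchical test: superiority is only assessed once non-inferiority holds."""
    lower, upper = wald_ci(diff, se, level)
    noninferior = lower > -margin          # lower CI bound must stay above -margin
    superior = noninferior and lower > 0   # lower CI bound above zero implies superiority
    return {"ci": (lower, upper), "noninferior": noninferior, "superior": superior}


# Hypothetical inputs, not values reported by the study
print(noninferiority_decision(diff=0.05, se=0.015))
```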


Subjects
Artificial Intelligence , Magnetic Resonance Imaging , Prostatic Neoplasms , Radiologists , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Aged , Retrospective Studies , Middle Aged , Neoplasm Grading , Netherlands , ROC Curve
2.
Med Image Anal ; 95: 103206, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38776844

ABSTRACT

The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown to be capable of accurately predicting breast density; however, because imaging characteristics differ across mammography systems, models built using data from one system do not generalize well to other systems. Although federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, the University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants could submit Docker containers capable of implementing FL on three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.
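
The linear kappa reported above weights each disagreement by its distance on the ordinal density scale. Below is a minimal pure-Python sketch of linear-weighted Cohen's kappa. It assumes density categories are coded as integers 0 to k-1 (for example, density A-D as 0-3, a hypothetical encoding). It is not the challenge's official scoring code.

```python
def linear_weighted_kappa(rater_a: list[int], rater_b: list[int], num_classes: int) -> float:
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_classes < 2:
        raise ValueError("need at least two classes")
    n, k = len(rater_a), num_classes
    # Observed joint distribution of (rater_a, rater_b) labels.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            num += w * observed[i][j]      # weighted observed disagreement
            den += w * row[i] * col[j]     # weighted disagreement expected by chance
    if den == 0.0:
        raise ValueError("kappa undefined: no disagreement expected by chance")
    return 1.0 - num / den


# Hypothetical model predictions vs. reference density labels (0-3)
print(linear_weighted_kappa([0, 1, 2, 3, 2, 1], [0, 1, 3, 3, 1, 1], num_classes=4))
```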


Subjects
Algorithms , Breast Density , Breast Neoplasms , Mammography , Humans , Female , Mammography/methods , Breast Neoplasms/diagnostic imaging , Machine Learning
3.
J Magn Reson Imaging ; 2024 May 11.
Article in English | MEDLINE | ID: mdl-38733369

ABSTRACT

BACKGROUND: Radiomics models trained on data from one center typically show a decline in performance when applied to data from external centers, hindering their introduction into large-scale clinical practice. Current expert recommendations suggest using only reproducible radiomics features identified in multiscanner test-retest experiments, which might help to overcome the problem of limited generalizability to external data. PURPOSE: To evaluate the influence of using only a subset of robust radiomics features, defined in a prior in vivo multi-MRI-scanner test-retest study, on the performance and generalizability of radiomics models. STUDY TYPE: Retrospective. POPULATION: Patients with monoclonal plasma cell disorders. Training set (117 MRIs from center 1); internal test set (42 MRIs from center 1); external test set (143 MRIs from centers 2-8). FIELD STRENGTH/SEQUENCE: 1.5T and 3.0T; T1-weighted turbo spin echo. ASSESSMENT: The task for the radiomics models was to predict plasma cell infiltration, determined by bone marrow biopsy, noninvasively from MRI. Radiomics machine learning models, including a linear regressor, a support vector regressor (SVR), and a random forest regressor (RFR), were trained on data from center 1 using either all radiomics features or only reproducible radiomics features. Models were tested on an internal (center 1) and a multicentric external data set (centers 2-8). STATISTICAL TESTS: Pearson correlation coefficient r and mean absolute error (MAE) between predicted and actual plasma cell infiltration. Fisher's z-transformation, Wilcoxon signed-rank test, Wilcoxon rank-sum test; significance level P < 0.05. RESULTS: When using only reproducible features compared with all features, the performance of the SVR on the external test set significantly improved (r = 0.43 vs. r = 0.18 and MAE = 22.6 vs. MAE = 28.2).
For the RFR, the performance on the external test set deteriorated when using only reproducible instead of all radiomics features (r = 0.33 vs. r = 0.44, P = 0.29 and MAE = 21.9 vs. MAE = 20.5, P = 0.10). CONCLUSION: Using only reproducible radiomics features improves the external performance of some, but not all, machine learning models and does not automatically lead to an improvement in the external performance of the overall best radiomics model. TECHNICAL EFFICACY: Stage 2.
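
The evaluation quantities named above (Pearson's r, the MAE, and Fisher's z-transformation) can be sketched in plain Python. The confidence-interval helper assumes independent observations and a normal approximation on the z scale. This is an illustrative sketch, not the authors' analysis code. Comparing two correlations measured on the same test set, as in the SVR and RFR comparisons, would additionally require a test for dependent correlations.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation between predicted and biopsy-derived infiltration."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(pred: list[float], actual: list[float]) -> float:
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)


def fisher_z_ci(r: float, n: int, z_crit: float = 1.959963984540054) -> tuple[float, float]:
    """Approximate 95% CI for r via z = atanh(r), with standard error 1 / sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Illustration only: plugging in the external-test r and n quoted in the abstract
print(fisher_z_ci(0.43, 143))
```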

4.
Front Med (Lausanne) ; 11: 1360706, 2024.
Article in English | MEDLINE | ID: mdl-38495118

ABSTRACT

Background: Chronic obstructive pulmonary disease (COPD) poses a substantial global health burden, demanding advanced diagnostic tools for early detection and accurate phenotyping. To this end, this study seeks to enhance COPD characterization on chest computed tomography (CT) by comparing the spatial and quantitative relationships between traditional parametric response mapping (PRM) and a novel self-supervised anomaly detection approach, and to unveil potential additional insights into the dynamic transitional stages of COPD. Methods: Non-contrast inspiratory and expiratory CT scans of 1,310 never-smokers, GOLD 0 individuals, and COPD patients (GOLD 1-4) from the COPDGene dataset were retrospectively evaluated. A novel self-supervised anomaly detection approach was applied to quantify lung abnormalities associated with COPD as regional deviations. These regional anomaly scores were qualitatively and quantitatively compared, per GOLD class, with PRM volumes (emphysema: PRMEmph; functional small-airway disease: PRMfSAD) and with the results of principal component analysis (PCA) and clustering applied to the self-supervised latent space. Their relationships with pulmonary function tests (PFTs) were also evaluated. Results: Initial t-distributed stochastic neighbor embedding (t-SNE) visualization of the self-supervised latent space highlighted distinct spatial patterns, revealing clear separations between regions with and without emphysema and air trapping. PCA and cluster analysis identified four stable clusters within this latent space. As the GOLD stage increased, PRMEmph, PRMfSAD, anomaly score, and Cluster 3 volumes exhibited escalating trends, contrasting with a decline in Cluster 2. The patient-wise anomaly scores differed significantly across GOLD stages (p < 0.01), except between never-smokers and GOLD 0 patients. In contrast, PRMEmph, PRMfSAD, and cluster classes showed fewer significant differences.
Pearson correlation coefficients revealed moderate correlations between anomaly scores and PFTs (0.41-0.68), except for functional residual capacity and smoking duration. The anomaly score was correlated with PRMEmph (r = 0.66, p < 0.01) and PRMfSAD (r = 0.61, p < 0.01). Anomaly scores significantly improved the fit of PRM-adjusted multivariate models for predicting clinical parameters (p < 0.001). Bland-Altman plots revealed that volume agreement between PRM-derived volumes and clusters was not constant across the range of measurements. Conclusion: Our study highlights the synergistic utility of the anomaly detection approach and traditional PRM in capturing the nuanced heterogeneity of COPD. The observed disparities in spatial patterns, cluster dynamics, and correlations with PFTs underscore the distinct yet complementary strengths of these methods. Integrating anomaly detection and PRM offers a promising avenue for understanding COPD pathophysiology, potentially informing more tailored diagnostic and intervention approaches to improve patient outcomes.
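
A Bland-Altman analysis, as used above, plots each paired difference between two methods against the pair's mean. It summarizes agreement as the mean difference (bias) ± 1.96 SD. The minimal standard-library sketch below assumes hypothetical input volumes. Its ±1.96 SD limits assume roughly normal differences that are constant across the measurement range, which is the assumption the study found violated for PRM-derived versus cluster volumes.

```python
from statistics import mean, stdev


def bland_altman(a: list[float], b: list[float], z: float = 1.96) -> dict:
    """Bias and limits of agreement between two paired measurement methods."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    averages = [(x + y) / 2 for x, y in zip(a, b)]  # x-axis of a Bland-Altman plot
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {"bias": bias, "loa": (bias - z * sd, bias + z * sd),
            "diffs": diffs, "averages": averages}


# Hypothetical paired volumes (e.g. PRM-derived vs. cluster-derived, in litres)
print(bland_altman([0.8, 1.1, 1.9, 2.6], [0.7, 1.2, 1.6, 2.1]))
```

When agreement varies with magnitude, regressing `diffs` on `averages`, or analysing log-transformed volumes, is a common way to make the trend explicit.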

5.
iScience ; 27(2): 109023, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38352223

ABSTRACT

The preoperative distinction between glioblastoma (GBM) and primary central nervous system lymphoma (PCNSL) can be difficult, even for experts, but is highly relevant. We aimed to develop an easy-to-use algorithm based on a convolutional neural network (CNN) to preoperatively discern PCNSL from GBM and to systematically compare its performance with that of experienced neurosurgeons and radiologists. To this end, a CNN based on DenseNet169 was trained with the magnetic resonance (MR) imaging data of 68 PCNSL and 69 GBM patients, and its performance was compared with that of six trained experts on an external test set of 10 PCNSL and 10 GBM cases. Our neural network predicted PCNSL with an accuracy of 80% and a negative predictive value (NPV) of 0.8, exceeding the accuracy achieved by clinicians (73%, NPV 0.77). Combining expert rating with automated diagnosis in those cases where experts disagreed yielded an accuracy of 95%. Our approach has the potential to significantly augment the preoperative radiological diagnosis of PCNSL.
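
Accuracy and NPV both follow from a 2x2 confusion matrix with PCNSL as the positive class. The sketch below uses one hypothetical split of the 20 external test cases (8 true positives, 2 false negatives, 8 true negatives, 2 false positives). This split happens to reproduce the reported 80% accuracy and NPV of 0.8. The study's actual counts are not given in the abstract.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard 2x2 metrics; PCNSL is treated as the positive class, GBM as negative."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),  # share of "GBM" calls that are truly GBM
    }


# Hypothetical counts for 10 PCNSL + 10 GBM cases, not the study's confusion matrix
print(diagnostic_metrics(tp=8, fp=2, tn=8, fn=2))
```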

6.
Lancet Oncol ; 25(3): 400-410, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38423052

ABSTRACT

BACKGROUND: The extended acquisition times required for MRI limit its availability in resource-constrained settings. Consequently, accelerating MRI by undersampling the k-space data from which an image is reconstructed has been a long-standing and important challenge. We aimed to develop a deep convolutional neural network (dCNN) optimisation method for MRI reconstruction to reduce scan times, and to evaluate its effect on image quality and on the accuracy of oncological imaging biomarkers. METHODS: In this multicentre, retrospective, cohort study, MRI data from patients with glioblastoma treated at Heidelberg University Hospital (775 patients and 775 examinations) and from the phase 2 CORE trial (260 patients, 1083 examinations, and 58 institutions) and the phase 3 CENTRIC trial (505 patients, 3147 examinations, and 139 institutions) were used to develop, train, and test the dCNN for reconstructing MRI from highly undersampled single-coil k-space data with various acceleration rates (R=2, 4, 6, 8, 10, and 15). Independent testing was performed with MRIs from the phase 2/3 EORTC-26101 trial (528 patients with glioblastoma, 1974 examinations, and 32 institutions). The similarity between undersampled dCNN-reconstructed and original MRIs was quantified with various image quality metrics, including the structural similarity index measure (SSIM). The accuracy of undersampled dCNN-reconstructed MRI for downstream radiological assessment of imaging biomarkers in oncology (automated artificial intelligence-based quantification of tumour burden and treatment response) was evaluated in the EORTC-26101 test dataset. The public NYU Langone Health fastMRI brain test dataset (558 patients and 558 examinations) was used to validate the generalisability and robustness of the dCNN for reconstructing MRIs from available multi-coil (parallel imaging) k-space data.
FINDINGS: In the EORTC-26101 test dataset, the median SSIM of undersampled dCNN-reconstructed MRI ranged from 0·88 to 0·99 across different acceleration rates, with 0·92 (95% CI 0·92-0·93) for 10-times acceleration (R=10). The 10-times undersampled dCNN-reconstructed MRI yielded excellent agreement with original MRI when assessing volumes of contrast-enhancing tumour (median Dice for spatial agreement of 0·89 [95% CI 0·88 to 0·89]; median volume difference of 0·01 cm³ [95% CI 0·00 to 0·03] equalling 0·21%; p=0·0036 for equivalence) or non-enhancing tumour or oedema (median Dice of 0·94 [95% CI 0·94 to 0·95]; median volume difference of -0·79 cm³ [95% CI -0·87 to -0·72] equalling -1·77%; p=0·023 for equivalence) in the EORTC-26101 test dataset. Automated volumetric tumour response assessment in the EORTC-26101 test dataset yielded an identical median time to progression of 4·27 months (95% CI 4·14 to 4·57) when using 10-times-undersampled dCNN-reconstructed or original MRI (log-rank p=0·80) and agreement in the time to progression in 374 (95·2%) of 393 patients with data. The dCNN generalised well to the fastMRI brain dataset, with significant improvements in the median SSIM when using multi-coil compared with single-coil k-space data (p<0·0001). INTERPRETATION: Deep-learning-based reconstruction of undersampled MRI allows for a substantial reduction of scan times, with a 10-times acceleration demonstrating excellent image quality while preserving the accuracy of derived imaging biomarkers for the assessment of oncological treatment response. Our developments are available as open source software and hold considerable promise for increasing the accessibility of MRI, pending further prospective validation. FUNDING: Deutsche Forschungsgemeinschaft (German Research Foundation) and an Else Kröner Clinician Scientist Endowed Professorship by the Else Kröner Fresenius Foundation.
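
The retrospective undersampling step described above can be illustrated with a minimal NumPy sketch of single-coil Cartesian undersampling followed by a naive zero-filled reconstruction. Several details are assumptions for illustration:

- the mask design (every R-th phase-encode line plus a fully sampled centre of 8% of lines);
- the function names;
- the random stand-in image.

In the study, the dCNN replaces the zero-filled reconstruction shown here, and its SSIM evaluation is not reproduced.

```python
import numpy as np


def cartesian_mask(n_lines: int, acceleration: int,
                   center_fraction: float = 0.08) -> np.ndarray:
    """Boolean mask over phase-encode lines: every R-th line plus a fully sampled centre."""
    mask = np.zeros(n_lines, dtype=bool)
    mask[::acceleration] = True
    n_center = int(round(n_lines * center_fraction))
    start = (n_lines - n_center) // 2
    mask[start:start + n_center] = True  # keep low spatial frequencies (image contrast)
    return mask


def zero_filled_recon(image: np.ndarray, acceleration: int,
                      center_fraction: float = 0.08):
    """Retrospectively undersample single-coil k-space and reconstruct by zero filling."""
    kspace = np.fft.fftshift(np.fft.fft2(image))         # centred k-space
    mask = cartesian_mask(image.shape[0], acceleration, center_fraction)
    undersampled = kspace * mask[:, None]                 # zero the skipped rows
    recon = np.abs(np.fft.ifft2(np.fft.ifftshift(undersampled)))
    return recon, mask


def nrmse(recon: np.ndarray, reference: np.ndarray) -> float:
    """Normalised root-mean-square error relative to the fully sampled image."""
    return float(np.linalg.norm(recon - reference) / np.linalg.norm(reference))


rng = np.random.default_rng(0)
phantom = rng.random((64, 64))  # stand-in for a fully sampled magnitude image
for r in (2, 4, 10):
    recon, mask = zero_filled_recon(phantom, acceleration=r)
    print(r, round(float(mask.mean()), 3), round(nrmse(recon, phantom), 3))
```

Because the centre lines are always kept, the effective acceleration is slightly below the nominal R.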


Subjects
Deep Learning , Glioblastoma , Humans , Artificial Intelligence , Biomarkers , Cohort Studies , Glioblastoma/diagnostic imaging , Magnetic Resonance Imaging , Retrospective Studies
7.
Radiol Imaging Cancer ; 6(1): e230033, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38180338

ABSTRACT

Purpose: To describe the design, conduct, and results of the Breast Multiparametric MRI for prediction of neoadjuvant chemotherapy Response (BMMR2) challenge. Materials and Methods: The BMMR2 computational challenge opened on May 28, 2021, and closed on December 21, 2021. The goal of the challenge was to identify image-based markers derived from multiparametric breast MRI, including diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) MRI, along with clinical data for predicting pathologic complete response (pCR) following neoadjuvant treatment. Data included 573 breast MRI studies from 191 women (mean age [±SD], 48.9 years ± 10.56) in the I-SPY 2/American College of Radiology Imaging Network (ACRIN) 6698 trial (ClinicalTrials.gov: NCT01042379). The challenge cohort was split into training (60%) and test (40%) sets, with teams blinded to test set pCR outcomes. Prediction performance was evaluated by area under the receiver operating characteristic curve (AUC) and compared with the benchmark established from the ACRIN 6698 primary analysis. Results: Eight teams submitted final predictions. Entries from three teams had point estimators of AUC that were higher than the benchmark performance (AUC, 0.782 [95% CI: 0.670, 0.893], with AUCs of 0.803 [95% CI: 0.702, 0.904], 0.838 [95% CI: 0.748, 0.928], and 0.840 [95% CI: 0.748, 0.932]). A variety of approaches were used, ranging from extraction of individual features to deep learning and artificial intelligence methods, incorporating DCE and DWI alone or in combination. Conclusion: The BMMR2 challenge identified several models with high predictive performance, which may further expand the value of multiparametric breast MRI as an early marker of treatment response. Clinical trial registration no. NCT01042379. Keywords: MRI, Breast, Tumor Response. Supplemental material is available for this article. © RSNA, 2024.
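
The AUC used above to rank BMMR2 submissions equals the probability that a randomly chosen pCR case receives a higher score than a randomly chosen non-pCR case (the Mann-Whitney formulation). Below is a minimal pure-Python sketch with hypothetical scores and a hypothetical function name. It is quadratic in the number of cases, which is adequate for cohorts of a few hundred but not tuned for large datasets.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score_pos > score_neg), counting ties as one half (Mann-Whitney U / (n1 * n0))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted pCR probabilities (1 = pCR, 0 = residual disease)
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```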


Subjects
Breast Neoplasms , Multiparametric Magnetic Resonance Imaging , Female , Humans , Middle Aged , Artificial Intelligence , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/drug therapy , Magnetic Resonance Imaging , Neoadjuvant Therapy , Pathologic Complete Response , Adult
8.
J Magn Reson Imaging ; 59(4): 1409-1422, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37504495

ABSTRACT

BACKGROUND: Weakly supervised learning promises reduced annotation effort while maintaining performance. PURPOSE: To compare weakly supervised training with training on full slice-wise annotations of a deep convolutional classification network (CNN) for prostate cancer (PC). STUDY TYPE: Retrospective. SUBJECTS: One thousand four hundred eighty-nine consecutive institutional prostate MRI examinations from men with suspected PC (age 65 ± 8 years), acquired between January 2015 and November 2020, were split into a training set (N = 794, enriched with 204 PROSTATEx examinations) and a test set (N = 695). FIELD STRENGTH/SEQUENCE: 1.5 and 3T, T2-weighted turbo-spin-echo and diffusion-weighted echo-planar imaging. ASSESSMENT: Histopathological ground truth was provided by targeted and extended systematic biopsy. Reference training was performed using slice-level annotation (SLA) and compared with iterative training utilizing patient-level annotations (PLAs), with supervised feedback of CNN estimates into the next training iteration, at three incremental training set sizes (N = 200, 500, 998). Model performance was assessed by comparing specificity at a fixed sensitivity of 0.97 [254/262], emulating PI-RADS ≥ 3 decisions, and of 0.88-0.90 [231-236/262], emulating PI-RADS ≥ 4 decisions. STATISTICAL TESTS: Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were compared using the DeLong and Obuchowski tests. Sensitivity and specificity were compared using the McNemar test. The statistical significance threshold was P = 0.05. RESULTS: Test set (N = 695) ROC-AUC performance of SLA (trained with 200/500/998 exams) was 0.75/0.80/0.83, respectively. PLA achieved lower ROC-AUC of 0.64/0.72/0.78. Both increased performance significantly with increasing training set size. ROC-AUC for SLA at 500 exams was comparable to PLA at 998 exams (P = 0.28). ROC-AUC was significantly different between SLA and PLA at the same training set sizes; however, the ROC-AUC difference decreased significantly from 200 to 998 training exams.
Emulating PI-RADS ≥ 3 decisions, difference between PLA specificity of 0.12 [51/433] and SLA specificity of 0.13 [55/433] became undetectable (P = 1.0) at 998 exams. Emulating PI-RADS ≥ 4 decisions, at 998 exams, SLA specificity of 0.51 [221/433] remained higher than PLA specificity at 0.39 [170/433]. However, PLA specificity at 998 exams became comparable to SLA specificity of 0.37 [159/433] at 200 exams (P = 0.70). DATA CONCLUSION: Weakly supervised training of a classification CNN using patient-level-only annotation had lower performance compared to training with slice-wise annotations, but improved significantly faster with additional training data. EVIDENCE LEVEL: 3 TECHNICAL EFFICACY: Stage 2.
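
Specificity at a fixed sensitivity (for example 0.97 to emulate PI-RADS ≥ 3) is obtained by picking the decision threshold on the model score. The threshold is chosen so that at least the target fraction of cancers is called positive, and specificity is then measured on the negatives. Below is a minimal sketch under stated assumptions: the ceiling rounding rule and the `>=` tie handling are choices made here, so the achieved sensitivity can sit slightly above the target. The study itself reports 254/262 for its 0.97 operating point.

```python
import math


def specificity_at_sensitivity(scores: list[float], labels: list[int],
                               target_sensitivity: float) -> tuple[float, float, float]:
    """Highest threshold whose sensitivity reaches the target, and the resulting specificity.

    labels: 1 = clinically significant cancer, 0 = no significant cancer.
    A case is called positive when its score >= threshold.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    k = min(max(math.ceil(target_sensitivity * len(pos)), 1), len(pos))
    threshold = pos[k - 1]  # k-th highest positive score keeps >= k positives at or above it
    sensitivity = sum(s >= threshold for s in pos) / len(pos)
    specificity = sum(s < threshold for s in neg) / len(neg)
    return threshold, sensitivity, specificity


# Hypothetical CNN scores for 4 cancers followed by 4 non-cancers
scores = [0.9, 0.8, 0.7, 0.6, 0.65, 0.5, 0.3, 0.85]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(specificity_at_sensitivity(scores, labels, 0.75))
```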


Subjects
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies , Polyesters
9.
Schizophr Res ; 263: 160-168, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37236889

ABSTRACT

The number of magnetic resonance imaging (MRI) studies on neuronal correlates of catatonia has increased dramatically in the last 10 years, but conclusive findings on white matter (WM) tract alterations underlying catatonic symptoms are still lacking. Therefore, we are conducting an interdisciplinary longitudinal MRI study (whiteCAT) with two main objectives. First, we aim to enroll 100 psychiatric patients with and 50 psychiatric patients without catatonia according to ICD-11. These patients will undergo a deep phenotyping approach with an extensive battery of demographic, psychopathological, psychometric, neuropsychological, instrumental, and diffusion MRI assessments at baseline and at 12-week follow-up. So far, 28 catatonia patients and 40 patients with schizophrenia or other primary psychotic disorders or mood disorders without catatonia have been studied cross-sectionally, and 49 of these 68 patients have completed the longitudinal assessment. Second, we seek to develop and implement a new method for semi-automatic fiber tract delineation using active learning. We plan to train supportive machine learning algorithms on the fly, custom-tailored both to the analysis pipeline used to obtain the tractogram and to the WM tract of interest. This should streamline and speed up this tedious and error-prone task while increasing the reproducibility and robustness of the extraction process. The goal is to develop robust neuroimaging biomarkers of symptom severity and therapy outcome based on WM tracts underlying catatonia. If our MRI study is successful, it will be the largest longitudinal study to date to investigate WM tracts in catatonia patients.


Subjects
Catatonia , White Matter , Humans , Catatonia/diagnosis , White Matter/diagnostic imaging , White Matter/pathology , Longitudinal Studies , Reproducibility of Results , Biomarkers
10.
Sci Rep ; 13(1): 19805, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37957250

ABSTRACT

Prostate cancer (PCa) diagnosis on multi-parametric magnetic resonance images (MRI) requires radiologists with a high level of expertise. Misalignments between the MRI sequences can be caused by patient movement, elastic soft-tissue deformations, and imaging artifacts, and they further increase the complexity of image interpretation for radiologists. Recently, computer-aided diagnosis (CAD) tools have demonstrated potential for PCa diagnosis, typically relying on complex co-registration of the input modalities. However, there is no consensus among research groups on whether CAD systems benefit from registration. Furthermore, alternative strategies to handle multi-modal misalignments have not been explored so far. Our study introduces and compares different strategies to cope with image misalignments and evaluates their direct effect on the diagnostic accuracy for PCa. In addition to established registration algorithms, we propose 'misalignment augmentation' as a concept to increase CAD robustness. The results demonstrate that misalignment augmentation can not only compensate for a complete lack of registration but, if used in conjunction with registration, also improve overall performance on an independent test set.
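
The abstract does not specify how misalignment augmentation is implemented. As a hypothetical illustration only, the sketch below applies a random integer translation with zero fill to one modality (called `dwi` here) while leaving the T2-weighted image untouched. A network trained this way sees slightly misregistered input pairs. Real implementations might use sub-voxel, rotational, or elastic deformations instead, and all names here are assumptions.

```python
import numpy as np


def shift_zero_fill(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate a 2D image by (dy, dx) pixels, filling uncovered pixels with zeros."""
    h, w = img.shape
    if abs(dy) >= h or abs(dx) >= w:
        raise ValueError("shift must be smaller than the image size")
    out = np.zeros_like(img)
    out[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)] = \
        img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    return out


def misalignment_augment(t2: np.ndarray, dwi: np.ndarray, max_shift: int,
                         rng: np.random.Generator):
    """Randomly misalign the DWI channel relative to T2 (hypothetical augmentation)."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return t2, shift_zero_fill(dwi, int(dy), int(dx))


rng = np.random.default_rng(42)
t2 = rng.random((128, 128))
dwi = rng.random((128, 128))
t2_aug, dwi_aug = misalignment_augment(t2, dwi, max_shift=4, rng=rng)
print(dwi_aug.shape)
```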


Subjects
Prostate , Prostatic Neoplasms , Male , Humans , Prostate/diagnostic imaging , Prostate/pathology , Magnetic Resonance Imaging/methods , Diagnosis, Computer-Assisted/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Computers
11.
Eur Radiol ; 33(11): 7463-7476, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37507610

ABSTRACT

OBJECTIVES: To evaluate a fully automatic deep learning system to detect and segment clinically significant prostate cancer (csPCa) on same-vendor prostate MRI from two different institutions not contributing to training of the system. MATERIALS AND METHODS: In this retrospective study, a previously bi-institutionally validated deep learning system (UNETM) was applied to bi-parametric prostate MRI data from one external institution (A), a PI-RADS distribution-matched internal cohort (B), and a csPCa stratified subset of single-institution external public challenge data (C). csPCa was defined as ISUP Grade Group ≥ 2 determined from combined targeted and extended systematic MRI/transrectal US-fusion biopsy. Performance of UNETM was evaluated by comparing ROC AUC and specificity at typical PI-RADS sensitivity levels. Lesion-level analysis between UNETM segmentations and radiologist-delineated segmentations was performed using the Dice coefficient, free-response operating characteristic (FROC), and weighted alternative FROC (waFROC). The influence of using different diffusion sequences was analyzed in cohort A. RESULTS: In 250/250/140 exams in cohorts A/B/C, differences in ROC AUC were not significant, with values of 0.80 (95% CI: 0.74-0.85)/0.87 (95% CI: 0.83-0.92)/0.82 (95% CI: 0.75-0.89). At sensitivities of 95% and 90%, UNETM achieved specificity of 30%/50% in A, 44%/71% in B, and 43%/49% in C, respectively. The Dice coefficient of UNETM and radiologist-delineated lesions was 0.36 in A and 0.49 in B. The waFROC AUC was 0.67 (95% CI: 0.60-0.83) in A and 0.7 (95% CI: 0.64-0.78) in B. UNETM performed marginally better on readout-segmented than on single-shot echo-planar imaging. CONCLUSION: For same-vendor examinations, deep learning provided comparable discrimination of csPCa and non-csPCa lesions and examinations between local and two independent external data sets, demonstrating the applicability of the system to institutions not participating in model training.
CLINICAL RELEVANCE STATEMENT: A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets. This indicates the potential of deploying AI models without retraining or fine-tuning and corroborates evidence that AI models extract a substantial amount of transferable domain knowledge about MRI-based prostate cancer assessment. KEY POINTS: • A previously bi-institutionally validated fully automatic deep learning system maintained acceptable exam-level diagnostic performance in two independent external data sets. • Lesion detection performance and segmentation congruence were similar on the institutional and an external data set, as measured by the weighted alternative FROC AUC and Dice coefficient. • Although the system generalized to two external institutions without re-training, achieving expected sensitivity and specificity levels using the deep learning system requires probability thresholds to be adjusted, underlining the importance of institution-specific calibration and quality control.
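
The Dice coefficient used above for lesion-level overlap is 2|A ∩ B| / (|A| + |B|) for binary masks A and B. Below is a minimal NumPy sketch with a hypothetical function name. Returning 1.0 when both masks are empty is a convention chosen here, not one stated in the study.

```python
import numpy as np


def dice(mask_a, mask_b) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * int(np.logical_and(a, b).sum()) / total


# Hypothetical 1D example: one of two voxels overlaps in each mask
print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```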


Subjects
Deep Learning , Prostatic Neoplasms , Male , Humans , Magnetic Resonance Imaging , Prostate/diagnostic imaging , Prostate/pathology , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies
12.
Bone ; 175: 116857, 2023 10.
Article in English | MEDLINE | ID: mdl-37487861

ABSTRACT

PURPOSE: The presence of bone marrow focal lesions and osteolytic lesions in patients with multiple myeloma (MM) is of high prognostic significance for their individual outcome. It is not yet known why some focal lesions seen on MRI, reflecting localized bone marrow infiltration of myeloma cells, remain non-lytic, whereas others are associated with destruction of mineralized bone. In this study, we analyzed MRI characteristics of manually segmented focal lesions in MM patients to identify possible features that might discriminate lytic from non-lytic lesions. METHOD: The initial cohort included a total of 140 patients with different stages of MM who had undergone both whole-body MRI and whole-body low-dose CT within 30 days, of whom 29 met the inclusion criteria for this study. Focal lesions on MRI and corresponding osteolytic areas on CT were segmented manually. Lesion analysis included volume, location, and first-order texture features. RESULTS: There were significantly more lytic lesions in the axial skeleton than in the appendicular skeleton (p = 0.037). Out of 926 focal lesions in the axial skeleton seen on MRI, 544 (59.3 %) were osteolytic. Analysis of volume and first-order texture features showed differences in texture and volume between focal lesions on MRI with and without local bone destruction on CT, but these differences were not statistically significant. CONCLUSIONS: Neither morphological imaging characteristics such as size and location nor first-order texture features could predict whether focal lesions seen on MRI would exhibit corresponding bone destruction on CT. Studies performing biopsies of such lesions are ongoing.
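
First-order texture features, as analysed above, are statistics of the voxel-intensity distribution inside a segmented lesion, ignoring spatial arrangement. The minimal NumPy sketch below is a simplified illustration, not the study's feature set. The fixed histogram bin count, the selection of features, and the synthetic image and mask are assumptions. Dedicated radiomics toolkits define these features with more specific discretization rules.

```python
import numpy as np


def first_order_features(values, bins: int = 32) -> dict:
    """Simplified first-order statistics of the intensities inside one lesion mask."""
    v = np.asarray(values, dtype=float).ravel()
    counts, _ = np.histogram(v, bins=bins)          # fixed bin count (assumption)
    p = counts[counts > 0] / counts.sum()           # drop empty bins before taking logs
    return {
        "mean": float(v.mean()),
        "variance": float(v.var()),                 # population variance
        "energy": float(np.sum(v ** 2)),
        "entropy_bits": float(-np.sum(p * np.log2(p))),
        "p10": float(np.percentile(v, 10)),
        "p90": float(np.percentile(v, 90)),
    }


# Hypothetical usage: intensities of voxels inside one segmented focal lesion
image = np.random.default_rng(1).normal(300.0, 40.0, size=(16, 16, 8))
lesion_mask = np.zeros(image.shape, dtype=bool)
lesion_mask[4:10, 4:10, 2:6] = True
print(first_order_features(image[lesion_mask]))
```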


Subjects
Multiple Myeloma , Humans , Multiple Myeloma/diagnostic imaging , Multiple Myeloma/pathology , Bone Marrow/diagnostic imaging , Bone Marrow/pathology , Tomography, X-Ray Computed , Magnetic Resonance Imaging , Prognosis
13.
Invest Radiol ; 58(10): 754-765, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37222527

ABSTRACT

OBJECTIVES: In multiple myeloma and its precursor stages, plasma cell infiltration (PCI) and cytogenetic aberrations are important for staging, risk stratification, and response assessment. However, invasive bone marrow (BM) biopsies cannot be performed frequently and multifocally to assess the spatially heterogeneous tumor tissue. Therefore, the goal of this study was to establish an automated framework to predict local BM biopsy results from magnetic resonance imaging (MRI). MATERIALS AND METHODS: This retrospective multicentric study used data from center 1 for algorithm training and internal testing, and data from centers 2 to 8 for external testing. An nnU-Net was trained for automated segmentation of pelvic BM from T1-weighted whole-body MRI. Radiomics features were extracted from these segmentations, and random forest models were trained to predict PCI and the presence or absence of cytogenetic aberrations. The Pearson correlation coefficient and the area under the receiver operating characteristic curve were used to evaluate the prediction performance for PCI and cytogenetic aberrations, respectively. RESULTS: A total of 672 MRIs from 512 patients (median age, 61 years; interquartile range, 53-67 years; 307 men) from 8 centers and 370 corresponding BM biopsies were included. The predicted PCI from the best model was significantly correlated (P ≤ 0.01) with the actual PCI from biopsy in all internal and external test sets (internal test set: r = 0.71 [0.51, 0.83]; center 2, high-quality test set: r = 0.45 [0.12, 0.69]; center 2, other test set: r = 0.30 [0.07, 0.49]; multicenter test set: r = 0.57 [0.30, 0.76]). The areas under the receiver operating characteristic curve of the prediction models for the different cytogenetic aberrations ranged from 0.57 to 0.76 for the internal test set, but no model generalized well to all 3 external test sets.
CONCLUSIONS: The automated image analysis framework established in this study allows for noninvasive prediction of a surrogate parameter for PCI, which is significantly correlated with the actual PCI from BM biopsy.


Subjects
Deep Learning , Multiple Myeloma , Male , Humans , Middle Aged , Multiple Myeloma/diagnostic imaging , Multiple Myeloma/genetics , Bone Marrow/diagnostic imaging , Retrospective Studies , Magnetic Resonance Imaging/methods , Biopsy , Chromosome Aberrations
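The abstract above evaluates PCI prediction with the Pearson correlation coefficient and reports each r with a bracketed 95% confidence interval. A minimal Python sketch of that statistic follows. The abstract does not say how its intervals were obtained, so the Fisher z-transformation used here is an assumption, and the function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via Fisher's z-transformation.

    Assumption: the study's CI method is not stated. Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical predicted vs. biopsy-derived PCI values (percent), for illustration only.
predicted = [10, 25, 40, 55, 70, 35]
biopsy = [15, 20, 45, 60, 65, 30]
r = pearson_r(predicted, biopsy)
low, high = fisher_ci(r, len(predicted))
print(f"r = {r:.2f} [{low:.2f}, {high:.2f}]")
```

The interval is built on the z scale, where the sampling distribution of r is close to normal, and then mapped back with tanh, which keeps both bounds inside (-1, 1).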
14.
J Clin Med ; 12(7)2023 Apr 02.
Article in English | MEDLINE | ID: mdl-37048730

ABSTRACT

BACKGROUND: This ex vivo experimental study sought to compare screw planning accuracy of a self-derived deep-learning-based (DL) and a commercial atlas-based (ATL) tool and to assess robustness towards pathologic spinal anatomy. METHODS: From a consecutive registry, 50 cases (256 screws in L1-L5) were randomly selected for experimental planning. Reference screws were manually planned by two independent raters. Additional planning sets were created using the automatic DL and ATL tools. Using Python, automatic planning was compared to the reference in 3D space by calculating minimal absolute distances (MAD) for screw head and tip points (mm) and angular deviation (degree). Results were evaluated for interrater variability of reference screws. Robustness was evaluated in subgroups stratified for alteration of spinal anatomy. RESULTS: Planning was successful in all 256 screws using DL and in 208/256 (81%) using ATL. MAD to the reference for head and tip points and angular deviation was 3.93 ± 2.08 mm, 3.49 ± 1.80 mm and 4.46 ± 2.86° for DL and 7.77 ± 3.65 mm, 7.81 ± 4.75 mm and 6.70 ± 3.53° for ATL, respectively. Corresponding interrater variance for reference screws was 4.89 ± 2.04 mm, 4.36 ± 2.25 mm and 5.27 ± 3.20°, respectively. Planning accuracy was comparable to the manual reference for DL, while ATL produced significantly inferior results (p < 0.0001). DL was robust to altered spinal anatomy while planning failure was pronounced for ATL in 28/82 screws (34%) in the subgroup with severely altered spinal anatomy and alignment (p < 0.0001). CONCLUSIONS: Deep learning appears to be a promising approach to reliable automated screw planning, coping well with anatomic variations of the spine that severely limit the accuracy of ATL systems.
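The comparison in the abstract above reduces each screw to a head point and a tip point in 3D space and reports point distances (mm) and angular deviation (degrees). A Python sketch of such per-screw metrics follows. Representing a screw as a hypothetical (head, tip) pair of coordinates and measuring the Euclidean distance between corresponding points are assumptions, since the abstract does not detail its "minimal absolute distance" computation.

```python
import math
from typing import NamedTuple, Tuple

Point = Tuple[float, float, float]


class Screw(NamedTuple):
    """Hypothetical screw representation: head (entry) and tip coordinates in mm."""
    head: Point
    tip: Point


def angular_deviation_deg(a: Screw, b: Screw) -> float:
    """Angle in degrees between the head-to-tip axes of two screws."""
    u = [t - h for h, t in zip(a.head, a.tip)]
    v = [t - h for h, t in zip(b.head, b.tip)]
    dot = sum(p * q for p, q in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


def compare(planned: Screw, reference: Screw) -> Tuple[float, float, float]:
    """Return (head distance in mm, tip distance in mm, angular deviation in degrees)."""
    return (
        math.dist(planned.head, reference.head),
        math.dist(planned.tip, reference.tip),
        angular_deviation_deg(planned, reference),
    )


reference = Screw(head=(0.0, 0.0, 0.0), tip=(0.0, 0.0, 45.0))
planned = Screw(head=(3.0, 4.0, 0.0), tip=(3.0, 4.0, 45.0))
print(compare(planned, reference))  # parallel screws offset by 5 mm
```

Reporting distance and angle separately matters: two screws can share an axis direction yet be translated (as in the example), or share an entry point yet diverge in trajectory.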

15.
Acta Neurochir (Wien) ; 165(4): 1041-1051, 2023 04.
Article in English | MEDLINE | ID: mdl-36862216

ABSTRACT

PURPOSE: Fiber tracking (FT) is used in neurosurgical planning for the resection of lesions in proximity to fiber pathways, as it contributes to a substantial amelioration of postoperative neurological impairments. Currently, diffusion-tensor imaging (DTI)-based FT is the most frequently used technique; however, sophisticated techniques such as Q-ball (QBI) for high-resolution FT (HRFT) have suggested favorable results. Little is known about the reproducibility of both techniques in the clinical setting. Therefore, this study aimed to examine the intra- and interrater agreement for the depiction of white matter pathways such as the corticospinal tract (CST) and the optic radiation (OR). METHODS: Nineteen patients with eloquent lesions in the proximity of the OR or CST were prospectively enrolled. Two different raters independently reconstructed the fiber bundles by applying probabilistic DTI- and QBI-FT. Interrater agreement was evaluated from the comparison between results obtained by the two raters on the same data set acquired in two independent iterations at different timepoints using the Dice Similarity Coefficient (DSC) and the Jaccard Coefficient (JC). Likewise, intrarater agreement was determined for each rater comparing individual results. RESULTS: DSC values showed substantial intrarater agreement based on DTI-FT (rater 1: mean 0.77 (0.68-0.85); rater 2: mean 0.75 (0.64-0.81); p = 0.673), while an excellent agreement was observed after the deployment of QBI-based FT (rater 1: mean 0.86 (0.78-0.98); rater 2: mean 0.80 (0.72-0.91); p = 0.693). In contrast, fair agreement was observed between both measures for the repeatability of the OR of each rater based on DTI-FT (rater 1: mean 0.36 (0.26-0.77); rater 2: mean 0.40 (0.27-0.79), p = 0.546). A substantial agreement between the measures was noted by applying QBI-FT (rater 1: mean 0.67 (0.44-0.78); rater 2: mean 0.62 (0.32-0.70), p = 0.665).
The interrater agreement was moderate for the reproducibility of the CST and OR for both DSC and JC based on DTI-FT (DSC and JC ≥ 0.40), while a substantial interrater agreement was noted for DSC after applying QBI-based FT for the delineation of both fiber tracts (DSC > 0.6). CONCLUSIONS: Our findings suggest that QBI-based FT might be a more robust tool for the visualization of the OR and CST adjacent to intracerebral lesions compared with the common standard DTI-FT. For neurosurgical planning during the daily workflow, QBI appears to be feasible and less operator-dependent.


Subjects
Pyramidal Tracts , White Matter , Humans , Pyramidal Tracts/diagnostic imaging , Pyramidal Tracts/pathology , Reproducibility of Results , Diffusion Tensor Imaging/methods , White Matter/pathology
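The tractography study above scores overlap between two reconstructions of the same fiber bundle with the Dice Similarity Coefficient, DSC = 2|A ∩ B| / (|A| + |B|), and the Jaccard Coefficient, JC = |A ∩ B| / |A ∪ B|. The Python sketch below computes both from voxel sets. Encoding a reconstructed tract as a set of (x, y, z) voxel indices is an assumption made for illustration, and the example voxels are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice Similarity Coefficient of two voxel sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Jaccard Coefficient of two voxel sets (1.0 if both are empty)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Two hypothetical reconstructions of the same tract by different raters.
rater1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
rater2 = {(0, 0, 0), (0, 0, 1), (1, 1, 1)}
d, j = dice(rater1, rater2), jaccard(rater1, rater2)
print(f"DSC = {d:.3f}, JC = {j:.3f}")
```

The two measures are monotonically related (JC = DSC / (2 − DSC)), so they always rank agreement the same way, but JC is never larger than DSC; this is why the same data can cross an agreement threshold for DSC and not for JC.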
17.
Invest Radiol ; 58(4): 273-282, 2023 04 01.
Article in English | MEDLINE | ID: mdl-36256790

ABSTRACT

OBJECTIVES: Diffusion-weighted magnetic resonance imaging (MRI) is increasingly important in patients with multiple myeloma (MM). The objective of this study was to train and test an algorithm for automatic pelvic bone marrow analysis from whole-body apparent diffusion coefficient (ADC) maps in patients with MM, which automatically segments pelvic bones and subsequently extracts objective, representative ADC measurements from each bone. MATERIALS AND METHODS: In this retrospective multicentric study, 180 MRIs from 54 patients were annotated (semi)manually and used to train an nnU-Net for automatic, individual segmentation of the right hip bone, the left hip bone, and the sacral bone. The quality of the automatic segmentation was evaluated on 15 manually segmented whole-body MRIs from 3 centers using the Dice score. In 3 independent test sets from 3 centers, which comprised a total of 312 whole-body MRIs, agreement between automatically extracted mean ADC values from the nnU-Net segmentation and manual ADC measurements from 2 independent radiologists was evaluated. Bland-Altman plots were constructed, and absolute bias, relative bias to mean, limits of agreement, and coefficients of variation were calculated. In 56 patients with newly diagnosed MM who had undergone bone marrow biopsy, ADC measurements were correlated with biopsy results using Spearman correlation. RESULTS: The ADC-nnU-Net achieved automatic segmentations with mean Dice scores of 0.92, 0.93, and 0.85 for the right pelvis, the left pelvis, and the sacral bone, whereas the interrater experiment gave mean Dice scores of 0.86, 0.86, and 0.77, respectively.
The agreement between radiologists' manual ADC measurements and automatic ADC measurements was as follows: the bias between the first reader and the automatic approach was 49 × 10⁻⁶ mm²/s, 7 × 10⁻⁶ mm²/s, and -58 × 10⁻⁶ mm²/s, and the bias between the second reader and the automatic approach was 12 × 10⁻⁶ mm²/s, 2 × 10⁻⁶ mm²/s, and -66 × 10⁻⁶ mm²/s for the right pelvis, the left pelvis, and the sacral bone, respectively. The bias between reader 1 and reader 2 was 40 × 10⁻⁶ mm²/s, 8 × 10⁻⁶ mm²/s, and 7 × 10⁻⁶ mm²/s, and the mean absolute difference between manual readers was 84 × 10⁻⁶ mm²/s, 65 × 10⁻⁶ mm²/s, and 75 × 10⁻⁶ mm²/s. Automatically extracted ADC values significantly correlated with bone marrow plasma cell infiltration (R = 0.36, P = 0.007). CONCLUSIONS: In this study, an nnU-Net was trained that can automatically segment pelvic bone marrow from whole-body ADC maps in multicentric data sets with a quality comparable to manual segmentations. This approach allows automatic, objective bone marrow ADC measurements, which agree well with manual ADC measurements and can help to overcome interrater variability or nonrepresentative measurements. Automatically extracted ADC values significantly correlate with bone marrow plasma cell infiltration and might be of value for automatic staging, risk stratification, or therapy response assessment.


Subjects
Deep Learning , Multiple Myeloma , Humans , Magnetic Resonance Imaging/methods , Multiple Myeloma/diagnostic imaging , Multiple Myeloma/pathology , Bone Marrow/diagnostic imaging , Retrospective Studies , Whole Body Imaging/methods , Diffusion Magnetic Resonance Imaging/methods
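The agreement analysis above follows the Bland-Altman approach: the bias is the mean of the paired differences, and the 95% limits of agreement are conventionally bias ± 1.96 × SD of those differences. A minimal Python sketch using only the standard library follows. The example ADC values are hypothetical, and defining relative bias as the bias divided by the grand mean of both methods is an assumption, since the abstract does not give its formula.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Bland-Altman summary for paired measurements a[i] vs. b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    grand_mean = mean(list(a) + list(b))
    return {
        "bias": bias,
        "relative_bias": bias / grand_mean,  # assumption: bias relative to the grand mean
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "mean_abs_diff": mean(abs(d) for d in diffs),
    }


# Hypothetical mean ADC values in units of 10^-6 mm^2/s (manual reader vs. automatic).
manual = [820, 905, 760, 1010, 880]
automatic = [800, 890, 790, 980, 870]
print(bland_altman(manual, automatic))
```

Bias and mean absolute difference answer different questions: opposite-signed errors cancel in the bias but not in the mean absolute difference, which is why the abstract reports both.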
18.
Neuro Oncol ; 25(3): 533-543, 2023 03 14.
Article in English | MEDLINE | ID: mdl-35917833

ABSTRACT

BACKGROUND: To assess whether artificial intelligence (AI)-based decision support allows more reproducible and standardized assessment of treatment response on MRI in neuro-oncology as compared to manual 2-dimensional measurements of tumor burden using the Response Assessment in Neuro-Oncology (RANO) criteria. METHODS: A series of 30 patients (15 lower-grade gliomas, 15 glioblastoma) with availability of consecutive MRI scans was selected. The time to progression (TTP) on MRI was separately evaluated for each patient by 15 investigators over two rounds. In the first round the TTP was evaluated based on the RANO criteria, whereas in the second round the TTP was evaluated by incorporating additional information from AI-enhanced MRI sequences depicting the longitudinal changes in tumor volumes. The agreement of the TTP measurements between investigators was evaluated using concordance correlation coefficients (CCC) with confidence intervals (CI) and P-values obtained using bootstrap resampling. RESULTS: The CCC of TTP measurements between investigators was 0.77 (95% CI = 0.69, 0.88) with RANO alone and increased to 0.91 (95% CI = 0.82, 0.95) with AI-based decision support (P = .005). This effect was significantly greater (P = .008) for patients with lower-grade gliomas (CCC = 0.70 [95% CI = 0.56, 0.85] without vs. 0.90 [95% CI = 0.76, 0.95] with AI-based decision support) as compared to glioblastoma (CCC = 0.83 [95% CI = 0.75, 0.92] without vs. 0.86 [95% CI = 0.78, 0.93] with AI-based decision support). Investigators with fewer years of experience judged the AI-based decision support as more helpful (P = .02). CONCLUSIONS: AI-based decision support has the potential to yield more reproducible and standardized assessment of treatment response in neuro-oncology as compared to manual 2-dimensional measurements of tumor burden, particularly in patients with lower-grade gliomas.
A fully-functional version of this AI-based processing pipeline is provided as open-source (https://github.com/NeuroAI-HD/HD-GLIO-XNAT).


Subjects
Brain Neoplasms , Glioblastoma , Glioma , Humans , Glioblastoma/pathology , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/therapy , Brain Neoplasms/pathology , Artificial Intelligence , Reproducibility of Results , Glioma/diagnostic imaging , Glioma/therapy , Glioma/pathology
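The study above quantifies agreement between investigators with concordance correlation coefficients and bootstrap confidence intervals. A Python sketch of Lin's CCC for two raters, ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²), with a percentile bootstrap that resamples patients, follows. The study involved 15 investigators and the abstract does not specify its multi-rater formulation or bootstrap settings, so the two-rater form, the resample count, the seed, and the example values are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        return 1.0  # both raters constant and identical: treat as perfect agreement
    return 2 * sxy / denom


def bootstrap_ci(x, y, n_boot=2000, level=0.95, seed=0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the CCC, resampling paired observations (patients)."""
    rng = random.Random(seed)
    n = len(x)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(lins_ccc([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    alpha = 1 - level
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical time-to-progression values (months) assessed by two investigators.
rater_a = [6, 12, 9, 18, 4, 24, 15, 8]
rater_b = [7, 11, 9, 20, 5, 22, 12, 9]
print(lins_ccc(rater_a, rater_b), bootstrap_ci(rater_a, rater_b))
```

Unlike Pearson's r, the CCC penalizes systematic offsets: two raters whose values differ by a constant shift are perfectly correlated but have a CCC below 1, because the squared mean difference enters the denominator.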
19.
Nat Commun ; 13(1): 7346, 2022 12 05.
Article in English | MEDLINE | ID: mdl-36470898

ABSTRACT

Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging/infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML, by only sharing numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the FL effectiveness at such scale and task-complexity as a paradigm shift for multi-site collaborations, alleviating the need for data-sharing.


Subjects
Big Data , Glioblastoma , Humans , Machine Learning , Rare Diseases , Information Dissemination
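The federated study above exchanges only numerical model updates between sites, never the underlying data. The abstract does not name its aggregation rule, so the sketch below shows federated averaging (FedAvg), a common choice, on a toy linear model in Python. The sites, data, learning rate, and step counts are hypothetical, and nothing here reflects the study's actual segmentation model.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Site = List[Tuple[float, float]]  # hypothetical local dataset of (x, y) pairs


def federated_average(updates: Sequence[Tuple[Weights, int]]) -> Weights:
    """FedAvg aggregation: mean of client weights, weighted by local sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def local_update(weights: Weights, data: Site, lr: float = 0.1, steps: int = 5) -> Weights:
    """Full-batch gradient descent on mean squared error for y ~ w0 + w1 * x.

    Runs entirely on one site's data; only the resulting weights leave the site.
    """
    w0, w1 = weights
    n = len(data)
    for _ in range(steps):
        g0 = sum(2 * (w0 + w1 * x - y) for x, y in data) / n
        g1 = sum(2 * (w0 + w1 * x - y) * x for x, y in data) / n
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return [w0, w1]


def run_rounds(sites: Sequence[Site], rounds: int = 50) -> Weights:
    """Broadcast global weights, train locally at each site, then aggregate."""
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in sites]
        global_w = federated_average(updates)
    return global_w


# Two hypothetical sites whose private data both follow y = 2x + 1.
sites = [
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
]
print(run_rounds(sites))  # approaches [1.0, 2.0]
```

Weighting by sample count makes the aggregate equivalent to averaging per-sample contributions, so larger sites are not diluted by smaller ones; real deployments add secure aggregation and handle non-identical data distributions, which this toy omits.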
20.
Radiol Artif Intell ; 4(5): e220055, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36204531

ABSTRACT

Purpose: To train a deep natural language processing (NLP) model, using data-mined structured oncology reports (SOR), for rapid tumor response category (TRC) classification from free-text oncology reports (FTOR) and to compare its performance with human readers and conventional NLP algorithms. Materials and Methods: In this retrospective study, databases of three independent radiology departments were queried for SOR and FTOR dated from March 2018 to August 2021. An automated data mining and curation pipeline was developed to extract Response Evaluation Criteria in Solid Tumors-related TRCs from SOR for ground truth definition. The deep NLP bidirectional encoder representations from transformers (BERT) model and three feature-rich algorithms were trained on SOR to predict TRCs in FTOR. Models' F1 scores were compared against scores of radiologists, medical students, and radiology technologist students. Lexical and semantic analyses were conducted to investigate human and model performance on FTOR. Results: Oncologic findings and TRCs were accurately mined from 9653 of 12 833 (75.2%) queried SOR, yielding oncology reports from 10 455 patients (mean age, 60 years ± 14 [SD]; 5303 women) who met inclusion criteria. On 802 FTOR in the test set, BERT achieved better TRC classification results (F1, 0.70; 95% CI: 0.68, 0.73) than the best-performing reference linear support vector classifier (F1, 0.63; 95% CI: 0.61, 0.66) and technologist students (F1, 0.65; 95% CI: 0.63, 0.67), had similar performance to medical students (F1, 0.73; 95% CI: 0.72, 0.75), but was inferior to radiologists (F1, 0.79; 95% CI: 0.78, 0.81). Lexical complexity and semantic ambiguities in FTOR influenced human and model performance, revealing maximum F1 score drops of -0.17 and -0.19, respectively.
Conclusion: The developed deep NLP model reached the performance level of medical students but not radiologists in curating oncologic outcomes from radiology FTOR.
Keywords: Neural Networks, Computer Applications-Detection/Diagnosis, Oncology, Research Design, Staging, Tumor Response, Comparative Studies, Decision Analysis, Experimental Investigations, Observer Performance, Outcomes Analysis
Supplemental material is available for this article. © RSNA, 2022.
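The NLP study above compares models and human readers by F1 score over tumor response categories. The abstract does not state how per-category scores were combined, so the Python sketch below assumes macro-averaging, with per-class F1 = 2·TP / (2·TP + FP + FN). The RECIST-style labels (CR, PR, SD, PD) and example predictions are illustrative only.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 for every label seen in either sequence, via 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    labels = sorted(set(y_true) | set(y_pred))
    # Every label here occurs at least once, so the denominator is never zero.
    return {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels}


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical reference and predicted response categories for five reports.
truth = ["PD", "SD", "SD", "PR", "CR"]
pred = ["PD", "SD", "PR", "PR", "SD"]
print(per_class_f1(truth, pred), macro_f1(truth, pred))
```

Macro-averaging gives rare categories the same weight as common ones, so a model that never predicts an infrequent category (CR in the example) is penalized heavily; a micro- or support-weighted average would mask that failure.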
