ABSTRACT
OBJECTIVE: To evaluate the effectiveness of a self-adapting deep network, trained on large-scale bi-parametric MRI data, in detecting clinically significant prostate cancer (csPCa) in external multi-center data from men of diverse demographics; to investigate the advantages of transfer learning. METHODS: We used two samples: (i) Publicly available multi-center and multi-vendor Prostate Imaging: Cancer AI (PI-CAI) training data, consisting of 1500 bi-parametric MRI scans, along with its unseen validation and testing samples; (ii) In-house multi-center testing and transfer learning data, comprising 1036 and 200 bi-parametric MRI scans. We trained a self-adapting 3D nnU-Net model using probabilistic prostate masks on the PI-CAI data and evaluated its performance on the hidden validation and testing samples and the in-house data with and without transfer learning. We used the area under the receiver operating characteristic (AUROC) curve to evaluate patient-level performance in detecting csPCa. RESULTS: The PI-CAI training data had 425 scans with csPCa, while the in-house testing and fine-tuning data had 288 and 50 scans with csPCa, respectively. The nnU-Net model achieved an AUROC of 0.888 and 0.889 on the hidden validation and testing data. The model performed with an AUROC of 0.886 on the in-house testing data, with a slight decrease in performance to 0.870 using transfer learning. CONCLUSIONS: The state-of-the-art deep learning method using prostate masks trained on large-scale bi-parametric MRI data provides high performance in detecting csPCa in internal and external testing data with different characteristics, demonstrating the robustness and generalizability of deep learning within and across datasets. 
CLINICAL RELEVANCE STATEMENT: A self-adapting deep network, utilizing prostate masks and trained on large-scale bi-parametric MRI data, is effective in accurately detecting clinically significant prostate cancer across diverse datasets, highlighting the potential of deep learning methods for improving prostate cancer detection in clinical practice.
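The patient-level AUROC used above can be illustrated with a minimal sketch. It assumes one csPCa likelihood score per patient and uses the pairwise (Mann-Whitney) definition of the area under the ROC curve. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = csPCa present, 0 = absent.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(f"AUROC = {auroc(labels, scores):.3f}")
```

The pairwise form is quadratic in the number of patients, which is acceptable for a sketch; a rank-based implementation would be preferable for large samples.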
ABSTRACT
The use of deep learning (DL) techniques for automated diagnosis of large vessel occlusion (LVO) and collateral scoring on computed tomography angiography (CTA) is gaining attention. In this study, a state-of-the-art self-configuring object detection network called nnDetection was used to detect LVO and assess collateralization on CTA scans using a multi-task 3D object detection approach. The model was trained on single-phase CTA scans of 2425 patients at five centers, and its performance was evaluated on an external test set of 345 patients from another center. Ground-truth labels for the presence of LVO and collateral scores were provided by three radiologists. The nnDetection model achieved a diagnostic accuracy of 98.26% (95% CI 96.25-99.36%) in identifying LVO, correctly classifying 339 out of 345 CTA scans in the external test set. The DL-based collateral scores had a kappa of 0.80, indicating good agreement with the consensus of the radiologists. These results demonstrate that the self-configuring 3D nnDetection model can accurately detect LVO on single-phase CTA scans and provide semi-quantitative collateral scores, offering a comprehensive approach for automated stroke diagnostics in patients with LVO.
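The reported accuracy follows directly from the counts in the abstract (339 of 345 scans). The abstract does not say how the 95% CI was obtained; the sketch below assumes an exact (Clopper-Pearson) binomial interval, which is one common choice, and solves for the bounds by bisection. The helper names `tail_ge`, `bisect_root`, and `exact_ci` are hypothetical.

```python
from math import comb


def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); adequate for n of a few hundred."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bisect_root(f, lo=0.0, hi=1.0, iters=60):
    """Root of f on (lo, hi), assuming f is increasing in its argument."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else bisect_root(lambda p: tail_ge(k, n, p) - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, with P(X <= k) = 1 - P(X >= k + 1).
    upper = 1.0 if k == n else bisect_root(
        lambda p: alpha / 2 - (1 - tail_ge(k + 1, n, p))
    )
    return lower, upper


correct, total = 339, 345  # counts stated in the abstract
low, high = exact_ci(correct, total)
print(f"accuracy = {100 * correct / total:.2f}%")
print(f"95% CI (assumed exact method) = {100 * low:.2f}% to {100 * high:.2f}%")
```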
Subjects
Brain Ischemia , Stroke , Humans , Computed Tomography Angiography/methods , Stroke/diagnostic imaging , Tomography, X-Ray Computed , Middle Cerebral Artery , Retrospective Studies , Cerebral Angiography/methods
ABSTRACT
To investigate the performance of a joint convolutional neural network-recurrent neural network (CNN-RNN) model with an attention mechanism in identifying and classifying intracranial hemorrhage (ICH) on a large multi-center dataset; to test its performance in a prospective independent sample consisting of consecutive real-world patients. All consecutive patients who underwent emergency non-contrast-enhanced head CT in five different centers were retrospectively gathered. Five neuroradiologists created the ground-truth labels. The development dataset was divided into training and validation sets. After the development phase, we integrated the deep learning model into an independent center's PACS environment for over six months to assess its performance in a real clinical setting. Three radiologists created the ground-truth labels of the testing set by majority voting. A total of 55,179 head CT scans of 48,070 patients, 28,253 men (58.77%), with a mean age of 53.84 ± 17.64 years (range 18-89) were enrolled in the study. The validation sample comprised 5211 head CT scans, with 991 being annotated as ICH-positive. The model's binary accuracy, sensitivity, and specificity on the validation set were 99.41%, 99.70%, and 98.91%, respectively. During the prospective implementation, the model yielded an accuracy of 96.02% on 452 head CT scans with an average prediction time of 45 ± 8 s. The joint CNN-RNN model with an attention mechanism yielded excellent diagnostic accuracy in assessing ICH and its subtypes on a large-scale sample. The model was seamlessly integrated into the radiology workflow. Despite slightly lower performance, it provided decisions on the sample of consecutive real-world patients within a minute.
Subjects
Deep Learning , Intracranial Hemorrhage, Traumatic/diagnostic imaging , Tomography, X-Ray Computed , Adolescent , Adult , Aged , Aged, 80 and over , Emergency Service, Hospital , Female , Humans , Male , Middle Aged , Prospective Studies , Retrospective Studies , Young Adult
ABSTRACT
There is little evidence on the applicability of deep learning (DL) in the segmentation of acute ischemic lesions on diffusion-weighted imaging (DWI) between magnetic resonance imaging (MRI) scanners of different manufacturers. We retrospectively included DWI data of patients with acute ischemic lesions from six centers. Datasets A (n = 2986) and B (n = 3951) included data from Siemens and GE MRI scanners, respectively. The datasets were split into training (80%), validation (10%), and internal test (10%) sets, and six neuroradiologists created ground-truth masks. Models A and B were the proposed neural networks trained on datasets A and B, respectively. The models were subsequently fine-tuned across the datasets using their validation data. Another radiologist performed the segmentation on the test sets for comparison. The median Dice scores of models A and B were 0.858 and 0.857 on the internal tests, which were non-inferior to the radiologist's performance, but the models performed worse than the radiologist on the external tests. Fine-tuned models A and B achieved median Dice scores of 0.832 and 0.846, which were non-inferior to the radiologist's performance on the external tests. The present work shows that the inter-vendor operability of deep learning for the segmentation of ischemic lesions on DWI might be enhanced via transfer learning; thereby, its clinical applicability and generalizability could be improved.
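For readers unfamiliar with the Dice score reported above, the following minimal sketch computes it for a pair of binary masks and takes the median over cases. It assumes masks flattened to sequences of 0/1 values and treats two empty masks as perfect agreement; both choices, along with the function name `dice` and the toy masks, are assumptions for illustration and do not describe the study's pipeline.

```python
from statistics import median


def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical toy cases: (model mask, ground-truth mask).
cases = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([0, 1, 1, 0], [0, 1, 1, 0]),
    ([1, 0, 0, 0], [0, 0, 0, 1]),
]
scores = [dice(pred, truth) for pred, truth in cases]
print(f"median Dice = {median(scores):.3f}")
```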
Subjects
Deep Learning/statistics & numerical data , Diffusion Magnetic Resonance Imaging/instrumentation , Image Interpretation, Computer-Assisted/instrumentation , Ischemic Stroke/diagnosis , Radiologists/statistics & numerical data , Aged , Aged, 80 and over , Brain/diagnostic imaging , Datasets as Topic , Female , Humans , Image Interpretation, Computer-Assisted/statistics & numerical data , Male , Middle Aged , Retrospective Studies
ABSTRACT
OBJECTIVES: The cardiac cycle might impair the reproducibility of radiomics features of cardiac magnetic resonance (CMR) cine images, yet this issue has not been addressed in previous research. We aim to evaluate whether radiomics features of CMR cine images vary during the cardiac cycle and investigate the reproducibility of radiomics features of CMR cine images. METHODS: This retrospective study enrolled 59 healthy adults who underwent CMR examination. Two observers segmented the myocardium on a 4D stack of three consecutive mid-ventricular short-axis cine images covering the cardiac cycle. A total of 352 radiomics features were extracted. The coefficient of variation and intraclass correlation coefficient were used to assess the feature variability through the cycle and inter-observer reproducibility, respectively. RESULTS: Approximately 55% of radiomics features showed large variability through the cardiac cycle. The original features showed more variability than the Laplacian of Gaussian-filtered features (73.8% vs. 48%). The features of 4D stack cine images had a higher proportion of reproducible features (92.0%, 87.7%, and 76.1%) compared with the end-diastolic (77.8%, 62.2%, and 41.7%) and the end-systolic images (81.5%, 74.1%, and 58.8%) for intraclass correlation cut-off values of ≥ 0.80, > 0.85, and > 0.90, respectively. CONCLUSIONS: Radiomics features of CMR cine images greatly vary during the cardiac cycle. The radiomics features of 4D stack of cine images are more robust compared with end-diastolic and end-systolic cine images in terms of reproducibility. The impact of the cardiac cycle on the reproducibility of the features should be considered when employing CMR cine images radiomics. KEY POINTS:
• There is limited evidence on the impact of cardiac motion on radiomics features of CMR cine images and the reproducibility of the radiomics features of CMR cine images.
• Radiomics features of non-enhanced CMR cine images greatly vary during the cardiac cycle, and the number of "reproducible" features shows significant variations according to the cardiac phases.
• The impact of cardiac cycle on the reproducibility of the radiomics features should be considered when employing CMR cine images radiomics.
Subjects
Heart Ventricles , Magnetic Resonance Imaging, Cine , Adult , Heart/diagnostic imaging , Heart Ventricles/diagnostic imaging , Humans , Observer Variation , Reproducibility of Results , Retrospective Studies
ABSTRACT
PURPOSE: To assess the performance of texture analysis of conventional magnetic resonance imaging (MRI) and apparent diffusion coefficient (ADC) maps in predicting IDH1 status in high-grade gliomas (HGG). MATERIALS AND METHODS: A total of 142 patients with HGG were included in the study. IDH1 mutation was present in 48 of 142 HGG (33.8%). Patients were randomly divided into the training cohort (n = 96) and the validation cohort (n = 46). Texture features were extracted via regions of interest on axial T2WI FLAIR, post-contrast T1WI, and ADC maps covering the whole volume of the tumors. The training cohort was used to train the random forest classifier, and the diagnostic performance of the trained model was tested on the validation cohort. RESULTS: The random forest models of conventional MRI sequences and ADC images achieved diagnostic accuracies of 82.2% and 80.4%, respectively, in predicting IDH1 status in the validation cohort. The combined model of T2WI FLAIR, post-contrast T1WI, and ADC images exhibited the highest diagnostic accuracy, 86.94%, in the validation cohort. CONCLUSION: Texture analysis of conventional MRI sequences enhanced by machine learning (ML) analysis can accurately predict the IDH1 status of HGG. Adding textural analysis of ADC maps to conventional MRI results in incremental diagnostic performance.
Subjects
Brain Neoplasms/diagnostic imaging , Glioma/diagnostic imaging , Image Interpretation, Computer-Assisted/methods , Isocitrate Dehydrogenase/genetics , Machine Learning , Magnetic Resonance Imaging/methods , Adult , Brain/diagnostic imaging , Brain/pathology , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Female , Glioma/genetics , Glioma/pathology , Humans , Male , Middle Aged , Mutation/genetics , Predictive Value of Tests , Reproducibility of Results , Retrospective Studies
ABSTRACT
AIM: The aim of this study is to evaluate the diagnostic value of machine learning- (ML-) based quantitative texture analysis in the differentiation of benign and malignant thyroid nodules. MATERIALS AND METHODS: A total of 306 quantitative textural features of 235 thyroid nodules (102 malignant, 43.4%; 133 benign, 56.6%) from 198 patients were investigated using the random forest ML classifier. Feature selection and dimension reduction were conducted using reproducibility testing and a wrapper method. The diagnostic accuracy, sensitivity, specificity, and area under the curve (AUC) of the proposed method were assessed against the histopathological or cytopathological findings as the reference standard. RESULTS: Of the 306 initial texture features, 284 (92.2%) showed good reproducibility (intraclass correlation ≥0.80). The random forest classifier correctly identified 87 out of 102 malignant thyroid nodules and 117 out of 133 benign thyroid nodules, corresponding to a diagnostic sensitivity of 85.2%, specificity of 87.9%, and accuracy of 86.8%. The AUC of the model was 0.92. CONCLUSIONS: Quantitative textural analysis of thyroid nodules using ML classification can accurately discriminate benign and malignant thyroid nodules. Our findings should be validated by multicenter prospective studies using completely independent external data.
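The sensitivity, specificity, and accuracy above follow from the nodule counts given in the abstract. The sketch below recomputes them, taking malignancy as the positive class; the function name `diagnostic_metrics` is hypothetical. Recomputed values can differ from the reported ones in the last digit depending on rounding (here about 85.3% and 88.0% for sensitivity and specificity).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts stated in the abstract: 87 of 102 malignant and 117 of 133 benign
# nodules were classified correctly.
metrics = diagnostic_metrics(tp=87, fn=102 - 87, tn=117, fp=133 - 117)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```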