ABSTRACT
PURPOSE: This study aimed to identify an optimized deep learning model for differentiating non-traumatic brachial plexopathy on routine MRI scans. MATERIALS AND METHODS: This retrospective study collected patients identified through electronic medical records (EMR) or pathology reports at Mayo Clinic who underwent brachial plexus (BP) MRI from January 2002 to December 2022. Using sagittal T1, fluid-sensitive, and post-gadolinium images, a radiology panel selected the BP region of interest (ROI) to form three-dimensional volumes for this study. We designed six deep learning schemes for BP abnormality differentiation across the three MRI sequences. Using five well-established deep learning networks as backbones, we trained and validated these models with nested five-fold cross-validation. Furthermore, we defined a 'method score' derived from radar charts as a quantitative indicator to guide selection of the best model. RESULTS: This study selected 196 patients from an initial 267 candidates. A total of 256 BP MRI series were compiled from them, comprising 123 normal and 133 abnormal series. The abnormal series comprised four sub-categories: breast cancer (22.5%), lymphoma (27.1%), inflammatory conditions (33.1%), and others (17.2%). The best-performing model was produced by the feature merging mode with the triple MRI joint strategy (AUC, 92.2%; accuracy, 89.5%), exceeding the multiple channel merging mode (AUC, 89.6%; accuracy, 89.0%), the solo channel volume mode (AUC, 89.2%; accuracy, 86.7%), and the remaining schemes. Evaluated by method score (maximum 2.37), the feature merging mode with a VGG16 backbone yielded the highest score of 1.75 under the triple MRI joint strategy. CONCLUSION: Deployment of deep learning models across sagittal T1, fluid-sensitive, and post-gadolinium MRI sequences demonstrated great potential for brachial plexopathy diagnosis. Our findings indicate that the feature merging mode combined with a multiple MRI joint strategy may offer a more satisfactory deep learning model for BP abnormalities than solo-sequence analysis.
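The nested five-fold cross-validation used to train and validate these models can be sketched in miniature. The threshold "classifier", synthetic data, and hyperparameter choice below are illustrative stand-ins for the study's networks and MRI volumes, not the published pipeline:

```python
import numpy as np

def kfold(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def fit_eval(X_tr, y_tr, X_te, y_te, feature):
    """Toy stand-in for a deep network: threshold one feature at the
    midpoint of the two training-class means, then report test accuracy."""
    thr = 0.5 * (X_tr[y_tr == 0, feature].mean() + X_tr[y_tr == 1, feature].mean())
    pred = (X_te[:, feature] > thr).astype(int)
    return float((pred == y_te).mean())

def cv_score(X, y, idx, k, feature, rng):
    """Mean accuracy of k-fold CV restricted to the index set `idx`."""
    folds = kfold(len(idx), k, rng)
    accs = []
    for i, val in enumerate(folds):
        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
        accs.append(fit_eval(X[idx[tr]], y[idx[tr]], X[idx[val]], y[idx[val]], feature))
    return float(np.mean(accs))

def nested_cv(X, y, k_outer=5, k_inner=5, candidates=(0, 1, 2), seed=0):
    """Outer folds estimate generalization; inner folds choose the
    'hyperparameter' (here: which feature) on outer-training data only."""
    rng = np.random.default_rng(seed)
    outer = kfold(len(y), k_outer, rng)
    scores = []
    for i, test in enumerate(outer):
        tr = np.concatenate([f for j, f in enumerate(outer) if j != i])
        best = max(candidates, key=lambda f: cv_score(X, y, tr, k_inner, f, rng))
        scores.append(fit_eval(X[tr], y[tr], X[test], y[test], best))
    return float(np.mean(scores))

# Synthetic two-class data: only feature 0 carries signal.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 3))
X[:, 0] += 2.0 * y
acc = nested_cv(X, y)
```

The key property nested CV preserves is that the hyperparameter is never chosen using the outer test fold, so the outer accuracy remains an unbiased estimate.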
ABSTRACT
BACKGROUND/OBJECTIVES: The clinical utility of body composition in the development of complications of acute pancreatitis (AP) remains unclear. We aimed to describe the associations between body composition and the recurrence of AP. METHODS: We performed a retrospective study of patients hospitalized with AP at three tertiary care centers. Patients with computed tomography (CT) imaging of the abdomen at admission were included. A previously validated and fully automated abdominal segmentation algorithm was used for body composition analysis. Hospitalization for a recurrent episode of AP was the primary endpoint. Secondary endpoints included the development of chronic pancreatitis (CP) or diabetes mellitus (DM) in patients who were evaluated. Cox proportional hazards regression was used. RESULTS: Of a total of 347 patients, 89 (25.6%) were hospitalized for recurrent AP (median time: 219 days). Thirty-four of 112 patients (30.4%) developed CP (median time: 311 days) and 22 of 88 (25.0%) developed DM (median time: 1104 days). After adjusting for age, male sex, first episode of AP, BUN, and severity of AP, we found that obesity, body mass index, alcohol pancreatitis, and gallstone pancreatitis were significantly associated with a recurrent episode of AP. Body composition was not associated with recurrent AP. In unadjusted analysis, subcutaneous adipose tissue (SAT) (HR 0.87 per 10 cm², p = 0.002) was associated with CP. Skeletal muscle (SM) mass approached significance for CP (p = 0.0546). Intermuscular adipose tissue (IMAT) (HR 1.45 per 5 cm², p = 0.0264) was associated with DM. CONCLUSION: Body composition was not associated with recurrent AP. At follow-up, 30% and 25% of evaluated patients developed CP and DM, respectively. A higher SAT was associated with a lower incidence of CP, and a higher IMAT with a higher incidence of DM.
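Cox proportional hazards regression, the model used for the time-to-event endpoints above, can be sketched for a single covariate as a Newton-Raphson fit of the Breslow partial log-likelihood. The toy cohort below is invented for illustration and is not the study's adjusted model:

```python
import numpy as np

def cox_fit(time, event, x, iters=25):
    """Fit a single-covariate Cox proportional hazards model by
    Newton-Raphson on the Breslow partial log-likelihood."""
    order = np.argsort(time)
    time, event, x = time[order], event[order], x[order]
    beta = 0.0
    for _ in range(iters):
        grad, hess = 0.0, 0.0
        for i in np.flatnonzero(event):
            risk = x[i:]                       # subjects still at risk at this event
            w = np.exp(beta * risk)
            xbar = (w * risk).sum() / w.sum()  # risk-set weighted covariate mean
            var = (w * risk ** 2).sum() / w.sum() - xbar ** 2
            grad += x[i] - xbar
            hess -= var
        beta -= grad / hess                    # Newton step on concave likelihood
    return beta

# Toy cohort: subjects with x = 1 tend to fail earlier than those with x = 0,
# so the fitted log-hazard-ratio should be positive (HR > 1). Last two are censored.
time  = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
event = np.array([1, 1, 1, 1, 1, 1, 0, 0])
x     = np.array([1., 1., 1., 0., 0., 1., 0., 0.])
beta = cox_fit(time, event, x)
hr = float(np.exp(beta))
```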
ABSTRACT
Background Multiparametric MRI can help identify clinically significant prostate cancer (csPCa) (Gleason score ≥7) but is limited by reader experience and interobserver variability. In contrast, deep learning (DL) produces deterministic outputs. Purpose To develop a DL model to predict the presence of csPCa by using patient-level labels without information about tumor location and to compare its performance with that of radiologists. Materials and Methods Data from patients without known csPCa who underwent MRI from January 2017 to December 2019 at one of multiple sites of a single academic institution were retrospectively reviewed. A convolutional neural network was trained to predict csPCa from T2-weighted images, diffusion-weighted images, apparent diffusion coefficient maps, and T1-weighted contrast-enhanced images. The reference standard was pathologic diagnosis. Radiologist performance was evaluated as follows: Radiology reports were used for the internal test set, and four radiologists' PI-RADS ratings were used for the external (ProstateX) test set. The performance was compared using areas under the receiver operating characteristic curves (AUCs) and the DeLong test. Gradient-weighted class activation maps (Grad-CAMs) were used to show tumor localization. Results Among 5735 examinations in 5215 patients (mean age, 66 years ± 8 [SD]; all male), 1514 examinations (1454 patients) showed csPCa. In the internal test set (400 examinations), the AUC was 0.89 and 0.89 for the DL classifier and radiologists, respectively (P = .88). In the external test set (204 examinations), the AUC was 0.86 and 0.84 for the DL classifier and radiologists, respectively (P = .68). DL classifier plus radiologists had an AUC of 0.89 (P < .001). Grad-CAMs demonstrated activation over the csPCa lesion in 35 of 38 and 56 of 58 true-positive examinations in internal and external test sets, respectively. 
Conclusion The performance of a DL model was not different from that of radiologists in the detection of csPCa at MRI, and Grad-CAMs localized the tumor. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Johnson and Chandarana in this issue.
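The AUC comparisons above rest on the rank-sum identity between the AUC and the Mann-Whitney U statistic, the same statistic the DeLong test builds on. A minimal sketch with invented scores:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney identity:
    AUC = P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical classifier outputs: 3 positives, 3 negatives.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0])
a = auc(scores, labels)  # 8 of 9 positive-negative pairs correctly ordered
```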
Subject(s)
Deep Learning; Magnetic Resonance Imaging; Prostatic Neoplasms; Male; Humans; Prostatic Neoplasms/diagnostic imaging; Retrospective Studies; Aged; Middle Aged; Magnetic Resonance Imaging/methods; Image Interpretation, Computer-Assisted/methods; Multiparametric Magnetic Resonance Imaging/methods; Prostate/diagnostic imaging; Prostate/pathology
ABSTRACT
PURPOSE: To develop a deep learning (DL) zonal segmentation model of the prostate from T2-weighted MR images and to evaluate transition zone PSA density (TZ-PSAD) for prediction of the presence of csPCa (Gleason score of 7 or higher) compared to PSA density (PSAD). METHODS: 1020 patients with a prostate MRI were randomly selected to develop a DL zonal segmentation model. The test dataset included 20 cases in which 2 radiologists manually segmented both the peripheral zone (PZ) and TZ. The pair-wise Dice index was calculated for each zone. For the prediction of csPCa using PSAD and TZ-PSAD, we used 3461 consecutive MRI exams performed in patients without a history of prostate cancer, with pathological confirmation and available PSA values, but not used in the development of the segmentation model, as the internal test set, and 1460 MRI exams from the PI-CAI challenge as the external test set. PSAD and TZ-PSAD were calculated from the segmentation model output. The area under the receiver operating characteristic curve (AUC) was compared between PSAD and TZ-PSAD using univariate and multivariate analysis (adjusted for age) with the DeLong test. RESULTS: Dice scores of the model against the two radiologists were 0.87/0.87 and 0.74/0.72 for TZ and PZ, while those between the two radiologists were 0.88 for TZ and 0.75 for PZ. For the prediction of csPCa, the AUCs of TZ-PSAD were significantly higher than those of PSAD in both the internal test set (univariate analysis, 0.75 vs. 0.73, p < 0.001; multivariate analysis, 0.80 vs. 0.78, p < 0.001) and the external test set (univariate analysis, 0.76 vs. 0.74, p < 0.001; multivariate analysis, 0.77 vs. 0.75, p < 0.001). CONCLUSION: DL model-derived zonal segmentation facilitates the practical measurement of TZ-PSAD, which proved a slightly better predictor of csPCa than the conventional PSAD. Use of TZ-PSAD may increase the sensitivity of detecting csPCa by 2-5% at a commonly used specificity level.
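Once the zonal masks are available, PSAD and TZ-PSAD reduce to serum PSA divided by whole-gland or transition-zone volume. The masks, voxel spacing, and PSA value below are hypothetical, chosen only to show the arithmetic:

```python
import numpy as np

def volume_cc(mask, voxel_mm3):
    """Volume of a binary segmentation mask in cubic centimetres."""
    return mask.sum() * voxel_mm3 / 1000.0

psa = 6.0                                 # serum PSA in ng/mL (hypothetical)

# Stand-in segmentation output: whole prostate and transition zone masks.
whole_gland = np.zeros((40, 40, 20), bool)
whole_gland[5:35, 5:35, 2:18] = True
tz = np.zeros_like(whole_gland)
tz[10:30, 10:30, 5:15] = True

voxel_mm3 = 0.5 * 0.5 * 3.0               # 0.5 mm in-plane, 3 mm slices
psad = psa / volume_cc(whole_gland, voxel_mm3)   # PSA / whole-gland volume
tz_psad = psa / volume_cc(tz, voxel_mm3)         # PSA / transition-zone volume
```

Because the TZ volume is always a subset of the gland, TZ-PSAD is numerically larger, and its cutoffs must be calibrated separately from PSAD's.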
Subject(s)
Deep Learning; Magnetic Resonance Imaging; Prostate-Specific Antigen; Prostatic Neoplasms; Humans; Male; Prostatic Neoplasms/diagnostic imaging; Prostatic Neoplasms/pathology; Magnetic Resonance Imaging/methods; Aged; Middle Aged; Prostate-Specific Antigen/blood; Predictive Value of Tests; Neoplasm Grading; Image Interpretation, Computer-Assisted/methods; Retrospective Studies; Prostate/diagnostic imaging
ABSTRACT
OBJECTIVES: To develop an automated pipeline for extracting prostate cancer-related information from clinical notes. MATERIALS AND METHODS: This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compiling multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). Patient-level classification performance was evaluated on a patient-level test set created by radiologists reviewing the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists. RESULTS: Sentence-level AUCs were ≥ 0.94. The pipeline showed higher patient-level sensitivity than radiologists for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%, p < 0.001), but lower accuracy in classifying pre-MRI prostate pathology (92.5% vs. 95.9%, p = 0.002) and treatment history of prostate cancer (95.5% vs. 97.7%, p = 0.03). CONCLUSION: The proposed pipeline showed promising performance, especially for extracting cancer risk factors from patients' clinical notes.
CLINICAL RELEVANCE STATEMENT: The natural language processing pipeline showed a higher sensitivity for extracting prostate cancer risk factors than radiologists and may help efficiently gather relevant text information when interpreting prostate MRI. KEY POINTS: When interpreting prostate MRI, it is necessary to extract prostate cancer-related information from clinical notes. This pipeline extracted the presence of prostate cancer risk factors with higher sensitivity than radiologists. Natural language processing may help radiologists efficiently gather relevant prostate cancer-related text information.
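The keyword-based sentence extraction step that feeds the BERT models can be sketched as follows; the keyword list and note text are invented examples, not the study's actual lexicon:

```python
import re

# Illustrative keywords only; the study used its own pre-defined list.
KEYWORDS = {"biopsy", "gleason", "family history", "nodule"}

def extract_sentences(note, keywords=KEYWORDS):
    """Split a clinical note into sentences and keep those containing any
    pre-defined keyword (case-insensitive). Each kept sentence would then
    be classified by a fine-tuned sentence-level model."""
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    return [s for s in sentences if any(k in s.lower() for k in keywords)]

note = ("Patient seen for follow-up. Family history of prostate cancer "
        "in father. Prior biopsy in 2020 showed Gleason 3+3. "
        "No new complaints today.")
hits = extract_sentences(note)  # keeps the 2nd and 3rd sentences
```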
ABSTRACT
Automated segmentation tools often encounter accuracy and adaptability issues when applied to images of differing pathologies. The purpose of this study is to explore the feasibility of building a workflow that efficiently routes images to specifically trained segmentation models. By implementing a deep learning classifier to automatically classify images and route them to the appropriate segmentation model, we aim to segment images with differing pathologies accurately. The data used in this study comprised 350 CT images from patients affected by polycystic liver disease and 350 CT images from patients presenting with liver metastases from colorectal cancer. All images had the liver manually segmented by trained imaging analysts. Our proposed adaptive segmentation workflow achieved a statistically significant improvement in total liver segmentation compared to the generic single-segmentation model (non-parametric Wilcoxon signed rank test, n = 100, p-value << 0.001). This approach is applicable in a wide range of scenarios and should prove useful in clinical implementations of segmentation pipelines.
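The routing idea reduces to a classifier dispatching each image to a pathology-specific model. A minimal sketch with stand-in classifier and segmenters (the real components are trained deep networks, and the feature used for routing here is invented):

```python
def classify(image):
    """Stand-in for the deep learning pathology classifier."""
    return "polycystic" if image["cyst_burden"] > 0.5 else "metastasis"

def segment_polycystic(image):
    """Stand-in for the model trained on polycystic liver disease."""
    return {"model": "polycystic", "mask": "..."}

def segment_metastasis(image):
    """Stand-in for the model trained on colorectal liver metastases."""
    return {"model": "metastasis", "mask": "..."}

ROUTES = {"polycystic": segment_polycystic, "metastasis": segment_metastasis}

def adaptive_segment(image):
    """Route the image to the segmentation model trained on its pathology."""
    return ROUTES[classify(image)](image)

result = adaptive_segment({"cyst_burden": 0.9})
```

The dispatch table makes the workflow extensible: adding a third pathology means training one more segmenter and one more classifier label, not retraining a monolithic model.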
ABSTRACT
Automatic identification of brachial plexus (BP) abnormality on magnetic resonance imaging (MRI), to localize and characterize a neurologic injury in clinical practice, is still a novel topic in brachial plexopathy. This study developed and evaluated an approach to differentiate abnormal BP with artificial intelligence (AI) over three commonly used MRI sequences, i.e., T1, fluid-sensitive, and post-gadolinium sequences. A BP dataset was collected by radiological experts, and a semi-supervised AI method (based on nnU-Net) was used to segment the BP. Thereafter, a radiomics method was used to extract 107 shape and texture features from these ROIs. From various machine learning methods, we selected six widely recognized classifiers for training our BP models and assessing their efficacy. To optimize these models, we introduced a dynamic feature selection approach aimed at discarding redundant and less informative features. Our experiments demonstrated that, in identifying abnormal BP cases, shape features displayed higher sensitivity than texture features. Notably, both the logistic classifier and the bagging classifier outperformed the other methods in our study. These evaluations highlighted the strong performance of our model trained on fluid-sensitive sequences, which notably exceeded the results of both the T1 and post-gadolinium sequences. Crucially, both the classification accuracy and the AUC (area under the receiver operating characteristic curve) on the fluid-sensitive sequence exceeded 90%. This outcome serves as a robust experimental validation, affirming the substantial potential and feasibility of integrating AI into clinical practice.
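One plausible form of the dynamic feature selection described above is a greedy correlation filter that discards features redundant with ones already kept. The criterion, threshold, and synthetic features below are assumptions for illustration, not the paper's exact method:

```python
import numpy as np

def drop_redundant(X, names, thresh=0.95):
    """Greedy redundancy filter: walk the features in order and drop any
    whose absolute Pearson correlation with an already-kept feature
    exceeds `thresh`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= thresh for k in kept):
            kept.append(j)
    return [names[k] for k in kept]

rng = np.random.default_rng(0)
n = 100
vol = rng.normal(50, 10, n)
X = np.column_stack([
    vol,                                  # a shape feature ("volume")
    2.1 * vol + rng.normal(0, 0.1, n),    # near-duplicate, should be dropped
    rng.normal(size=n),                   # independent texture-like feature
])
kept = drop_redundant(X, ["volume", "volume_copy", "texture"])
```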
Subject(s)
Artificial Intelligence; Brachial Plexus; Magnetic Resonance Imaging; Humans; Magnetic Resonance Imaging/methods; Brachial Plexus/diagnostic imaging; Brachial Plexus Neuropathies/diagnostic imaging; Machine Learning; Female; Male; Adult
ABSTRACT
PURPOSE: To evaluate the robustness of a radiomics-based support vector machine (SVM) model for detection of visually occult pancreatic ductal adenocarcinoma (PDA) on pre-diagnostic CTs by simulating common variations in image acquisition and radiomics workflow using image perturbation methods. METHODS: Eighteen algorithmically generated perturbations, which simulated variations in image noise levels (σ, 2σ, 3σ, 5σ), image rotation [both the CT image and the corresponding pancreas segmentation mask by 45° and 90° in the axial plane], voxel resampling (isotropic and anisotropic), gray-level discretization [bin width (BW) 32 and 64], and pancreas segmentation (sequential erosions by 3, 4, 6, and 8 pixels and dilations by 3, 4, and 6 pixels from the boundary), were introduced to the original (unperturbed) test subset (n = 128; 45 pre-diagnostic CTs, 83 control CTs with normal pancreas). Radiomic features were extracted from pancreas masks of these additional test subsets, and the model's performance was compared against the unperturbed test subset. RESULTS: The model correctly classified 43 of 45 pre-diagnostic CTs and 75 of 83 control CTs in the unperturbed test subset, achieving 92.2% accuracy and 0.98 AUC. The model's performance was unaffected by a three-fold increase in noise level except for sensitivity declining to 80% at 3σ (p = 0.02). Performance remained comparable to the unperturbed test subset despite variations in image rotation (p = 0.99), voxel resampling (p = 0.25-0.31), change in gray-level BW to 32 (p = 0.31-0.99), and erosions/dilations up to 4 pixels from the pancreas boundary (p = 0.12-0.34). CONCLUSION: The model's high performance for detection of visually occult PDA was robust within a broad range of clinically relevant variations in image acquisition and radiomics workflow.
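Two of the perturbation families above, additive Gaussian noise and mask erosion, can be sketched directly in NumPy. The array sizes and the 6-neighbour erosion rule are simplifications of the actual perturbation pipeline:

```python
import numpy as np

def add_noise(img, sigma, factor=1, seed=0):
    """Perturbation: additive Gaussian noise at 1x/2x/3x/5x the base sigma."""
    rng = np.random.default_rng(seed)
    return img + rng.normal(0.0, factor * sigma, img.shape)

def erode(mask, pixels=1):
    """Crude binary erosion: a voxel survives only if all 6 axial
    neighbours are inside the mask, repeated `pixels` times."""
    m = mask.copy()
    for _ in range(pixels):
        keep = m.copy()
        for axis in range(m.ndim):
            keep &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
        m = keep
    return m

# An 8x8x8 cube of "pancreas" voxels; eroding by 2 shrinks it to 4x4x4.
mask = np.zeros((16, 16, 16), bool)
mask[4:12, 4:12, 4:12] = True
eroded = erode(mask, 2)
noisy = add_noise(np.zeros((8, 8)), sigma=10.0, factor=3)  # 3-sigma noise level
```

Re-extracting features from `eroded` versus `mask` is exactly the kind of segmentation-boundary sensitivity test the study performed.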
Subject(s)
Adenocarcinoma; Pancreatic Neoplasms; Resilience, Psychological; Humans; Adenocarcinoma/diagnostic imaging; Pancreatic Neoplasms/diagnostic imaging; Tomography, X-Ray Computed/methods; Radiomics; Workflow; Image Processing, Computer-Assisted/methods; Machine Learning; Retrospective Studies
ABSTRACT
Introduction: Methods that automatically flag poorly performing predictions are urgently needed to safely implement machine learning workflows into clinical practice, as well as to identify difficult cases during model training. Methods: Disagreement between the fivefold cross-validation sub-models was quantified using Dice scores between folds and summarized as a surrogate for model confidence. The summarized interfold Dice scores were compared with thresholds informed by human interobserver values to determine whether the final ensemble model's output should be manually reviewed. Results: On all tasks, the method efficiently flagged poorly segmented images without consulting a reference standard. Using the median interfold Dice score for comparison, substantial Dice score improvements after excluding flagged images were noted for the in-domain CT (0.85 ± 0.20 to 0.91 ± 0.08, 8/50 images flagged) and MR tasks (0.76 ± 0.27 to 0.85 ± 0.09, 8/50 images flagged). Most impressively, there were dramatic Dice score improvements in the simulated out-of-distribution task, where a model trained on a radical nephrectomy dataset with different contrast phases predicted on an all cortico-medullary phase partial nephrectomy dataset (0.67 ± 0.36 to 0.89 ± 0.10, 122/300 images flagged). Discussion: Comparing interfold sub-model disagreement against human interobserver values is an effective and efficient way to assess automated predictions when a reference standard is not available. This functionality provides a necessary safeguard for patient care, important to safely implementing automated medical image segmentation workflows.
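The interfold-disagreement flag can be sketched as the median pairwise Dice between sub-model masks, compared against a threshold informed by human interobserver agreement. The 0.85 threshold and the toy masks below are illustrative, not the study's values:

```python
import numpy as np
from itertools import combinations

def dice(a, b):
    """Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return 2.0 * inter / denom if denom else 1.0

def interfold_flag(fold_masks, human_interobserver=0.85):
    """Median pairwise Dice across the five sub-model predictions;
    flag the case for manual review when agreement falls below the
    human-interobserver-informed threshold."""
    scores = [dice(a, b) for a, b in combinations(fold_masks, 2)]
    med = float(np.median(scores))
    return med, med < human_interobserver

# Five concordant sub-model predictions (small shifts) -> not flagged.
base = np.zeros((32, 32), bool)
base[8:24, 8:24] = True
good = [np.roll(base, s, axis=0) for s in range(5)]
m_good, flag_good = interfold_flag(good)

# One wildly different sub-model prediction drags agreement down -> flagged.
bad = good[:4] + [np.roll(base, 14, axis=0)]
m_bad, flag_bad = interfold_flag(bad)
```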
ABSTRACT
BACKGROUND & AIMS: The aims of our case-control study were (1) to develop an automated 3-dimensional (3D) Convolutional Neural Network (CNN) for detection of pancreatic ductal adenocarcinoma (PDA) on diagnostic computed tomography scans (CTs), (2) to evaluate its generalizability on multi-institutional public data sets, (3) to assess its utility as a potential screening tool using a simulated cohort with high pretest probability, and (4) to assess its ability to detect visually occult preinvasive cancer on prediagnostic CTs. METHODS: A 3D-CNN classification system was trained using algorithmically generated bounding boxes and pancreatic masks on a curated data set of 696 portal phase diagnostic CTs with PDA and 1080 control images with a nonneoplastic pancreas. The model was evaluated on (1) an intramural hold-out test subset (409 CTs with PDA, 829 controls); (2) a simulated cohort with a case-control distribution that matched the risk of PDA in glycemically defined new-onset diabetes, and Enriching New-Onset Diabetes for Pancreatic Cancer score ≥3; (3) multi-institutional public data sets (194 CTs with PDA, 80 controls), and (4) a cohort of 100 prediagnostic CTs (i.e., CTs incidentally acquired 3-36 months before clinical diagnosis of PDA) without a focal mass, and 134 controls. RESULTS: Of the CTs in the intramural test subset, 798 (64%) were from other hospitals. The model correctly classified 360 CTs (88%) with PDA and 783 control CTs (94%), with a mean accuracy of 0.92 (95% CI, 0.91-0.94), area under the receiver operating characteristic (AUROC) curve of 0.97 (95% CI, 0.96-0.98), sensitivity of 0.88 (95% CI, 0.85-0.91), and specificity of 0.95 (95% CI, 0.93-0.96). Activation areas on heat maps overlapped with the tumor in 350 of 360 CTs (97%). Performance was high across tumor stages (sensitivity of 0.80, 0.87, 0.95, and 1.0 on T1 through T4 stages, respectively), comparable for hypodense vs isodense tumors (sensitivity: 0.90 vs 0.82), different age, sex, CT slice thicknesses, and vendors (all P > .05), and generalizable on both the simulated cohort (accuracy, 0.95 [95% CI, 0.94-0.95]; AUROC curve, 0.97 [95% CI, 0.94-0.99]) and public data sets (accuracy, 0.86 [95% CI, 0.82-0.90]; AUROC curve, 0.90 [95% CI, 0.86-0.95]). Despite being exclusively trained on diagnostic CTs with larger tumors, the model could detect occult PDA on prediagnostic CTs (accuracy, 0.84 [95% CI, 0.79-0.88]; AUROC curve, 0.91 [95% CI, 0.86-0.94]; sensitivity, 0.75 [95% CI, 0.67-0.84]; and specificity, 0.90 [95% CI, 0.85-0.95]) at a median 475 days (range, 93-1082 days) before clinical diagnosis. CONCLUSIONS: This automated artificial intelligence model trained on a large and diverse data set shows high accuracy and generalizable performance for detection of PDA on diagnostic CTs as well as for visually occult PDA on prediagnostic CTs. Prospective validation with blood-based biomarkers is warranted to assess the potential for early detection of sporadic PDA in high-risk individuals.
Subject(s)
Carcinoma, Pancreatic Ductal; Diabetes Mellitus; Pancreatic Neoplasms; Humans; Artificial Intelligence; Case-Control Studies; Early Detection of Cancer; Pancreatic Neoplasms/diagnostic imaging; Tomography, X-Ray Computed/methods; Carcinoma, Pancreatic Ductal/diagnostic imaging; Retrospective Studies
ABSTRACT
SIGNIFICANCE STATEMENT: Segmentation of multiple structures in cross-sectional imaging is time-consuming and impractical to perform manually, especially if the end goal is clinical implementation. In this study, we developed, validated, and demonstrated the capability of a deep learning algorithm to segment individual medullary pyramids in a rapid, accurate, and reproducible manner. The results demonstrate that cortex volume, medullary volume, number of pyramids, and mean pyramid volume are associated with patient clinical characteristics and microstructural findings and provide insights into the mechanisms that may lead to chronic kidney disease (CKD). BACKGROUND: The kidney is a lobulated organ, but little is known regarding the clinical importance of the number and size of individual kidney lobes. METHODS: After applying a previously validated algorithm to segment the cortex and medulla, a deep-learning algorithm was developed and validated to segment and count individual medullary pyramids on contrast-enhanced computed tomography images of living kidney donors before donation. The association of cortex volume, medullary volume, number of pyramids, and mean pyramid volume with concurrent clinical characteristics (kidney function and CKD risk factors), kidney biopsy morphology (nephron number, glomerular volume, and nephrosclerosis), and short- and long-term GFR <60 or <45 ml/min per 1.73 m² was assessed. RESULTS: Among 2876 living kidney donors, 1132 had short-term follow-up at a median of 3.8 months and 638 had long-term follow-up at a median of 10.0 years. Larger cortex volume was associated with younger age, male sex, larger body size, higher GFR, albuminuria, more nephrons, larger glomeruli, less nephrosclerosis, and lower risk of low GFR at follow-up. Larger pyramids were associated with older age, female sex, larger body size, higher GFR, more nephrons, larger glomerular volume, more nephrosclerosis, and higher risk of low GFR at follow-up.
More pyramids were associated with younger age, male sex, greater height, no hypertension, higher GFR, lower uric acid, more nephrons, less nephrosclerosis, and a lower risk of low GFR at follow-up. CONCLUSIONS: Cortex volume and medullary pyramid volume and count reflect underlying variation in nephron number and nephron size as well as merging of pyramids because of age-related nephrosclerosis, with loss of detectable cortical columns separating pyramids.
Subject(s)
Kidney Transplantation; Kidney; Nephrosclerosis; Renal Insufficiency, Chronic; Female; Humans; Male; Biopsy; Glomerular Filtration Rate; Kidney/pathology; Nephrosclerosis/pathology; Renal Insufficiency, Chronic/surgery
ABSTRACT
OBJECTIVES: To develop a bounding-box-based 3D convolutional neural network (CNN) for user-guided volumetric pancreas ductal adenocarcinoma (PDA) segmentation. METHODS: Reference segmentations were obtained on CTs (2006-2020) of treatment-naïve PDA. Images were algorithmically cropped using a tumor-centered bounding box for training a 3D nnUNet-based CNN. Three radiologists independently segmented tumors on the test subset, and these were combined with reference segmentations using STAPLE to derive composite segmentations. Generalizability was evaluated on The Cancer Imaging Archive (TCIA) (n = 41) and Medical Segmentation Decathlon (MSD) (n = 152) datasets. RESULTS: A total of 1151 patients [667 males; age: 65.3 ± 10.2 years; T1: 34, T2: 477, T3: 237, T4: 403; mean (range) tumor diameter: 4.34 (1.1-12.6) cm] were randomly divided between training/validation (n = 921) and test subsets (n = 230; 75% from other institutions). The model had a high DSC (mean ± SD) against reference segmentations (0.84 ± 0.06), which was comparable to its DSC against composite segmentations (0.84 ± 0.11, p = 0.52). Model-predicted and reference tumor volumes were comparable (mean ± SD) (29.1 ± 42.2 cc versus 27.1 ± 32.9 cc, p = 0.69, CCC = 0.93). Inter-reader variability was high (mean DSC 0.69 ± 0.16), especially for smaller and isodense tumors. Conversely, the model's high performance was comparable across tumor stages, volumes, and densities (p > 0.05). The model was resilient to different tumor locations, status of pancreatic/biliary ducts, pancreatic atrophy, CT vendors, and slice thicknesses, as well as to the epicenter and dimensions of the bounding box (p > 0.05). Performance was generalizable on the MSD (DSC: 0.82 ± 0.06) and TCIA datasets (DSC: 0.84 ± 0.08).
CONCLUSION: A computationally efficient bounding box-based AI model developed on a large and diverse dataset shows high accuracy, generalizability, and robustness to clinically encountered variations for user-guided volumetric PDA segmentation including for small and isodense tumors. CLINICAL RELEVANCE: AI-driven bounding box-based user-guided PDA segmentation offers a discovery tool for image-based multi-omics models for applications such as risk-stratification, treatment response assessment, and prognostication, which are urgently needed to customize treatment strategies to the unique biological profile of each patient's tumor.
Subject(s)
Carcinoma, Pancreatic Ductal; Pancreatic Neoplasms; Male; Humans; Middle Aged; Aged; Image Processing, Computer-Assisted/methods; Tomography, X-Ray Computed/methods; Neural Networks, Computer; Pancreatic Neoplasms/diagnostic imaging; Carcinoma, Pancreatic Ductal/diagnostic imaging; Pancreatic Ducts
ABSTRACT
OBJECTIVE: To evaluate the performance of an internally developed and previously validated artificial intelligence (AI) algorithm for magnetic resonance (MR)-derived total kidney volume (TKV) in autosomal dominant polycystic kidney disease (ADPKD) when implemented in clinical practice. PATIENTS AND METHODS: The study included adult patients with ADPKD seen by a nephrologist at our institution between November 2019 and January 2021 who underwent an MR imaging examination as part of standard clinical care. Thirty-three nephrologists ordered MR imaging with AI-based TKV calculation for 170 examinations in 161 unique patients. We tracked implementation and performance of the algorithm over 1 year. A radiologist and a radiology technologist reviewed all cases (N=170) for quality and accuracy. Manual editing of the algorithm output occurred at the radiologist's or radiology technologist's discretion. Performance was assessed by comparing AI-based and manually edited segmentations via measures of similarity and dissimilarity to ensure expected performance. We analyzed ADPKD severity class assignment of algorithm-derived vs manually edited TKV to assess impact. RESULTS: Clinical implementation was successful. AI algorithm-based segmentation showed high levels of agreement and was noninferior to interobserver variability and other methods for determining TKV. Among manually edited cases (n=84), the AI-algorithm TKV output showed a small mean volume difference of -3.3%. Agreement in disease class between AI-based and manually edited segmentation was high (only five cases differed). CONCLUSION: Performance of an AI algorithm in real-life clinical practice can be preserved if there is careful development and validation and if the implementation environment closely matches the development conditions.
Subject(s)
Polycystic Kidney, Autosomal Dominant; Adult; Humans; Polycystic Kidney, Autosomal Dominant/diagnostic imaging; Artificial Intelligence; Kidney/diagnostic imaging; Magnetic Resonance Imaging/methods; Algorithms; Magnetic Resonance Spectroscopy
ABSTRACT
The aim of this study is to investigate the use of an exponential-plateau model to determine the training dataset size that yields the maximum medical image segmentation performance. CT and MR images of patients with renal tumors acquired between 1997 and 2017 were retrospectively collected from our nephrectomy registry. Modality-based datasets of 50, 100, 150, 200, 250, and 300 images were assembled to train models with an 80-20 training-validation split, each evaluated against 50 randomly held-out test images. A third experiment using the KiTS21 dataset explored the effects of different model architectures. Exponential-plateau models were used to establish the relationship between dataset size and model generalizability. For segmenting non-neoplastic kidney regions on CT and MR imaging, our models yielded test Dice score plateaus of [Formula: see text] and [Formula: see text], requiring 54 and 122 training-validation images, respectively, to reach the plateaus. For segmenting CT and MR tumor regions, we modeled test Dice score plateaus of [Formula: see text] and [Formula: see text], with 125 and 389 training-validation images needed to reach the plateaus. For the KiTS21 dataset, the best Dice score plateaus for the nnU-Net 2D and 3D architectures were [Formula: see text] and [Formula: see text], with 177 and 440 training-validation images needed to reach the performance plateaus. Our research validates that imaging modality, target structure, and model architecture all affect the number of training images required to reach a performance plateau. The modeling approach we developed will help future researchers determine when additional training-validation images are unlikely to further improve model performance.
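An exponential-plateau model has the form Dice(n) = a − b·exp(−c·n), rising from (a − b) toward the plateau a as dataset size n grows. A sketch of fitting it to synthetic learning-curve points; the parameter values and the "within 1% of plateau" criterion are invented for illustration, not the study's fits:

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_plateau(n, a, b, c):
    """Dice(n) = a - b * exp(-c * n): rises toward the plateau a."""
    return a - b * np.exp(-c * n)

# Synthetic (dataset size -> test Dice) points generated from a known
# plateau of 0.90 plus a little noise, standing in for measured scores.
rng = np.random.default_rng(0)
sizes = np.array([50, 100, 150, 200, 250, 300], float)
dice = exp_plateau(sizes, 0.90, 0.35, 0.02) + rng.normal(0, 0.003, sizes.size)

(a, b, c), _ = curve_fit(exp_plateau, sizes, dice, p0=(0.8, 0.3, 0.01))

# Smallest n whose predicted Dice is within 1% of the fitted plateau.
n_grid = np.arange(1, 1000)
n_plateau = int(n_grid[exp_plateau(n_grid, a, b, c) >= 0.99 * a][0])
```

Once fitted, the curve answers the paper's practical question directly: beyond `n_plateau` training-validation images, additional data is predicted to buy almost no Dice improvement.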
Subject(s)
Image Processing, Computer-Assisted; Kidney Neoplasms; Humans; Image Processing, Computer-Assisted/methods; Retrospective Studies; Neural Networks, Computer; Magnetic Resonance Imaging/methods; Tomography, X-Ray Computed; Kidney Neoplasms/diagnostic imaging
ABSTRACT
BACKGROUND: Sarcopenia increases with age and is associated with poor survival outcomes in patients with cancer. Using a deep learning-based segmentation approach, clinical computed tomography (CT) images of the abdomen of patients with newly diagnosed multiple myeloma (NDMM) were reviewed to determine whether the presence of sarcopenia had any prognostic value. METHODS: Sarcopenia was detected by accurate segmentation and measurement of the skeletal muscle components present at the level of the L3 vertebra. These skeletal muscle measurements were normalized by patient height to obtain a skeletal muscle index for each patient and classify them as sarcopenic or not. RESULTS: The study cohort consisted of 322 patients, of whom 67 (28%) were categorized as having high-risk (HR) fluorescence in situ hybridization (FISH) cytogenetics. A total of 171 (53%) patients were sarcopenic based on their peri-diagnosis standard-dose CT scan. The median overall survival (OS) and 2-year mortality rate for sarcopenic patients were 44 months and 40%, compared to 90 months and 18% for those who were not sarcopenic (p < .0001 for both comparisons). In a multivariable model, the adverse prognostic impact of sarcopenia was independent of International Staging System stage, age, and HR FISH cytogenetics. CONCLUSIONS: Sarcopenia identified by a machine learning-based convolutional neural network algorithm significantly affects OS in patients with NDMM. Future studies using this methodology to assess sarcopenia in larger prospective clinical trials are required to validate these findings.
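The skeletal muscle index used for this kind of sarcopenia classification is conventionally the L3-level muscle area normalized by height squared. A sketch with commonly cited literature cutoffs; the study's exact thresholds are not stated in the abstract, and the patient values are hypothetical:

```python
# Commonly cited L3 SMI sarcopenia cutoffs in cm^2/m^2 (illustrative
# literature values, not thresholds reported by this study).
CUTOFFS = {"M": 52.4, "F": 38.5}

def skeletal_muscle_index(l3_muscle_area_cm2, height_m):
    """Skeletal muscle index (SMI): L3 muscle area / height^2."""
    return l3_muscle_area_cm2 / height_m ** 2

def is_sarcopenic(smi, sex):
    """Classify a patient as sarcopenic if SMI falls below the sex cutoff."""
    return smi < CUTOFFS[sex]

# Hypothetical patient: 140 cm^2 of L3 muscle from the segmentation, 1.75 m tall.
smi = skeletal_muscle_index(140.0, 1.75)
```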
Subject(s)
Deep Learning; Multiple Myeloma; Sarcopenia; Humans; Sarcopenia/complications; Sarcopenia/diagnostic imaging; Multiple Myeloma/complications; Multiple Myeloma/diagnostic imaging; Multiple Myeloma/pathology; Prospective Studies; In Situ Hybridization, Fluorescence; Retrospective Studies; Tomography, X-Ray Computed/methods; Muscle, Skeletal/diagnostic imaging; Prognosis
ABSTRACT
PURPOSE: To determine whether a pancreas radiomics-based AI model can detect the CT imaging signature of type 2 diabetes (T2D). METHODS: A total of 107 radiomic features were extracted from the volumetrically segmented normal pancreas in 422 T2D patients and 456 age-matched controls. The dataset was randomly split into training (300 T2D, 300 control CTs) and test subsets (122 T2D, 156 control CTs). An XGBoost model, trained on 10 features selected through a top-K-based selection method and optimized through threefold cross-validation on the training subset, was evaluated on the test subset. RESULTS: The model correctly classified 73 (60%) T2D patients and 96 (62%) controls, yielding an F1-score, sensitivity, specificity, precision, and AUC of 0.57, 0.62, 0.61, 0.55, and 0.65, respectively. The model's performance was equivalent across sex, CT slice thicknesses, and CT vendors (p values > 0.05). There was no difference between correctly classified and misclassified patients in mean (range) T2D duration [4.5 (0-15.4) versus 4.8 (0-15.7) years, p = 0.8], antidiabetic treatment [insulin (22% versus 18%), oral antidiabetics (10% versus 18%), both (41% versus 39%) (p > 0.05)], or treatment duration [5.4 (0-15) versus 5 (0-13) years, p = 0.4]. CONCLUSION: A pancreas radiomics-based AI model can detect the imaging signature of T2D. Further refinement and validation are needed to evaluate its potential for opportunistic T2D detection on the millions of CTs performed annually.
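The abstract does not specify the scoring function behind its top-K selection, so the sketch below illustrates one common variant only: rank each radiomic feature by the absolute Pearson correlation of its values with the binary label and keep the K highest-scoring features. Function names and the scoring choice are assumptions, not the authors' method.

```python
def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant feature carries no signal
    return cov / (vx * vy) ** 0.5

def top_k_features(X, y, k):
    """Rank features by |correlation| with the binary label; keep top k.
    X: list of samples, each a list of feature values; y: 0/1 labels.
    Returns the selected feature indices, best first."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        scores.append((abs(pearson_r(col, y)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]
```

In the study's setting, K = 10 of the 107 extracted features would be retained before fitting the downstream classifier.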
Subject(s)
Diabetes Mellitus, Type 2; Insulins; Abdomen; Diabetes Mellitus, Type 2/diagnostic imaging; Humans; Hypoglycemic Agents; Machine Learning; Retrospective Studies; Tomography, X-Ray Computed/methods
ABSTRACT
PURPOSE: This study aimed to compare the accuracy and efficiency of a convolutional neural network (CNN)-enhanced workflow for pancreas segmentation versus radiologists in the context of interreader reliability. METHODS: Volumetric pancreas segmentations on a dataset of 294 portal venous computed tomographies were performed by 3 radiologists (R1, R2, and R3) and by a CNN. Convolutional neural network segmentations were reviewed and, if needed, corrected ("corrected CNN [c-CNN]" segmentations) by radiologists. Ground truth was obtained from the radiologists' manual segmentations using the simultaneous truth and performance level estimation (STAPLE) algorithm. Interreader reliability and model accuracy were evaluated with the Dice-Sorenson coefficient (DSC) and Jaccard coefficient (JC). Equivalence was determined using two one-sided tests. Convolutional neural network segmentations below the 25th percentile DSC were reviewed to evaluate segmentation errors. Time for manual segmentation and c-CNN was compared. RESULTS: Pancreas volumes from the 3 sets of segmentations (manual, CNN, and c-CNN) were noninferior to STAPLE-derived volumes [76.6 cm³ (20.2 cm³), P < 0.05]. Interreader reliability was high (mean [SD] DSC between R2-R1, 0.87 [0.04]; R3-R1, 0.90 [0.05]; R2-R3, 0.87 [0.04]). Convolutional neural network segmentations were highly accurate (DSC, 0.88 [0.05]; JC, 0.79 [0.07]) and required minimal-to-no corrections (c-CNN: DSC, 0.89 [0.04]; JC, 0.81 [0.06]; equivalence, P < 0.05). Undersegmentation (n = 47 [64%]) was common in the 73 CNN segmentations below the 25th percentile DSC, but there were no major errors. Total inference time (minutes) for the CNN was 1.2 (0.3). Average time (minutes) taken by radiologists for c-CNN (0.6 [0.97]) was substantially lower than for manual segmentation (3.37 [1.47]; savings of 77.9%-87% [P < 0.0001]).
CONCLUSIONS: The convolutional neural network-enhanced workflow provides high accuracy and efficiency for volumetric pancreas segmentation on computed tomography.
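The two overlap metrics used above have standard definitions: for binary masks A and B, DSC = 2|A∩B| / (|A|+|B|) and JC = |A∩B| / |A∪B|, related by JC = DSC / (2 - DSC). A minimal sketch on flattened 0/1 masks:

```python
def dice_jaccard(mask_a, mask_b):
    """Dice-Sorenson and Jaccard coefficients for two binary masks
    given as flat 0/1 sequences of equal length."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    union = size_a + size_b - inter
    # Convention: two empty masks are treated as perfect agreement.
    dsc = 2 * inter / (size_a + size_b) if size_a + size_b else 1.0
    jc = inter / union if union else 1.0
    return dsc, jc
```

Because JC is determined by DSC (and vice versa), reporting both, as the study does, is a readability choice rather than independent evidence.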
Subject(s)
Pancreas; Radiologists; Humans; Reproducibility of Results; Pancreas/diagnostic imaging; Neural Networks, Computer; Tomography, X-Ray Computed
ABSTRACT
OBJECTIVE: Machine learning, deep learning, and artificial intelligence (AI) are terms that have made their way into nearly all areas of medicine. In the case of medical imaging, these methods have become the state of the art in nearly all areas, from image reconstruction to image processing and automated analysis. In contrast to other areas, such as brain and breast imaging, the impact of AI has not been as strongly felt in gynecologic imaging. In this review article, we: (i) provide a background of clinically relevant AI concepts, (ii) describe methods and approaches in computer vision, and (iii) highlight prior work related to image classification tasks utilizing AI approaches in gynecologic imaging. DATA SOURCES: A comprehensive search of several databases, each from inception to March 18, 2021, and limited to English-language publications, was conducted. The databases included Ovid MEDLINE(R) and Epub Ahead of Print, In-Process & Other Non-Indexed Citations, and Daily, Ovid EMBASE, Ovid Cochrane Central Register of Controlled Trials, Ovid Cochrane Database of Systematic Reviews, and ClinicalTrials.gov. METHODS OF STUDY SELECTION: We performed an extensive literature review with 61 articles curated by three reviewers and subsequently sorted by specialists using specific inclusion and exclusion criteria. TABULATION, INTEGRATION, AND RESULTS: We summarize the literature grouped by each of the three most common gynecologic malignancies: endometrial, cervical, and ovarian. For each, a brief introduction encapsulating the AI methods, imaging modalities, and clinical parameters in the selected articles is presented. We conclude with a discussion of current developments, trends, and limitations, and suggest directions for future study. CONCLUSION: This review article should prove useful for collaborative teams performing research studies targeted at the incorporation of radiological imaging and AI methods into gynecological clinical practice.
Subject(s)
Artificial Intelligence; Image Processing, Computer-Assisted; Diagnostic Imaging; Female; Humans
ABSTRACT
BACKGROUND & AIMS: Our purpose was to detect pancreatic ductal adenocarcinoma (PDAC) at the prediagnostic stage (3-36 months before clinical diagnosis) using radiomics-based machine-learning (ML) models, and to compare performance against radiologists in a case-control study. METHODS: Volumetric pancreas segmentation was performed on prediagnostic computed tomography scans (CTs) (median interval between CT and PDAC diagnosis: 398 days) of 155 patients and an age-matched cohort of 265 subjects with normal pancreas. A total of 88 first-order and gray-level radiomic features were extracted, and 34 features were selected through the least absolute shrinkage and selection operator-based feature selection method. The dataset was randomly divided into training (292 CTs: 110 prediagnostic and 182 controls) and test subsets (128 CTs: 45 prediagnostic and 83 controls). Four ML classifiers were evaluated: k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). The specificity of the model with the highest accuracy was further validated on an independent internal dataset (n = 176) and the public National Institutes of Health dataset (n = 80). Two radiologists (R4 and R5) independently evaluated the pancreas on a 5-point diagnostic scale. RESULTS: Median (range) time between prediagnostic CTs of the test subset and PDAC diagnosis was 386 (97-1092) days. SVM had the highest sensitivity (mean; 95% confidence interval) (95.5; 85.5-100.0), specificity (90.3; 84.3-91.5), F1-score (89.5; 82.3-91.7), area under the curve (AUC) (0.98; 0.94-0.98), and accuracy (92.2%; 86.7-93.7) for classification of CTs into prediagnostic versus normal. The 3 other ML models, KNN, RF, and XGBoost, had comparable AUCs (0.95, 0.95, and 0.96, respectively). The high specificity of SVM was generalizable to both the independent internal dataset (92.6%) and the National Institutes of Health dataset (96.2%).
In contrast, interreader radiologist agreement was only fair (Cohen's kappa 0.3), and their mean AUC (0.66; 0.46-0.86) was lower than that of each of the 4 ML models (AUCs: 0.95-0.98) (P < .001). Radiologists also recorded false-positive indirect findings of PDAC in control subjects (n = 83) (7% R4, 18% R5). CONCLUSIONS: Radiomics-based ML models can detect PDAC from normal pancreas when it is beyond human interrogation capability, at a substantial lead time before clinical diagnosis. Prospective validation and integration of such models with complementary fluid-based biomarkers have the potential for PDAC detection at a stage when surgical cure is a possibility.
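The "fair" interreader agreement quoted above is Cohen's kappa, which discounts observed agreement by the agreement expected by chance from each rater's marginal frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two raters scoring the same cases:

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters over the same cases.
    ratings_a, ratings_b: equal-length sequences of category labels."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of cases on which the raters agree.
    observed = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    # Chance agreement from the product of per-category marginals.
    expected = sum(
        (list(ratings_a).count(c) / n) * (list(ratings_b).count(c) / n)
        for c in categories
    )
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Values near 0 indicate chance-level agreement; by a commonly used rule of thumb, 0.2-0.4 (as reported here) is labeled "fair."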
Subject(s)
Carcinoma, Pancreatic Ductal; Pancreatic Neoplasms; Humans; Case-Control Studies; Pancreatic Neoplasms/diagnostic imaging; Tomography, X-Ray Computed/methods; Carcinoma, Pancreatic Ductal/diagnostic imaging; Machine Learning; Retrospective Studies; Pancreatic Neoplasms
ABSTRACT
BACKGROUND: In kidney transplantation, a contrast CT scan is obtained in the donor candidate to detect subclinical pathology in the kidney. Recent work from the Aging Kidney Anatomy study has characterized kidney, cortex, and medulla volumes using a manual image-processing tool. However, this technique is time consuming and impractical for clinical care, and thus, these measurements are not obtained during donor evaluations. This study proposes a fully automated segmentation approach for measuring kidney, cortex, and medulla volumes. METHODS: A total of 1930 contrast-enhanced CT exams with reference standard manual segmentations from one institution were used to develop the algorithm. A convolutional neural network model was trained (n=1238) and validated (n=306), and then evaluated on a hold-out test set of reference standard segmentations (n=386). After the initial evaluation, the algorithm was further tested on datasets originating from two external sites (n=1226). RESULTS: The automated model was found to perform on par with manual segmentation, with errors similar to the interobserver variability of manual segmentation. Compared with the reference standard, the automated approach achieved a Dice similarity metric of 0.94 (right cortex), 0.90 (right medulla), 0.94 (left cortex), and 0.90 (left medulla) in the test set. Similar performance was observed when the algorithm was applied to the two external datasets. CONCLUSIONS: A fully automated approach for measuring cortex and medullary volumes in CT images of the kidneys has been established. This method may prove useful for a wide range of clinical applications.
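Once a labeled segmentation exists, the volume measurement itself reduces to counting voxels per label and scaling by the physical voxel size. The sketch below illustrates that final step only; the integer label codes and voxel spacing in the usage example are hypothetical, not values from the study.

```python
def structure_volume_cm3(label_mask, label, spacing_mm):
    """Volume of one labeled structure in a segmentation mask.
    label_mask: flat iterable of integer labels, one per voxel;
    label: integer code of the structure of interest;
    spacing_mm: (dx, dy, dz) voxel spacing in millimetres."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    n_voxels = sum(1 for v in label_mask if v == label)
    return n_voxels * voxel_mm3 / 1000.0  # mm^3 -> cm^3
```

Voxel spacing must come from the image header; counting voxels alone would make volumes incomparable across scans with different resolutions.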