1.
Gastroenterology ; 165(6): 1533-1546.e4, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37657758

ABSTRACT

BACKGROUND & AIMS: The aims of our case-control study were (1) to develop an automated 3-dimensional (3D) convolutional neural network (CNN) for detection of pancreatic ductal adenocarcinoma (PDA) on diagnostic computed tomography scans (CTs), (2) to evaluate its generalizability on multi-institutional public data sets, (3) to assess its utility as a potential screening tool using a simulated cohort with high pretest probability, and (4) to assess its ability to detect visually occult preinvasive cancer on prediagnostic CTs. METHODS: A 3D-CNN classification system was trained using algorithmically generated bounding boxes and pancreatic masks on a curated data set of 696 portal phase diagnostic CTs with PDA and 1080 control images with a nonneoplastic pancreas. The model was evaluated on (1) an intramural hold-out test subset (409 CTs with PDA, 829 controls); (2) a simulated cohort with a case-control distribution that matched the risk of PDA in glycemically defined new-onset diabetes and an Enriching New-Onset Diabetes for Pancreatic Cancer score ≥3; (3) multi-institutional public data sets (194 CTs with PDA, 80 controls); and (4) a cohort of 100 prediagnostic CTs (i.e., CTs incidentally acquired 3-36 months before clinical diagnosis of PDA) without a focal mass, and 134 controls. RESULTS: Of the CTs in the intramural test subset, 798 (64%) were from other hospitals. The model correctly classified 360 CTs (88%) with PDA and 783 control CTs (94%), with a mean accuracy of 0.92 (95% CI, 0.91-0.94), area under the receiver operating characteristic (AUROC) curve of 0.97 (95% CI, 0.96-0.98), sensitivity of 0.88 (95% CI, 0.85-0.91), and specificity of 0.95 (95% CI, 0.93-0.96). Activation areas on heat maps overlapped with the tumor in 350 of 360 CTs (97%). Performance was high across tumor stages (sensitivity of 0.80, 0.87, 0.95, and 1.0 for T1 through T4 stages, respectively), comparable for hypodense versus isodense tumors (sensitivity: 0.90 vs 0.82) and across ages, sexes, CT slice thicknesses, and vendors (all P > .05), and generalizable to both the simulated cohort (accuracy, 0.95 [95% CI, 0.94-0.95]; AUROC curve, 0.97 [95% CI, 0.94-0.99]) and the public data sets (accuracy, 0.86 [95% CI, 0.82-0.90]; AUROC curve, 0.90 [95% CI, 0.86-0.95]). Despite being exclusively trained on diagnostic CTs with larger tumors, the model could detect occult PDA on prediagnostic CTs (accuracy, 0.84 [95% CI, 0.79-0.88]; AUROC curve, 0.91 [95% CI, 0.86-0.94]; sensitivity, 0.75 [95% CI, 0.67-0.84]; and specificity, 0.90 [95% CI, 0.85-0.95]) at a median of 475 days (range, 93-1082 days) before clinical diagnosis. CONCLUSIONS: This automated artificial intelligence model, trained on a large and diverse data set, shows high accuracy and generalizable performance for detection of PDA on diagnostic CTs as well as for visually occult PDA on prediagnostic CTs. Prospective validation with blood-based biomarkers is warranted to assess the potential for early detection of sporadic PDA in high-risk individuals.
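For readers who want to reproduce the kind of per-CT metrics reported above (accuracy, sensitivity, specificity, AUROC), a minimal scikit-learn sketch follows; the function name and the example probabilities are hypothetical, not the authors' code.

```python
# Minimal sketch (not the authors' code): computing the classification
# metrics reported above from per-CT model probabilities with scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def classification_report_pda(y_true, y_prob, threshold=0.5):
    """y_true: 1 = PDA, 0 = control; y_prob: model probability of PDA."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # PDA CTs correctly flagged
        "specificity": tn / (tn + fp),   # control CTs correctly cleared
        "auroc": roc_auc_score(y_true, y_prob),
    }

# Hypothetical usage with made-up predictions, not study data:
metrics = classification_report_pda([1, 1, 0, 0], [0.9, 0.6, 0.2, 0.4])
```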


Subject(s)
Carcinoma, Pancreatic Ductal , Diabetes Mellitus , Pancreatic Neoplasms , Humans , Artificial Intelligence , Case-Control Studies , Early Detection of Cancer , Pancreatic Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Carcinoma, Pancreatic Ductal/diagnostic imaging , Retrospective Studies
2.
Eur Radiol ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38842692

ABSTRACT

OBJECTIVES: To develop an automated pipeline for extracting prostate cancer-related information from clinical notes. MATERIALS AND METHODS: This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compiling multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). Patient-level classification performance was evaluated on a patient-level test set created by radiologists who reviewed the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists. RESULTS: Sentence-level AUCs were ≥ 0.94. Compared with radiologists, the pipeline showed higher patient-level sensitivity for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%, p < 0.001) but lower accuracy in classifying pre-MRI prostate pathology (92.5% vs. 95.9%, p = 0.002) and treatment history of prostate cancer (95.5% vs. 97.7%, p = 0.03). CONCLUSION: The proposed pipeline showed promising performance, especially for extracting cancer risk factors from patients' clinical notes. CLINICAL RELEVANCE STATEMENT: The natural language processing pipeline showed higher sensitivity for extracting prostate cancer risk factors than radiologists and may help efficiently gather relevant text information when interpreting prostate MRI. KEY POINTS: When interpreting prostate MRI, it is necessary to extract prostate cancer-related information from clinical notes. This pipeline extracted the presence of prostate cancer risk factors with higher sensitivity than radiologists. Natural language processing may help radiologists efficiently gather relevant prostate cancer-related text information.
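A minimal sketch of the sentence-level fine-tuning step described above, using the Hugging Face transformers API; the checkpoint name, example sentence, and label are assumptions, not the study's pipeline or data.

```python
# Minimal sketch of the sentence-level step, assuming a generic BERT
# checkpoint and binary labels; not the authors' pipeline or data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # e.g., family history: yes/no

sentences = ["Family history of prostate cancer in father."]  # hypothetical
labels = torch.tensor([1])

batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few fine-tuning steps for illustration
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Patient-level output would then aggregate many sentence-level
# predictions, e.g., with a gradient-boosted tree over sentence scores.
```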

3.
J Am Soc Nephrol ; 34(10): 1752-1763, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37562061

ABSTRACT

SIGNIFICANCE STATEMENT: Segmentation of multiple structures in cross-sectional imaging is time-consuming and impractical to perform manually, especially if the end goal is clinical implementation. In this study, we developed, validated, and demonstrated the capability of a deep learning algorithm to segment individual medullary pyramids in a rapid, accurate, and reproducible manner. The results demonstrate that cortex volume, medullary volume, number of pyramids, and mean pyramid volume are associated with patient clinical characteristics and microstructural findings and provide insights into the mechanisms that may lead to CKD. BACKGROUND: The kidney is a lobulated organ, but little is known regarding the clinical importance of the number and size of individual kidney lobes. METHODS: After applying a previously validated algorithm to segment the cortex and medulla, a deep-learning algorithm was developed and validated to segment and count individual medullary pyramids on contrast-enhanced computed tomography images of living kidney donors before donation. The association of cortex volume, medullary volume, number of pyramids, and mean pyramid volume with concurrent clinical characteristics (kidney function and CKD risk factors), kidney biopsy morphology (nephron number, glomerular volume, and nephrosclerosis), and short- and long-term GFR <60 or <45 ml/min per 1.73 m² was assessed. RESULTS: Among 2876 living kidney donors, 1132 had short-term follow-up at a median of 3.8 months and 638 had long-term follow-up at a median of 10.0 years. Larger cortex volume was associated with younger age, male sex, larger body size, higher GFR, albuminuria, more nephrons, larger glomeruli, less nephrosclerosis, and lower risk of low GFR at follow-up. Larger pyramids were associated with older age, female sex, larger body size, higher GFR, more nephrons, larger glomerular volume, more nephrosclerosis, and higher risk of low GFR at follow-up. More pyramids were associated with younger age, male sex, greater height, no hypertension, higher GFR, lower uric acid, more nephrons, less nephrosclerosis, and a lower risk of low GFR at follow-up. CONCLUSIONS: Cortex volume and medullary pyramid volume and count reflect underlying variation in nephron number and nephron size as well as merging of pyramids because of age-related nephrosclerosis, with loss of detectable cortical columns separating pyramids.
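A sketch of how a pyramid count and mean pyramid volume can be derived from a binary medulla segmentation via connected-component labeling; the mask and voxel spacing below are stand-ins, not the study's algorithm.

```python
# Sketch: pyramid count and mean pyramid volume from a binary segmentation.
import numpy as np
from scipy import ndimage

mask = np.zeros((64, 64, 64), dtype=bool)   # stand-in for a model output
mask[10:20, 10:20, 10:20] = True            # two fake "pyramids"
mask[40:50, 40:50, 40:50] = True
voxel_mm3 = 0.8 * 0.8 * 3.0                 # assumed CT voxel size (mm^3)

labeled, n_pyramids = ndimage.label(mask)   # connected components
voxel_counts = ndimage.sum(mask, labeled, index=range(1, n_pyramids + 1))
volumes_cc = np.asarray(voxel_counts) * voxel_mm3 / 1000.0
mean_pyramid_cc = volumes_cc.mean()
```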


Subject(s)
Kidney Transplantation , Kidney , Nephrosclerosis , Renal Insufficiency, Chronic , Female , Humans , Male , Biopsy , Glomerular Filtration Rate , Kidney/pathology , Nephrosclerosis/pathology , Renal Insufficiency, Chronic/surgery
4.
Cancer ; 129(3): 385-392, 2023 02 01.
Article in English | MEDLINE | ID: mdl-36413412

ABSTRACT

BACKGROUND: Sarcopenia increases with age and is associated with poor survival outcomes in patients with cancer. Using a deep learning-based segmentation approach, clinical computed tomography (CT) images of the abdomen of patients with newly diagnosed multiple myeloma (NDMM) were reviewed to determine whether the presence of sarcopenia had any prognostic value. METHODS: Sarcopenia was detected by accurate segmentation and measurement of the skeletal muscle components present at the level of the L3 vertebra. These skeletal muscle measurements were normalized by the height of the patient to obtain the skeletal muscle index for each patient and classify them as sarcopenic or not. RESULTS: The study cohort consisted of 322 patients, of whom 67 (28%) were categorized as having high-risk (HR) fluorescence in situ hybridization (FISH) cytogenetics. A total of 171 (53%) patients were sarcopenic based on their peri-diagnosis standard-dose CT scan. The median overall survival (OS) and 2-year mortality rate for sarcopenic patients were 44 months and 40%, compared with 90 months and 18% for those who were not sarcopenic (p < .0001 for both comparisons). In a multivariable model, the adverse prognostic impact of sarcopenia was independent of International Staging System stage, age, and HR FISH cytogenetics. CONCLUSIONS: Sarcopenia identified by a machine learning-based convolutional neural network algorithm significantly affects OS in patients with NDMM. Future studies using this machine learning-based methodology of assessing sarcopenia in larger prospective clinical trials are required to validate these findings.
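A sketch of the skeletal muscle index (SMI) computation described above; the cutoff values here are illustrative assumptions, not the thresholds used in the study.

```python
# Sketch of the skeletal-muscle-index (SMI) step described above; the
# input area and the sarcopenia cutoffs are illustrative only.
def skeletal_muscle_index(muscle_area_cm2: float, height_m: float) -> float:
    """SMI = L3 skeletal muscle cross-sectional area / height^2 (cm^2/m^2)."""
    return muscle_area_cm2 / (height_m ** 2)

# Assumed sex-specific cutoffs (hypothetical, not from this paper):
CUTOFF = {"male": 53.0, "female": 41.0}  # cm^2/m^2

def is_sarcopenic(muscle_area_cm2: float, height_m: float, sex: str) -> bool:
    return skeletal_muscle_index(muscle_area_cm2, height_m) < CUTOFF[sex]

print(is_sarcopenic(140.0, 1.75, "male"))  # SMI ≈ 45.7 -> True here
```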


Subject(s)
Deep Learning , Multiple Myeloma , Sarcopenia , Humans , Sarcopenia/complications , Sarcopenia/diagnostic imaging , Multiple Myeloma/complications , Multiple Myeloma/diagnostic imaging , Multiple Myeloma/pathology , Prospective Studies , In Situ Hybridization, Fluorescence , Retrospective Studies , Tomography, X-Ray Computed/methods , Muscle, Skeletal/diagnostic imaging , Prognosis
5.
Gastroenterology ; 163(5): 1435-1446.e3, 2022 11.
Article in English | MEDLINE | ID: mdl-35788343

ABSTRACT

BACKGROUND & AIMS: Our purpose was to detect pancreatic ductal adenocarcinoma (PDAC) at the prediagnostic stage (3-36 months before clinical diagnosis) using radiomics-based machine-learning (ML) models, and to compare performance against radiologists in a case-control study. METHODS: Volumetric pancreas segmentation was performed on prediagnostic computed tomography scans (CTs) (median interval between CT and PDAC diagnosis: 398 days) of 155 patients and an age-matched cohort of 265 subjects with normal pancreas. A total of 88 first-order and gray-level radiomic features were extracted, and 34 features were selected through the least absolute shrinkage and selection operator (LASSO)-based feature selection method. The dataset was randomly divided into training (292 CTs: 110 prediagnostic and 182 controls) and test subsets (128 CTs: 45 prediagnostic and 83 controls). Four ML classifiers, k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were evaluated. The specificity of the model with the highest accuracy was further validated on an independent internal dataset (n = 176) and the public National Institutes of Health dataset (n = 80). Two radiologists (R4 and R5) independently evaluated the pancreas on a 5-point diagnostic scale. RESULTS: The median (range) time between prediagnostic CTs of the test subset and PDAC diagnosis was 386 (97-1092) days. SVM had the highest sensitivity (mean; 95% confidence interval) (95.5; 85.5-100.0), specificity (90.3; 84.3-91.5), F1-score (89.5; 82.3-91.7), area under the curve (AUC) (0.98; 0.94-0.98), and accuracy (92.2%; 86.7-93.7) for classification of CTs into prediagnostic versus normal. All 3 other ML models, KNN, RF, and XGBoost, had comparable AUCs (0.95, 0.95, and 0.96, respectively). The high specificity of SVM was generalizable to both the independent internal dataset (92.6%) and the National Institutes of Health dataset (96.2%). In contrast, interreader radiologist agreement was only fair (Cohen's kappa 0.3), and their mean AUC (0.66; 0.46-0.86) was lower than that of each of the 4 ML models (AUCs: 0.95-0.98) (P < .001). Radiologists also recorded false positive indirect findings of PDAC in control subjects (n = 83) (7% R4, 18% R5). CONCLUSIONS: Radiomics-based ML models can detect PDAC from normal pancreas when it is beyond human interrogation capability, at a substantial lead time before clinical diagnosis. Prospective validation and integration of such models with complementary fluid-based biomarkers has the potential for PDAC detection at a stage when surgical cure is a possibility.
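A sketch of the feature-selection-plus-classifier pattern described above, with an L1-penalized selector standing in for LASSO followed by an SVM; the data are synthetic and nothing here is the study's code.

```python
# Sketch, not the study code: L1-based feature selection over radiomic
# features followed by an SVM classifier, on synthetic stand-in data.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(420, 88))        # 88 radiomic features, synthetic
y = rng.integers(0, 2, size=420)      # 1 = prediagnostic, 0 = control

selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
clf = make_pipeline(StandardScaler(), selector, SVC(probability=True))

clf.fit(X[:292], y[:292])             # 292/128 split, as in the abstract
auc = roc_auc_score(y[292:], clf.predict_proba(X[292:])[:, 1])
```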


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Case-Control Studies , Pancreatic Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Carcinoma, Pancreatic Ductal/diagnostic imaging , Machine Learning , Retrospective Studies
6.
Pancreatology ; 23(5): 522-529, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37296006

ABSTRACT

OBJECTIVES: To develop a bounding-box-based 3D convolutional neural network (CNN) for user-guided volumetric pancreas ductal adenocarcinoma (PDA) segmentation. METHODS: Reference segmentations were obtained on CTs (2006-2020) of treatment-naïve PDA. Images were algorithmically cropped using a tumor-centered bounding box for training a 3D nnUNet-based CNN. Three radiologists independently segmented tumors on the test subset, and these were combined with reference segmentations using STAPLE to derive composite segmentations. Generalizability was evaluated on The Cancer Imaging Archive (TCIA) (n = 41) and Medical Segmentation Decathlon (MSD) (n = 152) datasets. RESULTS: A total of 1151 patients [667 males; age: 65.3 ± 10.2 years; T1: 34, T2: 477, T3: 237, T4: 403; mean (range) tumor diameter: 4.34 (1.1-12.6) cm] were randomly divided between training/validation (n = 921) and test subsets (n = 230; 75% from other institutions). The model had a high DSC (mean ± SD) against reference segmentations (0.84 ± 0.06), which was comparable to its DSC against composite segmentations (0.84 ± 0.11, p = 0.52). Model-predicted and reference tumor volumes were comparable (mean ± SD) (29.1 ± 42.2 cc versus 27.1 ± 32.9 cc, p = 0.69, CCC = 0.93). Inter-reader variability was high (mean DSC 0.69 ± 0.16), especially for smaller and isodense tumors. Conversely, the model's high performance was comparable across tumor stages, volumes, and densities (p > 0.05). The model was resilient to different tumor locations, status of pancreatic/biliary ducts, pancreatic atrophy, CT vendors, and slice thicknesses, as well as to the epicenter and dimensions of the bounding box (p > 0.05). Performance was generalizable on the MSD (DSC: 0.82 ± 0.06) and TCIA datasets (DSC: 0.84 ± 0.08). CONCLUSION: A computationally efficient bounding-box-based AI model developed on a large and diverse dataset shows high accuracy, generalizability, and robustness to clinically encountered variations for user-guided volumetric PDA segmentation, including for small and isodense tumors. CLINICAL RELEVANCE: AI-driven, bounding-box-based, user-guided PDA segmentation offers a discovery tool for image-based multi-omics models for applications such as risk stratification, treatment response assessment, and prognostication, which are urgently needed to customize treatment strategies to the unique biological profile of each patient's tumor.
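A sketch of two operations implied by the pipeline above: cropping a volume to a tumor-centered bounding box and scoring overlap with the Dice similarity coefficient (DSC); the margin and arrays are illustrative.

```python
# Sketch of bounding-box cropping and Dice scoring; not the study's code.
import numpy as np

def crop_to_bbox(volume, mask, margin=8):
    """Crop `volume` and `mask` to the mask's bounding box, padded by `margin` voxels."""
    coords = np.argwhere(mask)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)
    sl = tuple(slice(l, h) for l, h in zip(lo, hi))
    return volume[sl], mask[sl]

def dice(a, b, eps=1e-8):
    """DSC = 2|A∩B| / (|A|+|B|), the overlap metric reported above."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)
```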


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Male , Humans , Middle Aged , Aged , Image Processing, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Neural Networks, Computer , Pancreatic Neoplasms/diagnostic imaging , Carcinoma, Pancreatic Ductal/diagnostic imaging , Pancreatic Ducts
7.
J Am Soc Nephrol ; 33(2): 420-430, 2022 02.
Article in English | MEDLINE | ID: mdl-34876489

ABSTRACT

BACKGROUND: In kidney transplantation, a contrast CT scan is obtained in the donor candidate to detect subclinical pathology in the kidney. Recent work from the Aging Kidney Anatomy study has characterized kidney, cortex, and medulla volumes using a manual image-processing tool. However, this technique is time consuming and impractical for clinical care, and thus, these measurements are not obtained during donor evaluations. This study proposes a fully automated segmentation approach for measuring kidney, cortex, and medulla volumes. METHODS: A total of 1930 contrast-enhanced CT exams with reference standard manual segmentations from one institution were used to develop the algorithm. A convolutional neural network model was trained (n=1238) and validated (n=306), and then evaluated in a hold-out test set of reference standard segmentations (n=386). After the initial evaluation, the algorithm was further tested on datasets originating from two external sites (n=1226). RESULTS: The automated model was found to perform on par with manual segmentation, with errors similar to interobserver variability with manual segmentation. Compared with the reference standard, the automated approach achieved a Dice similarity metric of 0.94 (right cortex), 0.90 (right medulla), 0.94 (left cortex), and 0.90 (left medulla) in the test set. Similar performance was observed when the algorithm was applied on the two external datasets. CONCLUSIONS: A fully automated approach for measuring cortex and medullary volumes in CT images of the kidneys has been established. This method may prove useful for a wide range of clinical applications.


Subject(s)
Algorithms , Image Processing, Computer-Assisted/methods , Kidney Cortex/diagnostic imaging , Kidney Medulla/diagnostic imaging , Tomography, X-Ray Computed/methods , Adult , Contrast Media , Deep Learning , Donor Selection/methods , Donor Selection/statistics & numerical data , Female , Humans , Image Processing, Computer-Assisted/statistics & numerical data , Kidney Transplantation , Living Donors , Male , Middle Aged , Neural Networks, Computer , Observer Variation , Tomography, X-Ray Computed/statistics & numerical data
8.
J Digit Imaging ; 36(4): 1770-1781, 2023 08.
Article in English | MEDLINE | ID: mdl-36932251

ABSTRACT

The aim of this study was to investigate the use of an exponential-plateau model to determine the training dataset size that yields the maximum medical image segmentation performance. CT and MR images of patients with renal tumors acquired between 1997 and 2017 were retrospectively collected from our nephrectomy registry. Modality-based datasets of 50, 100, 150, 200, 250, and 300 images were assembled to train models with an 80-20 training-validation split, evaluated against 50 randomly held-out test set images. A third experiment using the KiTS21 dataset explored the effects of different model architectures. Exponential-plateau models were used to establish the relationship of dataset size to model generalizability performance. For segmenting non-neoplastic kidney regions on CT and MR imaging, our model yielded test Dice score plateaus of [Formula: see text] and [Formula: see text], with 54 and 122 training-validation images, respectively, needed to reach the plateaus. For segmenting CT and MR tumor regions, we modeled test Dice score plateaus of [Formula: see text] and [Formula: see text], with 125 and 389 training-validation images needed to reach the plateaus. For the KiTS21 dataset, the best Dice score plateaus for the nn-UNet 2D and 3D architectures were [Formula: see text] and [Formula: see text], with 177 and 440 images needed to reach the performance plateaus. Our research validates that differing imaging modalities, target structures, and model architectures all affect the number of training images required to reach a performance plateau. The modeling approach we developed will help future researchers determine when additional training-validation images will likely not further improve model performance for their experiments.
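The plateau values above appear as "[Formula: see text]" in this export and cannot be recovered here. A generic exponential-plateau fit of the kind described can be set up as below; the parameterization, the "99% of plateau" rule, and the data points are assumptions.

```python
# Generic exponential-plateau learning-curve fit; assumed form and
# synthetic points, not the article's data or exact formula.
import numpy as np
from scipy.optimize import curve_fit

def exp_plateau(n, a, b, c):
    """Dice(n) = a - b*exp(-c*n): rises toward the plateau `a` as n grows."""
    return a - b * np.exp(-c * n)

n_train = np.array([50, 100, 150, 200, 250, 300], dtype=float)
dice = np.array([0.78, 0.85, 0.88, 0.895, 0.90, 0.903])  # synthetic scores

(a, b, c), _ = curve_fit(exp_plateau, n_train, dice, p0=(0.9, 0.3, 0.01))
# Assumed rule: images needed to come within 1% of the plateau value `a`.
n_at_plateau = np.log(b / (0.01 * a)) / c
```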


Subject(s)
Image Processing, Computer-Assisted , Kidney Neoplasms , Humans , Image Processing, Computer-Assisted/methods , Retrospective Studies , Neural Networks, Computer , Magnetic Resonance Imaging/methods , Tomography, X-Ray Computed , Kidney Neoplasms/diagnostic imaging
9.
Gynecol Oncol ; 166(3): 596-605, 2022 09.
Article in English | MEDLINE | ID: mdl-35914978

ABSTRACT

OBJECTIVE: Machine learning, deep learning, and artificial intelligence (AI) are terms that have made their way into nearly all areas of medicine. In the case of medical imaging, these methods have become the state of the art in nearly all areas, from image reconstruction to image processing and automated analysis. In contrast to other areas, such as brain and breast imaging, the impact of AI has not been as strongly felt in gynecologic imaging. In this review article, we (i) provide a background of clinically relevant AI concepts, (ii) describe methods and approaches in computer vision, and (iii) highlight prior work related to image classification tasks utilizing AI approaches in gynecologic imaging. DATA SOURCES: A comprehensive English-language search of several databases, each from inception to March 18, 2021, was conducted. The databases included Ovid MEDLINE(R) and Epub Ahead of Print, In-Process & Other Non-Indexed Citations, and Daily; Ovid EMBASE; Ovid Cochrane Central Register of Controlled Trials; Ovid Cochrane Database of Systematic Reviews; and ClinicalTrials.gov. METHODS OF STUDY SELECTION: We performed an extensive literature review, with 61 articles curated by three reviewers and subsequently sorted by specialists using specific inclusion and exclusion criteria. TABULATION, INTEGRATION, AND RESULTS: We summarize the literature grouped by each of the three most common gynecologic malignancies: endometrial, cervical, and ovarian. For each, a brief introduction encapsulating the AI methods, imaging modalities, and clinical parameters in the selected articles is presented. We conclude with a discussion of current developments, trends, and limitations, and suggest directions for future study. CONCLUSION: This review article should prove useful for collaborative teams performing research studies targeted at the incorporation of radiological imaging and AI methods into gynecological clinical practice.


Subject(s)
Artificial Intelligence , Image Processing, Computer-Assisted , Diagnostic Imaging , Female , Humans
10.
J Comput Assist Tomogr ; 46(6): 841-847, 2022.
Article in English | MEDLINE | ID: mdl-36055122

ABSTRACT

PURPOSE: This study aimed to compare the accuracy and efficiency of a convolutional neural network (CNN)-enhanced workflow for pancreas segmentation versus radiologists in the context of interreader reliability. METHODS: Volumetric pancreas segmentations on a data set of 294 portal venous computed tomographies were performed by 3 radiologists (R1, R2, and R3) and by a CNN. Convolutional neural network segmentations were reviewed and, if needed, corrected ("corrected CNN [c-CNN]" segmentations) by radiologists. Ground truth was obtained from radiologists' manual segmentations using the simultaneous truth and performance level estimation (STAPLE) algorithm. Interreader reliability and the model's accuracy were evaluated with the Dice-Sørensen coefficient (DSC) and Jaccard coefficient (JC). Equivalence was determined using two one-sided tests. Convolutional neural network segmentations below the 25th percentile DSC were reviewed to evaluate segmentation errors. Time for manual segmentation and c-CNN was compared. RESULTS: Pancreas volumes from the 3 sets of segmentations (manual, CNN, and c-CNN) were noninferior to simultaneous truth and performance level estimation-derived volumes [76.6 cm³ (20.2 cm³), P < 0.05]. Interreader reliability was high (mean [SD] DSC between R2-R1, 0.87 [0.04]; R3-R1, 0.90 [0.05]; R2-R3, 0.87 [0.04]). Convolutional neural network segmentations were highly accurate (DSC, 0.88 [0.05]; JC, 0.79 [0.07]) and required minimal-to-no corrections (c-CNN: DSC, 0.89 [0.04]; JC, 0.81 [0.06]; equivalence, P < 0.05). Undersegmentation (n = 47 [64%]) was common in the 73 CNN segmentations below the 25th percentile DSC, but there were no major errors. Total inference time (minutes) for the CNN was 1.2 (0.3). The average time (minutes) taken by radiologists for c-CNN (0.6 [0.97]) was substantially lower than for manual segmentation (3.37 [1.47]; savings of 77.9%-87% [P < 0.0001]). CONCLUSIONS: A convolutional neural network-enhanced workflow provides high accuracy and efficiency for volumetric pancreas segmentation on computed tomography.
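A sketch of the overlap metrics and equivalence test named above: the Jaccard coefficient derived from Dice, and a paired two one-sided test (TOST) via statsmodels; the equivalence margin and per-case scores are illustrative.

```python
# Sketch of the equivalence analysis described above; the ±0.03 margin
# and the per-case DSC values are illustrative, not the study's data.
import numpy as np
from statsmodels.stats.weightstats import ttost_paired

def jaccard_from_dice(dsc):
    """Identity relating the two overlap metrics: JC = DSC / (2 - DSC)."""
    return dsc / (2.0 - dsc)

rng = np.random.default_rng(1)
dsc_cnn = np.clip(rng.normal(0.88, 0.05, 100), 0, 1)   # synthetic per-case DSC
dsc_ccnn = np.clip(dsc_cnn + rng.normal(0.01, 0.02, 100), 0, 1)

# Paired TOST for equivalence within an assumed ±0.03 DSC margin:
p_value, lower, upper = ttost_paired(dsc_ccnn, dsc_cnn, low=-0.03, upp=0.03)
print(f"TOST p = {p_value:.4f}; JC at DSC 0.88 = {jaccard_from_dice(0.88):.2f}")
```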


Subject(s)
Pancreas , Radiologists , Humans , Reproducibility of Results , Pancreas/diagnostic imaging , Neural Networks, Computer , Tomography, X-Ray Computed
11.
Pancreatology ; 21(5): 1001-1008, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33840636

ABSTRACT

OBJECTIVE: Quality gaps in medical imaging datasets lead to profound errors in experiments. Our objective was to characterize such quality gaps in public pancreas imaging datasets (PPIDs), to evaluate their impact on previously published studies, and to provide post-hoc labels and segmentations as a value-add for these PPIDs. METHODS: We scored the available PPIDs on the medical imaging data readiness (MIDaR) scale, and evaluated them for associated metadata, image quality, acquisition phase, etiology of pancreas lesion, sources of confounders, and biases. Studies utilizing these PPIDs were evaluated for awareness of, and any impact of, quality gaps on their results. Volumetric pancreatic adenocarcinoma (PDA) segmentations were performed for non-annotated CTs by a junior radiologist (R1) and reviewed by a senior radiologist (R3). RESULTS: We found three PPIDs with 560 CTs and six MRIs. The NIH dataset of normal pancreas CTs (PCT) (n = 80 CTs) had optimal image quality and met MIDaR A criteria, but parts of the pancreas were excluded from the provided segmentations. The TCIA-PDA (n = 60 CTs; 6 MRIs) and MSD (n = 420 CTs) datasets were categorized as MIDaR B due to incomplete annotations, limited metadata, and insufficient documentation. A substantial proportion of CTs from the TCIA-PDA and MSD datasets were found unsuitable for AI due to biliary stents [TCIA-PDA: 10 (17%); MSD: 112 (27%)] or other factors (non-portal venous phase, suboptimal image quality, non-PDA etiology, or post-treatment status) [TCIA-PDA: 5 (8.5%); MSD: 156 (37.1%)]. These quality gaps were not accounted for in any of the 25 studies that have used these PPIDs (NIH-PCT: 20; MSD: 1; both: 4). PDA segmentations were done by R1 in 91 eligible CTs (TCIA-PDA: 42; MSD: 49). Of these, corrections were made by R3 in 16 CTs (18%) (TCIA-PDA: 4; MSD: 12) [mean (standard deviation) Dice: 0.72 (0.21) and 0.63 (0.23), respectively]. CONCLUSION: Substantial quality gaps, sources of bias, and a high proportion of CTs unsuitable for AI characterize the limited PPIDs available. Published studies on these PPIDs do not account for these quality gaps. We complement these PPIDs through post-hoc labels and segmentations for public release on the TCIA portal. Collaborative efforts leading to large, well-curated PPIDs supported by adequate documentation are critically needed to translate the promise of AI to clinical practice.


Subject(s)
Adenocarcinoma , Artificial Intelligence , Pancreatic Neoplasms , Humans , Magnetic Resonance Imaging , Pancreas/diagnostic imaging , Pancreatic Neoplasms/diagnostic imaging
12.
J Digit Imaging ; 34(5): 1183-1189, 2021 10.
Article in English | MEDLINE | ID: mdl-34047906

ABSTRACT

Imaging-based measurements form the basis of surgical decision making in patients with aortic aneurysm. Unfortunately, manual measurements suffer from suboptimal temporal reproducibility, which can lead to delayed or unnecessary intervention. We tested the hypothesis that deep learning could improve upon the temporal reproducibility of CT angiography-derived thoracic aortic measurements in the setting of imperfect ground-truth training data. To this end, we trained a standard deep learning segmentation model from which measurements of aortic volume and diameter could be extracted. First, three blinded cardiothoracic radiologists visually confirmed the non-inferiority of deep learning segmentation maps with respect to manual segmentation on a 50-patient hold-out test cohort, demonstrating a slight preference for the deep learning method (p < 1e-5). Next, reproducibility was assessed by evaluating measured change (coefficient of reproducibility and standard deviation) in volume and diameter values extracted from segmentation maps in patients for whom multiple scans were available and whose aortas had been deemed stable over time by visual assessment (n = 57 patients, 206 scans). Deep learning temporal reproducibility was superior for measures of both volume (p < 0.008) and diameter (p < 1e-5), and reproducibility metrics compared favorably with previously reported values of manual inter-rater variability. Our work motivates future efforts to apply deep learning to aortic evaluation.
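A sketch of the coefficient of reproducibility referenced above, computed Bland-Altman style as 1.96 times the standard deviation of paired differences; the measurements are hypothetical.

```python
# Sketch of a reproducibility metric over paired measurements from
# stable aortas; the diameters below are synthetic stand-ins.
import numpy as np

def coefficient_of_reproducibility(m1, m2):
    """1.96 * SD of paired differences (Bland-Altman repeatability)."""
    diffs = np.asarray(m1) - np.asarray(m2)
    return 1.96 * diffs.std(ddof=1)

scan1_diam_mm = np.array([41.2, 38.7, 45.0, 39.9])  # hypothetical diameters
scan2_diam_mm = np.array([41.0, 39.1, 44.6, 40.3])
cr = coefficient_of_reproducibility(scan1_diam_mm, scan2_diam_mm)
```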


Subject(s)
Deep Learning , Aorta , Humans , Reproducibility of Results
13.
Radiology ; 290(3): 669-679, 2019 03.
Article in English | MEDLINE | ID: mdl-30526356

ABSTRACT

Purpose To develop and evaluate a fully automated algorithm for segmenting the abdomen from CT to quantify body composition. Materials and Methods For this retrospective study, a convolutional neural network based on the U-Net architecture was trained to perform abdominal segmentation on a data set of 2430 two-dimensional CT examinations and was tested on 270 CT examinations. It was further tested on a separate data set of 2369 patients with hepatocellular carcinoma (HCC). CT examinations were performed between 1997 and 2015. The mean age of patients was 67 years; for male patients, it was 67 years (range, 29-94 years), and for female patients, it was 66 years (range, 31-97 years). Differences in segmentation performance were assessed by using two-way analysis of variance with Bonferroni correction. Results Compared with reference segmentation, the model for this study achieved Dice scores (mean ± standard deviation) of 0.98 ± 0.03, 0.96 ± 0.02, and 0.97 ± 0.01 in the test set, and 0.94 ± 0.05, 0.92 ± 0.04, and 0.98 ± 0.02 in the HCC data set, for the subcutaneous, muscle, and visceral adipose tissue compartments, respectively. Performance met or exceeded that of expert manual segmentation. Conclusion Model performance met or exceeded the accuracy of expert manual segmentation of CT examinations for both the test data set and the hepatocellular carcinoma data set. The model generalized well to multiple levels of the abdomen and may be capable of fully automated quantification of body composition metrics in three-dimensional CT examinations. © RSNA, 2018 Online supplemental material is available for this article. See also the editorial by Chang in this issue.


Subject(s)
Body Composition , Deep Learning , Pattern Recognition, Automated , Radiographic Image Interpretation, Computer-Assisted/methods , Radiography, Abdominal , Tomography, X-Ray Computed , Adult , Aged , Aged, 80 and over , Algorithms , Carcinoma, Hepatocellular/diagnostic imaging , Humans , Liver Neoplasms/diagnostic imaging , Middle Aged , Retrospective Studies
14.
J Digit Imaging ; 32(4): 571-581, 2019 08.
Article in English | MEDLINE | ID: mdl-31089974

ABSTRACT

Deep-learning algorithms typically fall within the domain of supervised artificial intelligence and are designed to "learn" from annotated data. Deep-learning models require large, diverse training datasets for optimal model convergence. The effort to curate these datasets is widely regarded as a barrier to the development of deep-learning systems. We developed RIL-Contour to accelerate medical image annotation for and with deep learning. A major goal driving the development of the software was to create an environment that enables clinically oriented users to utilize deep-learning models to rapidly annotate medical imaging. RIL-Contour supports fully automated deep-learning methods, semi-automated methods, and manual methods for annotating medical imaging with voxel and/or text annotations. To reduce annotation error, RIL-Contour promotes the standardization of image annotations across a dataset. RIL-Contour accelerates medical imaging annotation through the process of annotation by iterative deep learning (AID). The underlying concept of AID is to iteratively annotate, train, and utilize deep-learning models during the process of dataset annotation and model development. To enable this, RIL-Contour supports workflows in which multiple image analysts annotate medical images, radiologists approve the annotations, and data scientists utilize the annotations to train deep-learning models. To automate the feedback loop between data scientists and image analysts, RIL-Contour provides mechanisms that enable data scientists to push newly trained deep-learning models to other users of the software. RIL-Contour and the AID methodology accelerate dataset annotation and model development by facilitating rapid collaboration between analysts, radiologists, and engineers.


Subject(s)
Datasets as Topic , Deep Learning , Diagnostic Imaging/methods , Image Processing, Computer-Assisted/methods , Radiology Information Systems , Humans
15.
AJR Am J Roentgenol ; 211(6): 1184-1193, 2018 12.
Article in English | MEDLINE | ID: mdl-30403527

ABSTRACT

OBJECTIVE: Deep learning has shown great promise for improving medical image classification tasks. However, knowing what aspects of an image the deep learning system uses or, in a manner of speaking, sees to make its prediction is difficult. MATERIALS AND METHODS: Within a radiologic imaging context, we investigated the utility of methods designed to identify the features within images on which deep learning activates. In this study, we developed a classifier to identify contrast enhancement phase from whole-slice CT data. We then used this classifier as an easily interpretable system to explore the utility of class activation maps (CAMs), gradient-weighted class activation maps (Grad-CAMs), saliency maps, guided backpropagation maps, and the saliency activation map (SAM), a novel map reported here, to identify the image features the model used when performing prediction. RESULTS: All techniques identified voxels within the imaging that the classifier used. SAMs had greater specificity than guided backpropagation maps, CAMs, and Grad-CAMs at identifying voxels within the imaging that the model used to perform prediction. At shallow network layers, SAMs had greater specificity than Grad-CAMs at identifying the input voxels that the layers within the model used to perform prediction. CONCLUSION: As a whole, voxel-level visualizations and visualizations of the imaging features that activate shallow network layers are powerful techniques for identifying features that deep learning models use when performing prediction.
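A minimal Grad-CAM sketch, one of the map types compared above, for a generic 2D CNN; the backbone, hooks, and input are placeholders rather than the study's model.

```python
# Minimal Grad-CAM sketch for a generic 2D CNN; not the study's model.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats, grads = {}, {}
layer = model.layer4  # last convolutional block

layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)          # stand-in for an imaging slice
score = model(x)[0].max()                # top class score
score.backward()

w = grads["a"].mean(dim=(2, 3), keepdim=True)          # channel weights
cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))  # weighted sum
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear")
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0,1]
```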


Subject(s)
Deep Learning , Image Processing, Computer-Assisted , Tomography, X-Ray Computed , Algorithms , Humans , Sensitivity and Specificity
16.
Neuroradiology ; 60(1): 35-42, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29103145

ABSTRACT

PURPOSE: Our study tested the diagnostic accuracy of increased signal intensity (SI) within FLAIR MR images of resection cavities in differentiating early progressive disease (ePD) from pseudoprogression (PsP) in patients with glioblastoma treated with radiotherapy with concomitant temozolomide therapy. METHODS: In this retrospective study approved by our Institutional Review Board, we evaluated the records of 122 consecutive patients with partially or totally resected glioblastoma. Region of interest (ROI) analysis assessed 33 MR examinations from 11 subjects with histologically confirmed ePD and 37 MR examinations from 14 subjects with PsP (5 histologically confirmed, 9 clinically diagnosed). After applying an N4 bias correction algorithm to remove B0 field distortion and to standardize image intensities and then normalizing the intensities based on an ROI of uninvolved white matter from the contralateral hemisphere, the mean intensities of the ROI from within the resection cavities were calculated. Measures of diagnostic performance were calculated from the receiver operating characteristic (ROC) curve using the threshold intensity that maximized differentiation. Subgroup analysis explored differences between the patients with biopsy-confirmed disease. RESULTS: At an optimal threshold intensity of 2.9, the area under the ROC curve (AUROC) for FLAIR to differentiate ePD from PsP was 0.79 (95% confidence interval 0.686-0.873) with a sensitivity of 0.818 and specificity of 0.694. The AUROC increased to 0.86 when only the patients with biopsy-confirmed PsP were considered. CONCLUSIONS: Increased SI within the resection cavity of FLAIR images is not a highly specific sign of ePD in glioblastoma patients treated with the Stupp protocol.
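A sketch of the normalization-and-threshold analysis described above: cavity SI normalized to contralateral white matter, then an ROC-derived cutoff (here via Youden's J, an assumed rule); the values are illustrative.

```python
# Sketch of the SI-normalization and ROC-threshold step; ROI means,
# labels, and the Youden rule are illustrative assumptions.
import numpy as np
from sklearn.metrics import roc_curve, auc

def normalized_cavity_intensity(cavity_roi_mean, contralateral_wm_mean):
    """Cavity SI expressed relative to uninvolved contralateral white matter."""
    return cavity_roi_mean / contralateral_wm_mean

# Hypothetical normalized intensities: 1 = ePD, 0 = PsP
y = np.array([1, 1, 1, 0, 0, 0, 1, 0])
si = np.array([3.4, 3.1, 2.7, 2.2, 2.8, 2.4, 3.0, 2.5])

fpr, tpr, thresholds = roc_curve(y, si)
best = thresholds[np.argmax(tpr - fpr)]   # Youden's J for the cutoff
print(f"AUROC = {auc(fpr, tpr):.2f}, optimal SI threshold ≈ {best:.1f}")
```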


Subject(s)
Antineoplastic Agents, Alkylating/therapeutic use , Brain Neoplasms/diagnostic imaging , Brain Neoplasms/therapy , Dacarbazine/analogs & derivatives , Glioblastoma/diagnostic imaging , Glioblastoma/therapy , Magnetic Resonance Imaging/methods , Combined Modality Therapy , Dacarbazine/therapeutic use , Disease Progression , Female , Humans , Male , Middle Aged , Retrospective Studies , Sensitivity and Specificity , Temozolomide
17.
J Digit Imaging ; 31(2): 252-261, 2018 04.
Article in English | MEDLINE | ID: mdl-28924878

ABSTRACT

Schizophrenia has been proposed to result from impairment of functional connectivity. We aimed to use machine learning to distinguish schizophrenic subjects from normal controls using a publicly available functional MRI (fMRI) data set. Global and local parameters of functional connectivity were extracted for classification. We found decreased global and local network connectivity in subjects with schizophrenia, particularly in the anterior right cingulate cortex, the superior right temporal region, and the inferior left parietal region, as compared to healthy subjects. Using a support vector machine with 10-fold cross-validation, a set of nine features reached 92.1% prediction accuracy. Our results suggest that there are significant differences between control and schizophrenic subjects based on regional brain activity detected with fMRI.
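A sketch of the classification step described above, an SVM with 10-fold cross-validation over connectivity features; the feature matrix and labels are synthetic.

```python
# Sketch: SVM with 10-fold cross-validation over connectivity features;
# the data here are synthetic stand-ins, not the public fMRI set.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(146, 9))       # 9 selected graph features per subject
y = rng.integers(0, 2, size=146)    # 1 = schizophrenia, 0 = control

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
cv = StratifiedKFold(10, shuffle=True, random_state=0)
acc = cross_val_score(clf, X, y, cv=cv)
print(f"mean 10-fold accuracy: {acc.mean():.3f}")
```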


Subject(s)
Brain Mapping/methods , Brain/physiopathology , Image Interpretation, Computer-Assisted/methods , Machine Learning , Magnetic Resonance Imaging/methods , Schizophrenia/physiopathology , Adult , Brain/diagnostic imaging , Female , Humans , Male , Young Adult
18.
Kidney Int ; 92(5): 1206-1216, 2017 11.
Article in English | MEDLINE | ID: mdl-28532709

ABSTRACT

Magnetic resonance imaging (MRI) examinations provide high-resolution information about the anatomic structure of the kidneys and are used to measure total kidney volume (TKV) in patients with autosomal dominant polycystic kidney disease (ADPKD). Height-adjusted TKV (HtTKV) has become the gold-standard imaging biomarker for ADPKD progression at early stages of the disease, when estimated glomerular filtration rate (eGFR) is still normal. However, HtTKV does not take advantage of the wealth of information provided by MRI. Here we tested whether image texture features provide additional insights into the ADPKD kidney that may be used as complementary information to existing biomarkers. A retrospective cohort of 122 patients from the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) study was identified who had T2-weighted MRIs and eGFR values over 70 mL/min/1.73 m² at the time of their baseline scan. We computed nine distinct image texture features for each patient. The ability of each feature to predict subsequent progression to CKD stage 3A or 3B and a 30% reduction in eGFR at eight-year follow-up was assessed. A multiple linear regression model was developed incorporating age, baseline eGFR, HtTKV, and three image texture features identified by stability feature selection (Entropy, Correlation, and Energy). Including texture in a multiple linear regression model (predicting percent change in eGFR) improved the Pearson correlation coefficient from -0.51 (using age, eGFR, and HtTKV) to -0.70 (adding texture). Thus, texture analysis offers an approach to refine ADPKD prognosis and should be further explored for its utility in individualized clinical decision making and outcome prediction.
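A sketch of computing the three GLCM texture features named above (Energy, Correlation, Entropy) with scikit-image; the patch is synthetic, and entropy is computed by hand since graycoprops does not provide it.

```python
# Sketch of GLCM texture features on a synthetic patch; not the study's
# feature definitions, which may differ in distances/angles/levels.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(7)
patch = rng.integers(0, 64, size=(128, 128), dtype=np.uint8)  # stand-in T2 ROI

glcm = graycomatrix(patch, distances=[1], angles=[0], levels=64,
                    symmetric=True, normed=True)
energy = graycoprops(glcm, "energy")[0, 0]
correlation = graycoprops(glcm, "correlation")[0, 0]
p = glcm[:, :, 0, 0]                                  # normalized co-occurrences
entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))       # computed manually
```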


Subject(s)
Image Processing, Computer-Assisted/methods , Kidney/pathology , Magnetic Resonance Imaging/methods , Polycystic Kidney, Autosomal Dominant/diagnostic imaging , Renal Insufficiency, Chronic/diagnostic imaging , Adult , Biomarkers/analysis , Body Height , Clinical Decision-Making/methods , Disease Progression , Female , Follow-Up Studies , Glomerular Filtration Rate , Humans , Kidney/diagnostic imaging , Kidney/physiopathology , Linear Models , Male , Multivariate Analysis , Organ Size , Polycystic Kidney, Autosomal Dominant/complications , Polycystic Kidney, Autosomal Dominant/physiopathology , Predictive Value of Tests , Prognosis , Renal Insufficiency, Chronic/etiology , Renal Insufficiency, Chronic/physiopathology , Retrospective Studies , Young Adult
19.
Radiographics ; 37(2): 505-515, 2017.
Article in English | MEDLINE | ID: mdl-28212054

ABSTRACT

Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the algorithm computing the image features believed to be of importance in making the prediction or diagnosis of interest. The algorithm then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. Several methods can be used, each with different strengths and weaknesses. Open-source versions of most of these machine learning methods are available, making them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. ©RSNA, 2017.


Subject(s)
Diagnostic Imaging , Machine Learning , Algorithms , Humans , Image Interpretation, Computer-Assisted
20.
J Digit Imaging ; 30(4): 400-405, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28315069

ABSTRACT

Deep learning is an important new area of machine learning which encompasses a wide range of neural network architectures designed to complete various tasks. In the medical imaging domain, example tasks include organ segmentation, lesion detection, and tumor classification. The most popular network architecture for deep learning for images is the convolutional neural network (CNN). Whereas traditional machine learning requires determination and calculation of features from which the algorithm learns, deep learning approaches learn the important features as well as the proper weighting of those features to make predictions for new data. In this paper, we will describe some of the libraries and tools that are available to aid in the construction and efficient execution of deep learning as applied to medical images.
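To make the survey concrete, here is a minimal CNN written in PyTorch, one representative library of the kind the article describes; the architecture is illustrative only.

```python
# A minimal illustrative CNN in PyTorch; the layer choices are arbitrary
# and stand in for the architectures the article discusses.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global average pooling
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = TinyCNN()(torch.randn(4, 1, 64, 64))  # 4 grayscale images
```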


Subject(s)
Diagnostic Imaging , Machine Learning , Neural Networks, Computer , Algorithms , Documentation , Humans , Software