1 - 20 of 89
1.
Sci Rep ; 14(1): 5383, 2024 03 05.
Article En | MEDLINE | ID: mdl-38443410

Breast density, or the amount of fibroglandular tissue (FGT) relative to the overall breast volume, increases the risk of developing breast cancer. Although previous studies have utilized deep learning to assess breast density, the limited public availability of data and quantitative tools hinders the development of better assessment tools. Our objective was to (1) create and share a large dataset of pixel-wise annotations according to well-defined criteria, and (2) develop, evaluate, and share an automated segmentation method for breast, FGT, and blood vessels using convolutional neural networks. We used the Duke Breast Cancer MRI dataset to randomly select 100 MRI studies and manually annotated the breast, FGT, and blood vessels for each study. Model performance was evaluated using the Dice similarity coefficient (DSC). The model achieved DSC values of 0.92 for breast, 0.86 for FGT, and 0.65 for blood vessels on the test set. The correlation between our model's predicted breast density and the manually generated masks was 0.95. The correlation between the predicted breast density and qualitative radiologist assessment was 0.75. Our automated models can accurately segment breast, FGT, and blood vessels using pre-contrast breast MRI data. The data and the models were made publicly available.
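A minimal sketch of the two quantities at the core of this abstract, the Dice similarity coefficient and breast density computed from binary masks; the array names are illustrative, not taken from the released code:

    import numpy as np

    def dice(pred: np.ndarray, truth: np.ndarray) -> float:
        # DSC = 2|A intersect B| / (|A| + |B|) for boolean masks
        inter = np.logical_and(pred, truth).sum()
        total = pred.sum() + truth.sum()
        return 2.0 * inter / total if total > 0 else 1.0

    def breast_density(fgt_mask: np.ndarray, breast_mask: np.ndarray) -> float:
        # Density = FGT volume relative to the overall breast volume
        return fgt_mask.sum() / breast_mask.sum()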


Breast Neoplasms , Deep Learning , Humans , Female , Magnetic Resonance Imaging , Radiography , Breast Density , Breast Neoplasms/diagnostic imaging
2.
Ann Thorac Surg ; 117(2): 413-421, 2024 Feb.
Article En | MEDLINE | ID: mdl-37031770

BACKGROUND: There is no consensus on the optimal allograft sizing strategy for lung transplantation in restrictive lung disease. Current methods that are based on predicted total lung capacity (pTLC) ratios do not account for the diminutive recipient chest size. The study investigators hypothesized that a new sizing ratio incorporating preoperative recipient computed tomographic lung volumes (CTVol) would be associated with postoperative outcomes. METHODS: A retrospective single-institution study was conducted of adults undergoing primary bilateral lung transplantation between January 2016 and July 2020 for restrictive lung disease. CTVol was computed for recipients by using advanced segmentation software. Two sizing ratios were calculated: pTLC ratio (pTLCdonor/pTLCrecipient) and a new volumetric ratio (pTLCdonor/CTVolrecipient). Patients were divided into reference, oversized, and undersized groups on the basis of ratio quintiles, and multivariable models were used to assess the effect of the ratios on primary graft dysfunction and survival. RESULTS: CTVol was successfully acquired in 218 of 220 (99.1%) patients. In adjusted analysis, undersizing on the basis of the volumetric ratio was independently associated with decreased primary graft dysfunction grade 2 or 3 within 72 hours (odds ratio, 0.42; 95% CI, 0.20-0.87; P = .02). The pTLC ratio was not significantly associated with primary graft dysfunction. Oversizing on the basis of the volumetric ratio was independently associated with an increased risk of death (hazard ratio, 2.27; 95% CI, 1.04-4.99; P = .04), whereas the pTLC ratio did not have a significant survival association. CONCLUSIONS: Using computed tomography-acquired lung volumes for donor-recipient size matching in lung transplantation is feasible with advanced segmentation software. This method may be more predictive of outcome compared with current sizing methods, which use gender and height only.
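A sketch of the two sizing ratios and the quintile grouping described above, assuming per-patient scalar inputs; the generated values are placeholders standing in for the cohort:

    import numpy as np
    import pandas as pd

    def ptlc_ratio(ptlc_donor: float, ptlc_recipient: float) -> float:
        return ptlc_donor / ptlc_recipient

    def volumetric_ratio(ptlc_donor: float, ctvol_recipient: float) -> float:
        # New ratio: predicted donor TLC over CT-measured recipient lung volume
        return ptlc_donor / ctvol_recipient

    ratios = pd.Series(np.random.lognormal(0.0, 0.15, 218))  # placeholder cohort
    quintile = pd.qcut(ratios, 5, labels=False)              # bins 0 .. 4
    group = quintile.map(lambda q: "undersized" if q == 0
                         else "oversized" if q == 4 else "reference")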


Lung Diseases , Lung Transplantation , Primary Graft Dysfunction , Adult , Humans , Lung/surgery , Retrospective Studies , Primary Graft Dysfunction/etiology , Organ Size , Lung Transplantation/methods , Lung Diseases/surgery , Tissue Donors , Tomography, X-Ray Computed
3.
Radiology ; 309(1): e222441, 2023 10.
Article En | MEDLINE | ID: mdl-37815445

Background PET can be used for amyloid-tau-neurodegeneration (ATN) classification in Alzheimer disease, but incurs considerable cost and exposure to ionizing radiation. MRI currently has limited use in characterizing ATN status. Deep learning techniques can detect complex patterns in MRI data and have potential for noninvasive characterization of ATN status. Purpose To use deep learning to predict PET-determined ATN biomarker status using MRI and readily available diagnostic data. Materials and Methods MRI and PET data were retrospectively collected from the Alzheimer's Disease Neuroimaging Initiative. PET scans were paired with MRI scans acquired within 30 days, from August 2005 to September 2020. Pairs were randomly split into subsets as follows: 70% for training, 10% for validation, and 20% for final testing. A bimodal Gaussian mixture model was used to threshold PET scans into positive and negative labels. MRI data were fed into a convolutional neural network to generate imaging features. These features were combined in a logistic regression model with patient demographics, APOE gene status, cognitive scores, hippocampal volumes, and clinical diagnoses to classify each ATN biomarker component as positive or negative. Area under the receiver operating characteristic curve (AUC) analysis was used for model evaluation. Feature importance was derived from model coefficients and gradients. Results There were 2099 amyloid (mean patient age, 75 years ± 10 [SD]; 1110 male), 557 tau (mean patient age, 75 years ± 7; 280 male), and 2768 FDG PET (mean patient age, 75 years ± 7; 1645 male) and MRI pairs. Model AUCs for the test set were as follows: amyloid, 0.79 (95% CI: 0.74, 0.83); tau, 0.73 (95% CI: 0.58, 0.86); and neurodegeneration, 0.86 (95% CI: 0.83, 0.89). Within the networks, high gradients were present in key temporal, parietal, frontal, and occipital cortical regions. Model coefficients for cognitive scores, hippocampal volumes, and APOE status were highest. Conclusion A deep learning algorithm predicted each component of PET-determined ATN status with acceptable to excellent efficacy using MRI and other available diagnostic data. © RSNA, 2023 Supplemental material is available for this article.
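The bimodal Gaussian mixture thresholding step can be sketched as follows; the uptake values below are placeholders, and the component with the higher mean is taken as biomarker positive:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    suvr = np.random.normal([1.0, 1.4], 0.1, size=(500, 2)).ravel()  # placeholder uptake values
    gmm = GaussianMixture(n_components=2, random_state=0).fit(suvr.reshape(-1, 1))
    positive_component = int(np.argmax(gmm.means_.ravel()))
    labels = gmm.predict(suvr.reshape(-1, 1)) == positive_component  # True = positive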


Alzheimer Disease , Cognitive Dysfunction , Deep Learning , Aged , Humans , Male , Alzheimer Disease/diagnostic imaging , Amyloid , Amyloid beta-Peptides , Apolipoproteins E , Biomarkers , Magnetic Resonance Imaging/methods , Positron-Emission Tomography/methods , Retrospective Studies , tau Proteins , Female
4.
Radiol Artif Intell ; 5(5): e220275, 2023 Sep.
Article En | MEDLINE | ID: mdl-37795141

The Duke Liver Dataset contains 2146 abdominal MRI series from 105 patients, including a majority with cirrhotic features, and 310 image series with corresponding manually segmented liver masks.

5.
IEEE Trans Med Imaging ; 42(12): 3860-3870, 2023 Dec.
Article En | MEDLINE | ID: mdl-37695965

Anomaly detection (AD) aims to determine if an instance has properties different from those seen in normal cases. The success of this technique depends on how well a neural network learns from normal instances. We observe that the learning difficulty scales exponentially with the input resolution, making it infeasible to apply AD to high-resolution images. Resizing them to a lower resolution is a compromise solution that does not align with clinical practice, where the diagnosis can depend on image details. In this work, we propose to train the network and perform inference at the patch level, through the sliding window algorithm. This simple operation allows the network to receive high-resolution images but introduces additional training difficulties, including inconsistent image structure and higher variance. We address these concerns by setting the network's objective to learn augmentation-invariant features. We further study the augmentation function in the context of medical imaging. In particular, we observe that the resizing operation, a key augmentation in the general computer vision literature, is detrimental to detection accuracy, and the inverting operation can be beneficial. We also propose a new module that encourages the network to learn from adjacent patches to boost detection performance. Extensive experiments are conducted on breast tomosynthesis and chest X-ray datasets, and our method improves image-level classification AUC by 8.03% and 5.66%, respectively, over the current leading techniques. The experimental results demonstrate the effectiveness of our approach.
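A minimal sketch of the sliding-window patching that lets a patch-level detector see full-resolution images; the patch size and stride are assumptions:

    import numpy as np

    def sliding_patches(image: np.ndarray, size: int = 256, stride: int = 128):
        # Yield (row, col, patch) so a patch-level model can train on and
        # score high-resolution images without downsampling them
        h, w = image.shape[:2]
        for r in range(0, max(h - size, 0) + 1, stride):
            for c in range(0, max(w - size, 0) + 1, stride):
                yield r, c, image[r:r + size, c:c + size]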


Algorithms , Neural Networks, Computer , Supervised Machine Learning
6.
J Digit Imaging ; 36(6): 2402-2410, 2023 12.
Article En | MEDLINE | ID: mdl-37620710

Large numbers of radiographic images are available in musculoskeletal radiology practices and could be used for training deep learning models for the diagnosis of knee abnormalities. However, those images do not typically contain readily available labels because of the limitations of human annotation. The purpose of our study was to develop an automated labeling approach that improves the image classification model to distinguish normal knee images from those with abnormalities or prior arthroplasty. The automated labeler was trained on a small set of labeled data to automatically label a much larger set of unlabeled data, further improving the image classification performance for knee radiographic diagnosis. We used BioBERT and EfficientNet as the feature extraction backbones of the labeler and the imaging model, respectively. We developed our approach using 7382 patients and validated it on a separate set of 637 patients. The final image classification model, trained using both manually labeled and pseudo-labeled data, had a higher weighted average AUC (WA-AUC = 0.903) and higher AUC values for all classes (normal AUC 0.894; abnormal AUC 0.896; arthroplasty AUC 0.990) than the baseline model (WA-AUC = 0.857; normal AUC 0.842; abnormal AUC 0.848; arthroplasty AUC 0.987), which was trained using only manually labeled data. Statistical tests show that the improvement is significant for the normal (p < 0.002) and abnormal (p < 0.001) classes and for WA-AUC (p = 0.001). Our findings demonstrate that the proposed automated labeling approach significantly improves the performance of image classification for radiographic knee diagnosis, facilitating patient care and the curation of large knee datasets.
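A sketch of the pseudo-labeling step under assumed names: a labeler trained on the small manually labeled set assigns labels to unlabeled exams, keeping only confident predictions to augment the training data:

    def pseudo_label(labeler, unlabeled_reports, threshold: float = 0.9):
        # `labeler.predict_proba` is an assumed interface, e.g., a BioBERT-based
        # classifier returning one probability per class for a report
        pseudo = []
        for report in unlabeled_reports:
            probs = labeler.predict_proba(report)
            label, p = max(enumerate(probs), key=lambda kv: kv[1])
            if p >= threshold:
                pseudo.append((report, label))
        return pseudo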


Knee Joint , Radiology , Humans , Radiography , Knee Joint/diagnostic imaging , Arthroplasty
7.
Med Image Anal ; 89: 102918, 2023 10.
Article En | MEDLINE | ID: mdl-37595404

Training segmentation models for medical images continues to be challenging due to the limited availability of data annotations. Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly for natural images, that is intended to segment user-defined objects of interest in an interactive manner. While the model performance on natural images is impressive, medical image domains pose their own set of challenges. Here, we perform an extensive evaluation of SAM's ability to segment medical images on a collection of 19 medical imaging datasets from various modalities and anatomies. In our experiments, we generated point and box prompts for SAM using a standard method that simulates interactive segmentation. We report the following findings: (1) SAM's performance based on single prompts varies widely depending on the dataset and the task, from IoU=0.1135 for spine MRI to IoU=0.8650 for hip X-ray. (2) Segmentation performance appears to be better for well-circumscribed objects with less ambiguous prompts, such as the segmentation of organs in computed tomography, and poorer in various other scenarios, such as the segmentation of brain tumors. (3) SAM performs notably better with box prompts than with point prompts. (4) SAM outperforms similar methods RITM, SimpleClick, and FocalClick in almost all single-point prompt settings. (5) When multiple-point prompts are provided iteratively, SAM's performance generally improves only slightly, while other methods' performance improves to a level that surpasses SAM's point-based performance. We also provide several illustrations of SAM's performance on all tested datasets, iterative segmentation, and SAM's behavior given prompt ambiguity. We conclude that SAM shows impressive zero-shot segmentation performance for certain medical imaging datasets, but moderate to poor performance for others. SAM has the potential to make a significant impact on automated medical image segmentation, but appropriate care needs to be taken when using it. Code for evaluating SAM is publicly available at https://github.com/mazurowski-lab/segment-anything-medical-evaluation.
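A sketch of single point-prompt evaluation using the public segment-anything API; the checkpoint path, image, and reference mask below are placeholders rather than the paper's actual evaluation harness:

    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # path assumed
    predictor = SamPredictor(sam)

    image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a loaded image
    gt_mask = np.zeros((512, 512), dtype=bool)       # stand-in for the reference mask
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(point_coords=np.array([[256, 256]]),
                                         point_labels=np.array([1]),
                                         multimask_output=False)
    union = np.logical_or(masks[0], gt_mask).sum()
    iou = np.logical_and(masks[0], gt_mask).sum() / union if union else 1.0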


Brain Neoplasms , Humans , S-Adenosylmethionine , Tomography, X-Ray Computed
8.
Eur J Radiol ; 166: 110979, 2023 Sep.
Article En | MEDLINE | ID: mdl-37473618

PURPOSE: Tools to predict a screening mammogram recall at the time of scheduling could improve patient care. We extracted patient demographic and breast care history information within the electronic medical record (EMR) for women undergoing digital breast tomosynthesis (DBT) to identify which factors were associated with a screening recall recommendation. METHOD: In 2018, 21,543 women aged 40 years or greater who underwent screening DBT at our institution were identified. Demographic information and breast care factors were extracted automatically from the EMR. The primary outcome was a screening recall recommendation of BI-RADS 0. A multivariable logistic regression model was built and included age, race, ethnicity groups, family breast cancer history, personal breast cancer history, surgical breast cancer history, recall history, and days since last available screening mammogram. RESULTS: Multiple factors were associated with a recall on the multivariable model: history of breast cancer surgery (OR: 2.298, 95% CI: 1.854, 2.836); prior recall within the last five years (vs no prior, OR: 0.768, 95% CI: 0.687, 0.858); prior screening mammogram within 0-18 months (vs no prior, OR: 0.601, 95% CI: 0.520, 0.691), prior screening mammogram within 18-30 months (vs no prior, OR: 0.676, 95% CI: 0.520, 0.691); and age (normalized OR: 0.723, 95% CI: 0.690, 0.758). CONCLUSIONS: It is feasible to predict a DBT screening recall recommendation using patient demographics and breast care factors that can be extracted automatically from the EMR.
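A sketch of a multivariable model with odds ratios read off the fitted coefficients; the data frame below is a placeholder for the EMR extract, and the column set is abbreviated:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({                                  # placeholder EMR extract
        "recalled": rng.integers(0, 2, 500),
        "age_normalized": rng.normal(size=500),
        "surgical_history": rng.integers(0, 2, 500),
        "prior_recall_5y": rng.integers(0, 2, 500),
        "months_since_last_screen": rng.uniform(0, 30, 500),
    })
    X = sm.add_constant(df.drop(columns="recalled"))
    fit = sm.Logit(df["recalled"], X).fit(disp=0)
    ci = fit.conf_int()
    ors = pd.DataFrame({"OR": np.exp(fit.params),        # per-feature odds ratios
                        "CI_low": np.exp(ci[0]),
                        "CI_high": np.exp(ci[1])})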


Breast Neoplasms , Electronic Health Records , Female , Humans , Feasibility Studies , Mammography , Breast Neoplasms/diagnostic imaging , Breast Density , Breast , Early Detection of Cancer , Mass Screening , Retrospective Studies
9.
Artif Intell Med ; 141: 102553, 2023 07.
Article En | MEDLINE | ID: mdl-37295897

Machine learning (ML) for diagnosis of thyroid nodules on ultrasound is an active area of research. However, ML tools require large, well-labeled datasets, the curation of which is time-consuming and labor-intensive. The purpose of our study was to develop and test a deep-learning-based tool to facilitate and automate the data annotation process for thyroid nodules; we named our tool Multistep Automated Data Labelling Procedure (MADLaP). MADLaP was designed to take multiple inputs including pathology reports, ultrasound images, and radiology reports. Using multiple step-wise 'modules' including rule-based natural language processing, deep-learning-based imaging segmentation, and optical character recognition, MADLaP automatically identified images of a specific thyroid nodule and correctly assigned a pathology label. The model was developed using a training set of 378 patients across our health system and tested on a separate set of 93 patients. Ground truths for both sets were selected by an experienced radiologist. Performance metrics including yield (how many labeled images the model produced) and accuracy (percentage correct) were measured using the test set. MADLaP achieved a yield of 63% and an accuracy of 83%. The yield progressively increased as the input data moved through each module, while accuracy peaked partway through. Error analysis showed that inputs from certain examination sites had lower accuracy (40%) than the other sites (90%, 100%). MADLaP successfully created curated datasets of labeled ultrasound images of thyroid nodules. While accurate, MADLaP had a relatively suboptimal yield, which exposed some challenges in automatically labeling radiology images from heterogeneous sources. The complex task of image curation and annotation could be automated, allowing for enrichment of larger datasets for use in machine learning development.
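The first, rule-based module might look like the following sketch; the keyword dictionary is illustrative, not MADLaP's actual vocabulary:

    RULES = {
        "malignant": ["papillary carcinoma", "medullary carcinoma", "malignant"],
        "benign": ["benign follicular nodule", "adenomatoid nodule", "benign"],
    }

    def label_pathology(report_text: str):
        # Return a pathology label, or None to defer to later modules / review
        text = report_text.lower()
        for label, keywords in RULES.items():
            if any(kw in text for kw in keywords):
                return label
        return None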


Thyroid Nodule , Humans , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/pathology , Artificial Intelligence , Data Curation , Ultrasonography/methods , Neural Networks, Computer
10.
Med Image Anal ; 87: 102836, 2023 07.
Article En | MEDLINE | ID: mdl-37201220

Automated tumor detection in Digital Breast Tomosynthesis (DBT) is a difficult task due to natural tumor rarity, breast tissue variability, and high resolution. Given the scarcity of abnormal images and the abundance of normal images for this problem, an anomaly detection/localization approach could be well-suited. However, most anomaly localization research in machine learning focuses on non-medical datasets, and we find that these methods fall short when adapted to medical imaging datasets. The problem is alleviated when we solve the task from the image completion perspective, in which the presence of anomalies can be indicated by a discrepancy between the original appearance and its auto-completion conditioned on the surroundings. However, there are often many valid normal completions given the same surroundings, especially in the DBT dataset, making this evaluation criterion less precise. To address such an issue, we consider pluralistic image completion by exploring the distribution of possible completions instead of generating fixed predictions. This is achieved through our novel application of spatial dropout on the completion network during inference time only, which requires no additional training cost and is effective at generating diverse completions. Thanks to these stochastic completions, we further propose minimum completion distance (MCD), a new metric for detecting anomalies. We provide theoretical as well as empirical support for the superiority of the proposed method over existing approaches to anomaly localization. On the DBT dataset, our model outperforms other state-of-the-art methods by at least 10% AUROC for pixel-level detection.
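A sketch of the two key ingredients, dropout kept active at inference to draw diverse completions and the minimum completion distance score; the module and tensor names are assumptions:

    import torch

    def stochastic_completions(net, masked_image, n: int = 16):
        net.eval()
        for m in net.modules():                  # re-enable only the dropout layers
            if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
                m.train()
        with torch.no_grad():
            return torch.stack([net(masked_image) for _ in range(n)])

    def minimum_completion_distance(completions, original):
        # Anomaly score = distance from the original patch to its closest completion
        d = ((completions - original) ** 2).flatten(1).mean(dim=1)
        return d.min().item()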


Breast Neoplasms , Mammography , Humans , Female , Mammography/methods , Machine Learning , Breast Neoplasms/diagnostic imaging
11.
Clin Imaging ; 99: 60-66, 2023 Jul.
Article En | MEDLINE | ID: mdl-37116263

OBJECTIVES: The purpose is to apply a previously validated deep learning algorithm to a new thyroid nodule ultrasound image dataset and compare its performance with that of radiologists. METHODS: A prior study presented an algorithm that detects thyroid nodules and then classifies malignancy using two ultrasound images. A multi-task deep convolutional neural network was trained on 1278 nodules and originally tested on 99 separate nodules. The results were comparable with those of radiologists. The algorithm was further tested on 378 nodules imaged with ultrasound machines from manufacturers and product types different from those of the training cases. Four experienced radiologists were requested to evaluate the nodules for comparison with deep learning. RESULTS: The area under the curve (AUC) values of the deep learning algorithm and the four radiologists were calculated with parametric, binormal estimation. For the deep learning algorithm, the AUC was 0.69 (95% CI: 0.64-0.75). The AUCs of the radiologists were 0.63 (95% CI: 0.59-0.67), 0.66 (95% CI: 0.61-0.71), 0.65 (95% CI: 0.60-0.70), and 0.63 (95% CI: 0.58-0.67). CONCLUSION: On the new testing dataset, the deep learning algorithm achieved performance similar to that of all four radiologists. The relative performance difference between the algorithm and the radiologists was not significantly affected by differences in ultrasound scanners.
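The parametric, binormal AUC estimate mentioned above reduces to a closed form when benign and malignant scores are modeled as Gaussians; a sketch:

    import numpy as np
    from scipy.stats import norm

    def binormal_auc(scores_benign, scores_malignant):
        # Binormal model: AUC = Phi((mu1 - mu0) / sqrt(s0^2 + s1^2))
        m0, s0 = np.mean(scores_benign), np.std(scores_benign, ddof=1)
        m1, s1 = np.mean(scores_malignant), np.std(scores_malignant, ddof=1)
        return norm.cdf((m1 - m0) / np.hypot(s0, s1))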


Deep Learning , Thyroid Nodule , Humans , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/pathology , Retrospective Studies , Ultrasonography/methods , Neural Networks, Computer
12.
JAMA Netw Open ; 6(2): e230524, 2023 02 01.
Article En | MEDLINE | ID: mdl-36821110

Importance: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide. Objectives: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods. Design, Setting, and Participants: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021. Main Outcomes and Measures: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes. Results: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity for all submitted algorithms was 0.879; for only those who participated in phase 2, it was 0.926. Conclusions and Relevance: In this diagnostic study, an international competition produced algorithms with high sensitivity for using AI to detect lesions on DBT images. A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier of entry for new researchers.


Artificial Intelligence , Breast Neoplasms , Humans , Female , Benchmarking , Mammography/methods , Algorithms , Radiographic Image Interpretation, Computer-Assisted/methods , Breast Neoplasms/diagnostic imaging
13.
J Digit Imaging ; 36(2): 666-678, 2023 04.
Article En | MEDLINE | ID: mdl-36544066

In this work, we introduce a novel medical image style transfer method, StyleMapper, that can transfer medical scans to an unseen style with access to limited training data. This is made possible by training our model on an unlimited variety of simulated random medical imaging styles applied to the training set, making our approach more computationally efficient than other style transfer methods. Moreover, our method enables arbitrary style transfer: transferring images to styles unseen in training. This is useful for medical imaging, where images are acquired using different protocols and different scanner models, resulting in a variety of styles that data may need to be transferred between. Our model disentangles image content from style and can modify an image's style by simply replacing the style encoding with one extracted from a single image of the target style, with no additional optimization required. This also allows the model to distinguish between different styles of images, including among those that were unseen in training. We provide a formal description of the proposed model. Experimental results on breast magnetic resonance images indicate the effectiveness of our method for style transfer. Our style transfer method allows medical images taken with different scanners to be aligned into a single unified-style dataset, on which downstream models for tasks such as classification and object detection can then be trained.
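The encoding swap at the heart of the method can be sketched as follows; the encoder and decoder names are assumptions, not the released StyleMapper modules:

    import torch

    def transfer_style(content_enc, style_enc, decoder, image, style_reference):
        # Replace the style code with one from a single target-style image;
        # no additional optimization is required at transfer time
        with torch.no_grad():
            content = content_enc(image)
            style = style_enc(style_reference)
            return decoder(content, style)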


Deep Learning , Humans , Magnetic Resonance Imaging , Radiography , Image Processing, Computer-Assisted/methods
14.
AJR Am J Roentgenol ; 220(3): 408-417, 2023 03.
Article En | MEDLINE | ID: mdl-36259591

BACKGROUND. In current clinical practice, thyroid nodules in children are generally evaluated on the basis of radiologists' overall impressions of ultrasound images. OBJECTIVE. The purpose of this article is to compare the diagnostic performance of radiologists' overall impression, the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS), and a deep learning algorithm in differentiating benign and malignant thyroid nodules on ultrasound in children and young adults. METHODS. This retrospective study included 139 patients aged 21 years or younger (median age, 17.5 years; 119 female patients, 20 male patients) evaluated from January 1, 2004, to September 18, 2020, with a thyroid nodule on ultrasound and definitive pathologic results from fine-needle aspiration and/or surgical excision to serve as the reference standard. A single nodule per patient was selected, and one transverse and one longitudinal image of each nodule were extracted for further evaluation. Three radiologists independently characterized nodules on the basis of their overall impression (benign vs malignant) and ACR TI-RADS. A previously developed deep learning algorithm determined for each nodule a likelihood of malignancy, which was used to derive a risk level. Sensitivities and specificities for malignancy were calculated. Agreement was assessed using Cohen kappa coefficients. RESULTS. For radiologists' overall impression, sensitivity ranged from 32.1% to 75.0% (mean, 58.3%; 95% CI, 49.2-67.3%), and specificity ranged from 63.8% to 93.9% (mean, 79.9%; 95% CI, 73.8-85.7%). For ACR TI-RADS, sensitivity ranged from 82.1% to 87.5% (mean, 85.1%; 95% CI, 77.3-92.1%), and specificity ranged from 47.0% to 54.2% (mean, 50.6%; 95% CI, 41.4-59.8%). The deep learning algorithm had a sensitivity of 87.5% (95% CI, 78.3-95.5%) and specificity of 36.1% (95% CI, 25.6-46.8%). Interobserver agreement among pairwise combinations of readers, expressed as kappa, was 0.227-0.472 for overall impression and 0.597-0.643 for ACR TI-RADS. CONCLUSION. Both ACR TI-RADS and the deep learning algorithm had higher sensitivity, albeit lower specificity, compared with overall impressions. The deep learning algorithm had similar sensitivity but lower specificity compared with ACR TI-RADS. Interobserver agreement was higher for ACR TI-RADS than for overall impressions. CLINICAL IMPACT. ACR TI-RADS and the deep learning algorithm may serve as potential alternative strategies for guiding decisions to perform fine-needle aspiration of thyroid nodules in children.
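Interobserver agreement of the kind reported above is a standard Cohen kappa computation; a sketch with placeholder reader impressions:

    from sklearn.metrics import cohen_kappa_score

    reader1 = ["benign", "malignant", "benign", "benign"]    # placeholder impressions
    reader2 = ["benign", "malignant", "malignant", "benign"]
    kappa = cohen_kappa_score(reader1, reader2)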


Deep Learning , Thyroid Nodule , Humans , Male , Child , Female , Young Adult , Adolescent , Adult , Thyroid Nodule/pathology , Retrospective Studies , Ultrasonography/methods , Radiologists
16.
AJR Am J Roentgenol ; 219(4): 1-8, 2022 10.
Article En | MEDLINE | ID: mdl-35383487

Artificial intelligence (AI) methods for evaluating thyroid nodules on ultrasound have been widely described in the literature, with reported performance of AI tools matching or in some instances surpassing radiologists' performance. As these data have accumulated, products for classification and risk stratification of thyroid nodules on ultrasound have become commercially available. This article reviews FDA-approved products currently on the market, with a focus on product features, reported performance, and considerations for implementation. The products perform risk stratification primarily using a Thyroid Imaging Reporting and Data System (TIRADS), though may provide additional prediction tools independent of TIRADS. Key issues in implementation include integration with radiologist interpretation, impact on workflow and efficiency, and performance monitoring. AI applications beyond nodule classification, including report construction and incidental findings follow-up, are also described. Anticipated future directions of research and development in AI tools for thyroid nodules are highlighted.


Thyroid Neoplasms , Thyroid Nodule , Artificial Intelligence , Humans , Thyroid Nodule/diagnostic imaging , Ultrasonography/methods
17.
BMC Med Inform Decis Mak ; 22(1): 102, 2022 04 15.
Article En | MEDLINE | ID: mdl-35428335

BACKGROUND: There is progress to be made in building artificially intelligent systems that are not only accurate in detecting abnormalities but can also handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high-throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states, thereby mitigating the need for human annotation. METHODS: We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. Effects on disease classification performance were evaluated for random initialization versus pre-trained embeddings, as well as for different training dataset sizes. The RBA was tested on a subset of 2158 manually labeled reports, and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by RBA, and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method. RESULTS: Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems. CONCLUSIONS: Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical datasets for training image-based disease classifiers.
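A sketch of what one dictionary rule with simple negation handling might look like for the lungs/pleura system; the vocabularies are illustrative only:

    import re

    LUNG_TERMS = {"atelectasis": r"atelectasis", "nodule": r"pulmonary nodule",
                  "emphysema": r"emphysema", "effusion": r"pleural effusion"}
    NEGATION = re.compile(r"\b(no|without|negative for)\b[^.]*")

    def lung_labels(report: str):
        # Drop negated clauses before matching, then collect positive disease labels
        text = NEGATION.sub("", report.lower())
        return {label for label, pat in LUNG_TERMS.items() if re.search(pat, text)}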


Deep Learning , Abdomen , Humans , Neural Networks, Computer , Pelvis/diagnostic imaging , Tomography, X-Ray Computed
18.
Radiol Artif Intell ; 4(1): e210026, 2022 Jan.
Article En | MEDLINE | ID: mdl-35146433

PURPOSE: To design multidisease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports. MATERIALS AND METHODS: This retrospective study included a total of 12 092 patients (mean age, 57 years ± 18 [standard deviation]; 6172 women) for model development and testing. Rule-based algorithms were used to extract 19 225 disease labels from 13 667 body CT scans performed between 2012 and 2017. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura, liver and gallbladder, and kidneys and ureters. For each organ system, a three-dimensional convolutional neural network classified each as no apparent disease or for the presence of four common diseases, for a total of 15 different labels across all three models. Testing was performed on a subset of 2158 CT volumes relative to 2875 manually derived reference labels from 2133 patients (mean age, 58 years ± 18; 1079 women). Performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method. RESULTS: Manual validation of the extracted labels confirmed 91%-99% accuracy across the 15 different labels. AUCs for lungs and pleura labels were as follows: atelectasis, 0.77 (95% CI: 0.74, 0.81); nodule, 0.65 (95% CI: 0.61, 0.69); emphysema, 0.89 (95% CI: 0.86, 0.92); effusion, 0.97 (95% CI: 0.96, 0.98); and no apparent disease, 0.89 (95% CI: 0.87, 0.91). AUCs for liver and gallbladder were as follows: hepatobiliary calcification, 0.62 (95% CI: 0.56, 0.67); lesion, 0.73 (95% CI: 0.69, 0.77); dilation, 0.87 (95% CI: 0.84, 0.90); fatty, 0.89 (95% CI: 0.86, 0.92); and no apparent disease, 0.82 (95% CI: 0.78, 0.85). AUCs for kidneys and ureters were as follows: stone, 0.83 (95% CI: 0.79, 0.87); atrophy, 0.92 (95% CI: 0.89, 0.94); lesion, 0.68 (95% CI: 0.64, 0.72); cyst, 0.70 (95% CI: 0.66, 0.73); and no apparent disease, 0.79 (95% CI: 0.75, 0.83). CONCLUSION: Weakly supervised deep learning models were able to classify diverse diseases in multiple organ systems from CT scans. Keywords: CT, Diagnosis/Classification/Application Domain, Semisupervised Learning, Whole-Body Imaging. © RSNA, 2022.

19.
Radiology ; 303(1): 54-62, 2022 04.
Article En | MEDLINE | ID: mdl-34981975

Background Improving diagnosis of ductal carcinoma in situ (DCIS) before surgery is important in choosing optimal patient management strategies. However, patients may harbor occult invasive disease not detected until definitive surgery. Purpose To assess the performance and clinical utility of mammographic radiomic features in the prediction of occult invasive cancer among women diagnosed with DCIS on the basis of core biopsy findings. Materials and Methods In this Health Insurance Portability and Accountability Act-compliant retrospective study, digital magnification mammographic images were collected from women who underwent breast core-needle biopsy for calcifications that was performed at a single institution between September 2008 and April 2017 and yielded a diagnosis of DCIS. The database query was directed at asymptomatic women with calcifications without a mass, architectural distortion, asymmetric density, or palpable disease. Logistic regression with regularization was used. Differences across training and internal test set by upstaging rate, age, lesion size, and estrogen and progesterone receptor status were assessed by using the Kruskal-Wallis or χ2 test. Results The study consisted of 700 women with DCIS (age range, 40-89 years; mean age, 59 years ± 10 [standard deviation]), including 114 with lesions (16.3%) upstaged to invasive cancer at subsequent surgery. The sample was split randomly into 400 women for the training set and 300 for the testing set (mean ages: training set, 59 years ± 10; test set, 59 years ± 10; P = .85). A total of 109 radiomic and four clinical features were extracted. The best model on the test set by using all radiomic and clinical features helped predict upstaging with an area under the receiver operating characteristic curve of 0.71 (95% CI: 0.62, 0.79). For a fixed high sensitivity (90%), the model yielded a specificity of 22%, a negative predictive value of 92%, and an odds ratio of 2.4 (95% CI: 1.8, 3.2). High specificity (90%) corresponded to a sensitivity of 37%, positive predictive value of 41%, and odds ratio of 5.0 (95% CI: 2.8, 9.0). Conclusion Machine learning models that use radiomic features applied to mammographic calcifications may help predict upstaging of ductal carcinoma in situ, which can refine clinical decision making and treatment planning. © RSNA, 2022.
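The fixed-sensitivity operating points reported above amount to choosing a threshold on the ROC curve; a sketch with placeholder labels and model scores:

    import numpy as np
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 300)                 # placeholder upstaging labels
    scores = rng.random(300) * 0.5 + y * 0.3    # placeholder model outputs
    fpr, tpr, thr = roc_curve(y, scores)
    t = thr[np.argmax(tpr >= 0.90)]             # first threshold reaching 90% sensitivity
    pred = scores >= t
    tn = np.sum(~pred & (y == 0)); fn = np.sum(~pred & (y == 1))
    specificity = tn / np.sum(y == 0)
    npv = tn / (tn + fn)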


Breast Neoplasms , Calcinosis , Carcinoma in Situ , Carcinoma, Ductal, Breast , Carcinoma, Intraductal, Noninfiltrating , Adult , Aged , Aged, 80 and over , Breast Neoplasms/diagnostic imaging , Carcinoma, Ductal, Breast/pathology , Carcinoma, Intraductal, Noninfiltrating/diagnostic imaging , Carcinoma, Intraductal, Noninfiltrating/pathology , Female , Humans , Male , Mammography , Middle Aged , Retrospective Studies
20.
IEEE Trans Pattern Anal Mach Intell ; 44(4): 1688-1698, 2022 04.
Article En | MEDLINE | ID: mdl-33112740

Recognizing and organizing different series in an MRI examination is important both for clinical review and research, but it is poorly addressed by the current generation of picture archiving and communication systems (PACSs) and post-processing workstations. In this paper, we study the problem of using deep convolutional neural networks for automatic classification of abdominal MRI series into one of many series types. Our contributions are three-fold. First, we created a large abdominal MRI dataset containing 3717 MRI series, including 188,665 individual images, derived from liver examinations. Thirty different series types are represented in this dataset. The dataset was annotated by consensus readings from two radiologists. Both the MRIs and the annotations were made publicly available. Second, we proposed a 3D pyramid pooling network, which can elegantly handle abdominal MRI series with varied sizes in each dimension, and achieved state-of-the-art classification performance. Third, we performed the first-ever comparison between the algorithm and radiologists on an additional dataset and reported several meaningful findings.
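The size-robust pooling idea can be sketched with adaptive 3D pooling at several scales, yielding a fixed-length feature for any series dimensions; the bin sizes are assumptions:

    import torch
    import torch.nn as nn

    class PyramidPool3d(nn.Module):
        def __init__(self, bins=(1, 2, 4)):
            super().__init__()
            self.pools = nn.ModuleList(nn.AdaptiveAvgPool3d(b) for b in bins)

        def forward(self, x):                    # x: (N, C, D, H, W), any D/H/W
            # Concatenate pooled features across scales -> fixed-length vector
            return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)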


Algorithms , Magnetic Resonance Imaging , Liver , Magnetic Resonance Imaging/methods , Neural Networks, Computer
...