Results 1 - 11 of 11
1.
Radiology; 292(3): 695-701, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31287391

ABSTRACT

Background: Management of thyroid nodules may be inconsistent between different observers and time consuming for radiologists. An artificial intelligence system that uses deep learning may improve radiology workflow for management of thyroid nodules. Purpose: To develop a deep learning algorithm that uses thyroid US images to decide whether a thyroid nodule should undergo a biopsy and to compare the performance of the algorithm with that of radiologists who adhere to American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods: In this retrospective analysis, studies in patients referred for US with subsequent fine-needle aspiration or with surgical histologic analysis used as the standard were evaluated. The study period was from August 2006 to May 2010. A multitask deep convolutional neural network was trained to provide biopsy recommendations for thyroid nodules on the basis of two orthogonal US images as the input. In the training phase, the deep learning algorithm was first evaluated by using 10-fold cross-validation. Internal validation was then performed on an independent set of 99 consecutive nodules. The sensitivity and specificity of the algorithm were compared with a consensus of three ACR TI-RADS committee experts and nine other radiologists, all of whom interpreted thyroid US images in clinical practice. Results: Included were 1377 thyroid nodules in 1230 patients with complete imaging data and conclusive cytologic or histologic diagnoses. For the 99 test nodules, the proposed deep learning algorithm achieved a sensitivity of 13 of 15 (87%; 95% confidence interval [CI]: 67%, 100%), the same as expert consensus (P > .99) and higher than that of five of nine radiologists. The specificity of the deep learning algorithm was 44 of 84 (52%; 95% CI: 42%, 62%), which was similar to expert consensus (43 of 84; 51%; 95% CI: 41%, 62%; P = .91) and higher than that of seven of nine other radiologists. The mean sensitivity and specificity for the nine radiologists were 83% (95% CI: 64%, 98%) and 48% (95% CI: 37%, 59%), respectively. Conclusion: The sensitivity and specificity of a deep learning algorithm for thyroid nodule biopsy recommendations were similar to those of expert radiologists who used American College of Radiology Thyroid Imaging Reporting and Data System guidelines. © RSNA, 2019. Online supplemental material is available for this article.
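No code accompanies the abstract, but the two-view input design described above (a shared encoder applied to each of the two orthogonal ultrasound images, with the fused features driving a single biopsy-recommendation output) can be illustrated with a minimal PyTorch sketch. The layer sizes and names here are assumptions, not the authors' architecture:

    # Minimal sketch of a two-view CNN for biopsy recommendation.
    # All architectural details (channels, pooling, head size) are assumed.
    import torch
    import torch.nn as nn

    class TwoViewNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Shared encoder applied to each orthogonal ultrasound view.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # One "recommend biopsy" logit from the fused view features.
            self.head = nn.Linear(64, 1)

        def forward(self, transverse, longitudinal):
            feats = torch.cat([self.encoder(transverse),
                               self.encoder(longitudinal)], dim=1)
            return self.head(feats)

    model = TwoViewNet()
    logit = model(torch.randn(1, 1, 224, 224), torch.randn(1, 1, 224, 224))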


Subject(s)
Deep Learning , Image Interpretation, Computer-Assisted/methods , Thyroid Nodule/diagnostic imaging , Ultrasonography/methods , Female , Humans , Male , Middle Aged , Reproducibility of Results , Retrospective Studies , Sensitivity and Specificity , Thyroid Gland/diagnostic imaging
2.
Radiology; 292(1): 112-119, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31112088

ABSTRACT

Background: Risk stratification systems for thyroid nodules are often complicated and affected by low specificity. Continual improvement of these systems is necessary to reduce the number of unnecessary thyroid biopsies. Purpose: To use artificial intelligence (AI) to optimize the American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods: A total of 1425 biopsy-proven thyroid nodules from 1264 consecutive patients (1026 women; mean age, 52.9 years [range, 18-93 years]) were evaluated retrospectively. Expert readers assigned points based on five ACR TI-RADS categories (composition, echogenicity, shape, margin, echogenic foci), and a genetic AI algorithm was applied to a training set (1325 nodules). Point and pathologic data were used to create an optimized scoring system (hereafter, AI TI-RADS). Performance of the systems was compared by using a test set of the final 100 nodules with interpretations from the expert reader, eight nonexpert readers, and an expert panel. Initial performance of AI TI-RADS was calculated by using a test for differences between binomial proportions. Additional comparisons across readers were conducted by using bootstrapping; diagnostic performance was assessed by using the area under the receiver operating characteristic curve (AUC). Results: AI TI-RADS assigned new point values for eight ACR TI-RADS features. Six features were assigned zero points, which simplified categorization. By using expert reader data, ACR TI-RADS and AI TI-RADS achieved an AUC of 0.91 and 0.93, respectively. For the same expert, specificity of AI TI-RADS (65%, 55 of 85) was higher (P < .001) than that of ACR TI-RADS (47%, 40 of 85). For the eight nonexpert radiologists, mean specificity for AI TI-RADS (55%) was also higher (P < .001) than that of ACR TI-RADS (48%). An interactive AI TI-RADS calculator can be viewed at http://deckard.duhs.duke.edu/~ai-ti-rads . Conclusion: An artificial intelligence-optimized Thyroid Imaging Reporting and Data System (TI-RADS) validates the American College of Radiology TI-RADS while slightly improving specificity and maintaining sensitivity. It also simplifies feature assignments, which may improve ease of use. © RSNA, 2019. Online supplemental material is available for this article.
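The genetic optimization of point values described above can be illustrated with a toy sketch: candidate point tables are scored by AUC, the fittest survive each generation, and children are produced by copying and mutation. The encoding, population size, and mutation rate below are assumptions applied to synthetic data, not the authors' implementation:

    # Toy genetic search over integer point values for TI-RADS-like features.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n_nodules, n_features = 500, 8
    X = rng.integers(0, 4, size=(n_nodules, n_features))  # feature levels
    y = rng.integers(0, 2, size=n_nodules)                # 1 = malignant

    def fitness(points):
        return roc_auc_score(y, X @ points)   # AUC of total-point scores

    pop = rng.integers(0, 4, size=(40, n_features))       # candidate tables
    for generation in range(50):
        fit = np.array([fitness(p) for p in pop])
        parents = pop[np.argsort(fit)[-10:]]              # keep the fittest
        children = parents[rng.integers(0, 10, size=30)].copy()
        mutate = rng.random(children.shape) < 0.1         # random point changes
        children[mutate] = rng.integers(0, 4, size=int(mutate.sum()))
        pop = np.vstack([parents, children])

    best_points = pop[np.argmax([fitness(p) for p in pop])]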


Subject(s)
Artificial Intelligence , Diagnostic Imaging/methods , Image Interpretation, Computer-Assisted/methods , Radiology Information Systems , Thyroid Nodule/diagnostic imaging , Adolescent , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Reproducibility of Results , Retrospective Studies , Risk Assessment , Sensitivity and Specificity , Societies, Medical , Thyroid Gland/diagnostic imaging , United States , Young Adult
3.
J Magn Reson Imaging; 49(4): 939-954, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30575178

ABSTRACT

Deep learning is a branch of artificial intelligence where networks of simple interconnected units are used to extract patterns from data in order to solve complex problems. Deep-learning algorithms have shown groundbreaking performance in a variety of sophisticated tasks, especially those related to images. They have often matched or exceeded human performance. Since the medical field of radiology mainly relies on extracting useful information from images, it is a very natural application area for deep learning, and research in this area has rapidly grown in recent years. In this article, we discuss the general context of radiology and opportunities for application of deep-learning algorithms. We also introduce basic concepts of deep learning, including convolutional neural networks. Then, we present a survey of the research in deep learning applied to radiology. We organize the studies by the types of specific tasks that they attempt to solve and review a broad range of deep-learning algorithms being utilized. Finally, we briefly discuss opportunities and challenges for incorporating deep learning in the radiology practice of the future. Level of Evidence: 3. Technical Efficacy: Stage 1. J. Magn. Reson. Imaging 2019;49:939-954.
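As a concrete illustration of the convolutional neural network concept the article introduces, a minimal image classifier might look like the sketch below; every size here is arbitrary:

    # Minimal convolutional classifier, illustrating the CNN concept only.
    import torch
    import torch.nn as nn

    cnn = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),  # learn local image filters
        nn.ReLU(),
        nn.MaxPool2d(2),                            # downsample feature maps
        nn.Flatten(),
        nn.Linear(8 * 16 * 16, 2),                  # two-class output
    )
    probs = cnn(torch.randn(1, 1, 32, 32)).softmax(dim=1)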


Subject(s)
Deep Learning , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Radiology/methods , Algorithms , Artificial Intelligence , Diagnostic Tests, Routine , Humans , Machine Learning , Neural Networks, Computer , Radiography
4.
Clin Imaging; 99: 60-66, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37116263

ABSTRACT

OBJECTIVES: The purpose was to apply a previously validated deep learning algorithm to a new thyroid nodule ultrasound image dataset and compare its performance with that of radiologists. METHODS: A prior study presented an algorithm able to detect thyroid nodules and then classify their malignancy from two ultrasound images. A multi-task deep convolutional neural network was trained on 1278 nodules and originally tested with 99 separate nodules. The results were comparable with those of radiologists. The algorithm was further tested with 378 nodules imaged with ultrasound machines from different manufacturers and of different product types than the training cases. Four experienced radiologists were asked to evaluate the nodules for comparison with the deep learning algorithm. RESULTS: The area under the curve (AUC) for the deep learning algorithm and the four radiologists was calculated with parametric, binormal estimation. For the deep learning algorithm, the AUC was 0.69 (95% CI: 0.64-0.75). The AUCs of the radiologists were 0.63 (95% CI: 0.59-0.67), 0.66 (95% CI: 0.61-0.71), 0.65 (95% CI: 0.60-0.70), and 0.63 (95% CI: 0.58-0.67). CONCLUSION: On the new testing dataset, the deep learning algorithm achieved performance similar to that of all four radiologists. The relative performance difference between the algorithm and the radiologists was not significantly affected by the change in ultrasound scanner.
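The parametric, binormal AUC estimation mentioned in the results has a closed form: if the scores in each class are assumed Gaussian, then AUC = Φ((μ1 - μ0) / sqrt(σ0² + σ1²)), where Φ is the standard normal CDF. A sketch with synthetic scores, not the study's data:

    # Parametric (binormal) AUC estimate from class-conditional score
    # means and standard deviations; the scores here are synthetic.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    benign = rng.normal(0.0, 1.0, 300)      # classifier scores, benign nodules
    malignant = rng.normal(0.7, 1.0, 100)   # classifier scores, malignant nodules

    auc = norm.cdf((malignant.mean() - benign.mean())
                   / np.hypot(benign.std(ddof=1), malignant.std(ddof=1)))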


Subject(s)
Deep Learning , Thyroid Nodule , Humans , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/pathology , Retrospective Studies , Ultrasonography/methods , Neural Networks, Computer
5.
JAMA Netw Open; 6(2): e230524, 2023 Feb 01.
Article in English | MEDLINE | ID: mdl-36821110

ABSTRACT

Importance: An accurate and robust artificial intelligence (AI) algorithm for detecting cancer in digital breast tomosynthesis (DBT) could significantly improve detection accuracy and reduce health care costs worldwide. Objectives: To make training and evaluation data for the development of AI algorithms for DBT analysis available, to develop well-defined benchmarks, and to create publicly available code for existing methods. Design, Setting, and Participants: This diagnostic study is based on a multi-institutional international grand challenge in which research teams developed algorithms to detect lesions in DBT. A data set of 22 032 reconstructed DBT volumes was made available to research teams. Phase 1, in which teams were provided 700 scans from the training set, 120 from the validation set, and 180 from the test set, took place from December 2020 to January 2021, and phase 2, in which teams were given the full data set, took place from May to July 2021. Main Outcomes and Measures: The overall performance was evaluated by mean sensitivity for biopsied lesions using only DBT volumes with biopsied lesions; ties were broken by including all DBT volumes. Results: A total of 8 teams participated in the challenge. The team with the highest mean sensitivity for biopsied lesions was the NYU B-Team, with 0.957 (95% CI, 0.924-0.984), and the second-place team, ZeDuS, had a mean sensitivity of 0.926 (95% CI, 0.881-0.964). When the results were aggregated, the mean sensitivity for all submitted algorithms was 0.879; for only the teams that participated in phase 2, it was 0.926. Conclusions and Relevance: In this diagnostic study, an international competition produced AI algorithms with high sensitivity for detecting lesions on DBT images. A standardized performance benchmark for the detection task using publicly available clinical imaging data was released, with detailed descriptions and analyses of submitted algorithms accompanied by a public release of their predictions and code for selected methods. These resources will serve as a foundation for future research on computer-assisted diagnosis methods for DBT, significantly lowering the barrier of entry for new researchers.
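A simplified sketch of the challenge's headline metric, the fraction of biopsied lesions detected across the volumes that contain them, is shown below. The hit criterion used to match predictions to lesions is a hypothetical stand-in, not the challenge's exact matching rule:

    # Simplified mean-sensitivity computation for a detection challenge.
    def lesion_detected(lesion, predictions, max_dist=100.0):
        # Hypothetical rule: any predicted center within max_dist pixels.
        return any(((px - lesion[0]) ** 2 + (py - lesion[1]) ** 2) ** 0.5
                   <= max_dist for px, py, _score in predictions)

    def mean_sensitivity(volumes):
        # volumes: (biopsied_lesions, predictions) pairs, restricted to
        # DBT volumes that contain biopsied lesions.
        hits = total = 0
        for lesions, preds in volumes:
            hits += sum(lesion_detected(l, preds) for l in lesions)
            total += len(lesions)
        return hits / total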


Subject(s)
Artificial Intelligence , Breast Neoplasms , Humans , Female , Benchmarking , Mammography/methods , Algorithms , Radiographic Image Interpretation, Computer-Assisted/methods , Breast Neoplasms/diagnostic imaging
6.
Sci Rep; 11(1): 10276, 2021 May 13.
Article in English | MEDLINE | ID: mdl-33986361

ABSTRACT

Deep learning has shown tremendous potential in the task of object detection in images. However, a common challenge with this task is when only a limited number of images containing the object of interest are available. This is a particular issue in cancer screening, such as digital breast tomosynthesis (DBT), where less than 1% of cases contain cancer. In this study, we propose a method to train an inpainting generative adversarial network to be used for cancer detection using only images that do not contain cancer. During inference, we removed a part of the image and used the network to complete the removed part. A significant error in completing an image part was considered an indication that such a location is unexpected and thus abnormal. A large dataset of DBT images used in this study was collected at Duke University. It consisted of 19,230 reconstructed volumes from 4348 patients. Cancerous masses and architectural distortions were marked with bounding boxes by radiologists. Our experiments showed that the locations containing cancer were associated with a notably higher completion error than the non-cancer locations (mean error ratio of 2.77). All data used in this study has been made publicly available by the authors.
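The mask-complete-score procedure described above can be sketched as follows; `generator` stands in for the trained inpainting GAN, and the patch size and error measure are assumptions:

    # Anomaly scoring by inpainting: mask a patch, let the generator
    # complete it, and treat a large completion error as abnormal.
    import numpy as np

    def completion_error(image, generator, y, x, size=64):
        masked = image.copy()
        masked[y:y + size, x:x + size] = 0.0      # remove the patch
        completed = generator(masked)             # model fills it back in
        true_patch = image[y:y + size, x:x + size]
        filled_patch = completed[y:y + size, x:x + size]
        return float(np.mean((true_patch - filled_patch) ** 2))

    def error_ratio(image, generator, cancer_loc, normal_loc):
        # Analogous to the paper's reported mean error ratio (2.77)
        # between cancer and non-cancer locations.
        return (completion_error(image, generator, *cancer_loc)
                / completion_error(image, generator, *normal_loc))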


Subject(s)
Breast Neoplasms/diagnostic imaging , Breast/diagnostic imaging , Computer Simulation , Mammography/methods , Neural Networks, Computer , Female , Humans , Middle Aged , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods
7.
JAMA Netw Open; 4(8): e2119100, 2021 Aug 02.
Article in English | MEDLINE | ID: mdl-34398205

ABSTRACT

Importance: Breast cancer screening is among the most common radiological tasks, with more than 39 million examinations performed each year. While it has been among the most studied medical imaging applications of artificial intelligence, the development and evaluation of algorithms are hindered by the lack of well-annotated, large-scale publicly available data sets. Objectives: To curate, annotate, and make publicly available a large-scale data set of digital breast tomosynthesis (DBT) images to facilitate the development and evaluation of artificial intelligence algorithms for breast cancer screening; to develop a baseline deep learning model for breast cancer detection; and to test this model using the data set to serve as a baseline for future research. Design, Setting, and Participants: In this diagnostic study, 16 802 DBT examinations with at least 1 reconstruction view available, performed between August 26, 2014, and January 29, 2018, were obtained from Duke Health System and analyzed. From the initial cohort, examinations were divided into 4 groups and split into training and test sets for the development and evaluation of a deep learning model. Images with foreign objects or spot compression views were excluded. Data analysis was conducted from January 2018 to October 2020. Exposures: Screening DBT. Main Outcomes and Measures: The detection algorithm was evaluated with breast-based free-response receiver operating characteristic curve and sensitivity at 2 false positives per volume. Results: The curated data set contained 22 032 reconstructed DBT volumes that belonged to 5610 studies from 5060 patients with a mean (SD) age of 55 (11) years and 5059 (100.0%) women. This included 4 groups of studies: (1) 5129 (91.4%) normal studies; (2) 280 (5.0%) actionable studies, for which additional imaging was needed but no biopsy was performed; (3) 112 (2.0%) benign biopsied studies; and (4) 89 studies (1.6%) with cancer. Our data set included masses and architectural distortions that were annotated by 2 experienced radiologists. Our deep learning model reached breast-based sensitivity of 65% (39 of 60; 95% CI, 56%-74%) at 2 false positives per DBT volume on a test set of 460 examinations from 418 patients. Conclusions and Relevance: The large, diverse, and curated data set presented in this study could facilitate the development and evaluation of artificial intelligence algorithms for breast cancer screening by providing data for training as well as a common set of cases for model validation. The performance of the model developed in this study showed that the task remains challenging; its performance could serve as a baseline for future model development.
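The reported operating point, sensitivity at 2 false positives per volume, can be computed from scored predictions once each prediction has been matched to a true lesion; a sketch under that assumption:

    # Breast-based sensitivity at a fixed false-positive rate: choose the
    # score threshold yielding at most `fps_per_volume` false positives
    # per volume on average, then report sensitivity at that threshold.
    import numpy as np

    def sensitivity_at_fp_rate(scores, is_true_positive, n_lesions,
                               n_volumes, fps_per_volume=2.0):
        order = np.argsort(scores)[::-1]           # descending by score
        tp = np.cumsum(is_true_positive[order])
        fp = np.cumsum(~is_true_positive[order])
        ok = fp <= fps_per_volume * n_volumes
        return tp[ok].max() / n_lesions if ok.any() else 0.0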


Subject(s)
Breast Neoplasms/diagnosis , Datasets as Topic , Deep Learning , Early Detection of Cancer/methods , Mammography , Aged , Breast/diagnostic imaging , False Positive Reactions , Female , Humans , Middle Aged , ROC Curve , Reproducibility of Results
8.
Radiol Artif Intell; 2(1): e180050, 2020 Jan.
Article in English | MEDLINE | ID: mdl-33937809

ABSTRACT

PURPOSE: To employ deep learning to predict genomic subtypes of lower-grade glioma (LGG) tumors based on their appearance at MRI. MATERIALS AND METHODS: Imaging data from The Cancer Imaging Archive and genomic data from The Cancer Genome Atlas from 110 patients from five institutions with lower-grade gliomas (World Health Organization grade II and III) were used in this study. A convolutional neural network was trained to predict tumor genomic subtype based on the MRI of the tumor. Two different deep learning approaches were tested: training from random initialization and transfer learning. Deep learning models were pretrained on glioblastoma MRI, instead of natural images, to determine whether performance improved on the LGG task. The models were evaluated using area under the receiver operating characteristic curve (AUC) with cross-validation. Imaging data and annotations used in this study are publicly available. RESULTS: The best performing model was based on transfer learning from glioblastoma MRI. It achieved an AUC of 0.730 (95% confidence interval [CI]: 0.605, 0.844) for discriminating cluster-of-clusters 2 from others. For the same task, a network trained from scratch achieved an AUC of 0.680 (95% CI: 0.538, 0.811), whereas a model pretrained on natural images achieved an AUC of 0.640 (95% CI: 0.521, 0.763). CONCLUSION: These findings show the potential of utilizing deep learning to identify relationships between cancer imaging and cancer genomics in LGGs. However, more accurate models are needed to justify clinical use of such tools, which might be obtained using substantially larger training datasets. Supplemental material is available for this article. © RSNA, 2020.
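The transfer-learning setup described above, initializing from a network pretrained on glioblastoma MRI rather than natural images, might look like the sketch below; the checkpoint path and backbone are hypothetical placeholders, not the authors' network:

    # Sketch: reuse weights pretrained on glioblastoma MRI, then
    # fine-tune on the lower-grade glioma subtype task.
    import torch
    from torchvision.models import resnet18

    model = resnet18(num_classes=2)                # assumed backbone
    state = torch.load("gbm_pretrained.pt")        # hypothetical checkpoint
    model.load_state_dict(state, strict=False)     # keep matching weights
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fine-tune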

9.
Ultrasound Med Biol; 46(2): 415-421, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31699547

ABSTRACT

Computer-aided segmentation of thyroid nodules in ultrasound imaging could assist in their accurate characterization. In this study, using data for 1278 nodules, we proposed and evaluated two methods for deep learning-based segmentation of thyroid nodules that utilize calipers present in the images. The first method used approximate nodule masks generated based on the calipers. The second method combined manual annotations with automatic guidance by the calipers. When only approximate nodule masks were used for training, the achieved Dice similarity coefficient (DSC) was 85.1%. The performance of a network trained using manual annotations was DSC = 90.4%. When the guidance by the calipers was added, the performance increased to DSC = 93.1%. An increase in the number of cases used for training resulted in increased performance for all methods. With a reduced number of manually annotated training cases, the proposed caliper-guided method matched the performance of the network trained without it.
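The Dice similarity coefficient reported throughout these results is straightforward to compute from binary masks:

    # Dice similarity coefficient between two binary segmentation masks.
    import numpy as np

    def dice(pred, target, eps=1e-7):
        pred, target = pred.astype(bool), target.astype(bool)
        intersection = np.logical_and(pred, target).sum()
        return 2.0 * intersection / (pred.sum() + target.sum() + eps)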


Subject(s)
Deep Learning , Thyroid Nodule/diagnostic imaging , Humans , Retrospective Studies , Ultrasonography/methods
10.
Comput Biol Med; 109: 218-225, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31078126

ABSTRACT

Recent analysis identified distinct genomic subtypes of lower-grade glioma tumors which are associated with shape features. In this study, we propose a fully automatic way to quantify tumor imaging characteristics using deep learning-based segmentation and test whether these characteristics are predictive of tumor genomic subtypes. We used preoperative imaging and genomic data of 110 patients from 5 institutions with lower-grade gliomas from The Cancer Genome Atlas. Based on automatic deep learning segmentations, we extracted three features which quantify two-dimensional and three-dimensional characteristics of the tumors. Genomic data for the analyzed cohort of patients consisted of previously identified genomic clusters based on IDH mutation and 1p/19q co-deletion, DNA methylation, gene expression, DNA copy number, and microRNA expression. To analyze the relationship between the imaging features and genomic clusters, we conducted a Fisher exact test for each of the 10 pairs of imaging feature and genomic subtype. To account for multiple hypothesis testing, we applied a Bonferroni correction. P-values lower than 0.005 were considered statistically significant. We found the strongest association between RNASeq clusters and the bounding ellipsoid volume ratio (p < 0.0002) and between RNASeq clusters and margin fluctuation (p < 0.005). In addition, we identified associations between bounding ellipsoid volume ratio and all tested molecular subtypes (p < 0.02) as well as between angular standard deviation and RNASeq cluster (p < 0.02). For the automatic tumor segmentation that was used to generate the quantitative image characteristics, our deep learning algorithm achieved a mean Dice coefficient of 82%, which is comparable to human performance.
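The testing procedure described above, Fisher exact tests with a Bonferroni-corrected threshold of 0.05 / 10 = 0.005, is direct to reproduce with SciPy. The contingency table below is synthetic, and dichotomizing a continuous imaging feature at its median is an assumed preprocessing step:

    # Fisher exact test for one (imaging feature, genomic subtype) pair,
    # against a Bonferroni-corrected significance threshold.
    from scipy.stats import fisher_exact

    n_hypotheses = 10
    alpha = 0.05 / n_hypotheses        # 0.005, as in the study

    # Rows: feature above/below its median; columns: in/out of subtype.
    table = [[30, 10],
             [12, 28]]
    odds_ratio, p_value = fisher_exact(table)
    significant = p_value < alpha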


Subject(s)
Deep Learning , Genome, Human , Glioma , Image Processing, Computer-Assisted , Models, Biological , Base Sequence , Chromosome Deletion , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 1/metabolism , Chromosomes, Human, Pair 19/genetics , Chromosomes, Human, Pair 19/metabolism , DNA Methylation , DNA, Neoplasm/genetics , DNA, Neoplasm/metabolism , Gene Expression Regulation, Neoplastic , Glioma/diagnostic imaging , Glioma/genetics , Glioma/metabolism , Humans , MicroRNAs/biosynthesis , MicroRNAs/genetics , RNA, Neoplasm/biosynthesis , RNA, Neoplasm/genetics
11.
Neural Netw; 106: 249-259, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30092410

ABSTRACT

In this study, we systematically investigate the impact of class imbalance on classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks, since the overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on the results of our experiments, we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that completely eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of imbalance; (iv) as opposed to some classical machine learning models, oversampling does not cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest.
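Two of the remedies compared in the study are easy to sketch: oversampling minority classes to full balance, and thresholding that divides softmax outputs by the training-set class priors before taking the argmax. A minimal sketch, not the paper's experimental code:

    # (1) Oversample each class to the size of the largest class.
    import numpy as np

    def oversample_to_balance(X, y, seed=0):
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=counts.max(), replace=True)
            for c in classes
        ])
        return X[idx], y[idx]

    # (2) Compensate for prior class probabilities at prediction time.
    def prior_corrected_predictions(softmax_outputs, class_priors):
        return np.argmax(softmax_outputs / class_priors, axis=1)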


Subject(s)
Machine Learning , Neural Networks, Computer , Humans , Machine Learning/trends , Probability , ROC Curve