Results 1 - 20 of 290
1.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subject(s)
Artificial Intelligence
2.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Machine Learning , Semantics
3.
Lancet Oncol ; 25(7): 879-887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was not confirmed, as the AI system showed lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care. 
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
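
The non-inferiority and superiority logic described in this abstract can be illustrated with a minimal sketch. It assumes the common convention that the decision is read off the lower bound of the confidence interval for the difference (new system minus reference); the function name and return labels are hypothetical and are not taken from the study's statistical analysis plan.

```python
def assess_noninferiority(ci_lower_diff: float, margin: float) -> str:
    """Classify a difference (new minus reference) from its lower CI bound.

    Assumed convention: superiority if the lower bound is above 0,
    non-inferiority if it is above -margin, otherwise not shown.
    """
    if ci_lower_diff > 0:
        return "superior"  # superiority also implies non-inferiority
    if ci_lower_diff > -margin:
        return "non-inferior"
    return "not shown non-inferior"


# AUROC comparison reported in the abstract: lower bound 0.02, margin 0.05.
print(assess_noninferiority(0.02, 0.05))
```

Under this convention the reported AUROC difference clears both thresholds, which matches the abstract's statement that the AI system was non-inferior and superior to the reader pool.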


Subject(s)
Artificial Intelligence , Magnetic Resonance Imaging , Prostatic Neoplasms , Radiologists , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Aged , Retrospective Studies , Middle Aged , Neoplasm Grading , Netherlands , ROC Curve
4.
Radiology ; 310(1): e230981, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38193833

ABSTRACT

Background Multiple commercial artificial intelligence (AI) products exist for assessing radiographs; however, comparable performance data for these algorithms are limited. Purpose To perform an independent, stand-alone validation of commercially available AI products for bone age prediction based on hand radiographs and lung nodule detection on chest radiographs. Materials and Methods This retrospective study was carried out as part of Project AIR. Nine of 17 eligible AI products were validated on data from seven Dutch hospitals. For bone age prediction, the root mean square error (RMSE) and Pearson correlation coefficient were computed. The reference standard was set by three to five expert readers. For lung nodule detection, the area under the receiver operating characteristic curve (AUC) was computed. The reference standard was set by a chest radiologist based on CT. Randomized subsets of hand (n = 95) and chest (n = 140) radiographs were read by 14 and 17 human readers, respectively, with varying experience. Results Two bone age prediction algorithms were tested on hand radiographs (from January 2017 to January 2022) in 326 patients (mean age, 10 years ± 4 [SD]; 173 female patients) and correlated strongly with the reference standard (r = 0.99; P < .001 for both). No difference in RMSE was observed between algorithms (0.63 years [95% CI: 0.58, 0.69] and 0.57 years [95% CI: 0.52, 0.61]) and readers (0.68 years [95% CI: 0.64, 0.73]). Seven lung nodule detection algorithms were validated on chest radiographs (from January 2012 to May 2022) in 386 patients (mean age, 64 years ± 11; 223 male patients). Compared with readers (mean AUC, 0.81 [95% CI: 0.77, 0.85]), four algorithms performed better (AUC range, 0.86-0.93; P value range, <.001 to .04). 
Conclusions Compared with human readers, four AI algorithms for detecting lung nodules on chest radiographs showed improved performance, whereas the remaining algorithms tested showed no evidence of a difference in performance. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Omoumi and Richiardi in this issue.
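
For readers unfamiliar with the two bone age metrics named above, the following is a minimal pure-Python sketch of the root mean square error and the Pearson correlation coefficient. The function names and the sample values are illustrative assumptions, not data or code from the study.

```python
import math


def rmse(pred, ref):
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical bone ages in years (algorithm output vs. reader consensus).
predicted = [8.0, 10.5, 13.0]
reference = [8.5, 10.0, 13.5]
print(rmse(predicted, reference), pearson_r(predicted, reference))
```

RMSE is expressed in the unit of the measurement (years here), while the correlation is unitless, which is why the abstract reports both.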


Subject(s)
Artificial Intelligence , Software , Humans , Female , Male , Child , Middle Aged , Retrospective Studies , Algorithms , Lung
5.
Clin Chem ; 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38906831

ABSTRACT

BACKGROUND: Hemoglobinopathies, the most common inherited blood disorder, are frequently underdiagnosed. Early identification of carriers is important for genetic counseling of couples at risk. The aim of this study was to develop and validate a novel machine learning model on a multicenter data set, covering a wide spectrum of hemoglobinopathies based on routine complete blood count (CBC) testing. METHODS: Hemoglobinopathy test results from 10 322 adults were extracted retrospectively from 8 Dutch laboratories. eXtreme Gradient Boosting (XGB) and logistic regression models were developed to differentiate negative from positive hemoglobinopathy cases, using 7 routine CBC parameters. External validation was conducted on a data set from an independent Dutch laboratory, with an additional external validation on a Spanish data set (n = 2629) specifically for differentiating thalassemia from iron deficiency anemia (IDA). RESULTS: The XGB and logistic regression models achieved an area under the receiver operating characteristic curve (AUROC) of 0.88 and 0.84, respectively, in distinguishing negative from positive hemoglobinopathy cases in the independent external validation set. Subclass analysis showed that the XGB model reached an AUROC of 0.97 for β-thalassemia, 0.98 for α0-thalassemia, 0.95 for homozygous α+-thalassemia, 0.78 for heterozygous α+-thalassemia, and 0.94 for the structural hemoglobin variants Hemoglobin C, Hemoglobin D, Hemoglobin E. Both models attained AUROCs of 0.95 in differentiating IDA from thalassemia. CONCLUSIONS: Both the XGB and logistic regression models demonstrate high accuracy in predicting a broad range of hemoglobinopathies and are effective in differentiating hemoglobinopathies from IDA. Integration of these models into the laboratory information system facilitates automated hemoglobinopathy detection using routine CBC parameters.
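
The AUROC values reported here have a simple interpretation: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and the toy scores are assumptions for illustration and do not reproduce the study's models or data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

This quadratic-time form is fine for small examples; library implementations typically compute the same quantity from sorted ranks.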

6.
Eur Radiol ; 34(1): 348-354, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37515632

ABSTRACT

OBJECTIVES: To map the clinical use of CE-marked artificial intelligence (AI)-based software in radiology departments in the Netherlands (n = 69) between 2020 and 2022. MATERIALS AND METHODS: Our AI network (one radiologist or AI representative per Dutch hospital organization) received a questionnaire each spring from 2020 to 2022 about AI product usage, financing, and obstacles to adoption. Products that were not listed on www.AIforRadiology.com by July 2022 were excluded from the analysis. RESULTS: The number of respondents was 43 in 2020, 36 in 2021, and 33 in 2022. The number of departments using AI has been growing steadily (2020: 14, 2021: 19, 2022: 23). The diversity (2020: 7, 2021: 18, 2022: 34) and the number of total implementations (2020: 19, 2021: 38, 2022: 68) have rapidly increased. Seven implementations were discontinued in 2022. Four hospital organizations reported using an AI platform or marketplace for the deployment of AI solutions. AI is mostly used to support chest CT (17), neuro CT (17), and musculoskeletal radiograph (12) analysis. The budget for AI was reserved in 13 of the responding centers in both 2021 and 2022. The most important obstacles to the adoption of AI remained costs and IT integration. Of the respondents, 28% stated that the implemented AI products realized health improvement and 32% assumed both health improvement and cost savings. CONCLUSION: The adoption of AI products in radiology departments in the Netherlands is showing common signs of a developing market. The major obstacles to reaching widespread adoption are a lack of financial resources and IT integration difficulties. CLINICAL RELEVANCE STATEMENT: The clinical impact of AI starts with its adoption in daily clinical practice. Increased transparency around AI products being adopted, implementation obstacles, and impact may inspire increased collaboration and improved decision-making around the implementation and financing of AI products. 
KEY POINTS: • The adoption of artificial intelligence products for radiology has steadily increased since 2020 to at least a third of the centers using AI in clinical practice in the Netherlands in 2022. • The main areas in which artificial intelligence products are used are lung nodule detection on CT, aided stroke diagnosis, and bone age prediction. • The majority of respondents experienced added value (decreased costs and/or improved outcomes) from using artificial intelligence-based software; however, major obstacles to adoption remain the costs and IT-related difficulties.


Subject(s)
Artificial Intelligence , Radiology , Humans , Netherlands , Radiography , Radiologists
7.
Eur Radiol ; 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383922

ABSTRACT

OBJECTIVES: Severity of degenerative scoliosis (DS) is assessed by measuring the Cobb angle on anteroposterior radiographs. However, MRI images are often available to study the degenerative spine. This retrospective study aims to develop and evaluate the reliability of a novel automatic method that measures coronal Cobb angles on lumbar MRI in DS patients. MATERIALS AND METHODS: Vertebrae and intervertebral discs were automatically segmented using a 3D AI algorithm, trained on 447 lumbar MRI series. The segmentations were used to calculate all possible angles between the vertebral endplates, with the largest being the Cobb angle. The results were validated with 50 high-resolution sagittal lumbar MRI scans of DS patients, in which three experienced readers measured the Cobb angle. Reliability was determined using the intraclass correlation coefficient (ICC). RESULTS: The ICCs between the readers ranged from 0.90 (95% CI 0.83-0.94) to 0.93 (95% CI 0.88-0.96). The ICC between the maximum angle found by the algorithm and the average manually measured Cobb angles was 0.83 (95% CI 0.71-0.90). In 9 out of the 50 cases (18%), all readers agreed on both vertebral levels for Cobb angle measurement. When using the algorithm to extract the angles at the vertebral levels chosen by the readers, the ICCs ranged from 0.92 (95% CI 0.87-0.96) to 0.97 (95% CI 0.94-0.98). CONCLUSION: The Cobb angle can be accurately measured on MRI using the newly developed algorithm in patients with DS. The readers failed to consistently choose the same vertebral level for Cobb angle measurement, whereas the automatic approach ensures the maximum angle is consistently measured. CLINICAL RELEVANCE STATEMENT: Our AI-based algorithm offers reliable Cobb angle measurement on routine MRI for degenerative scoliosis patients, potentially reducing the reliance on conventional radiographs, ensuring consistent assessments, and therefore improving patient care. 
KEY POINTS: • While often available, MRI images are rarely utilized to determine the severity of degenerative scoliosis. • The presented MRI Cobb angle algorithm is more reliable than humans in patients with degenerative scoliosis. • Radiographic imaging for Cobb angle measurements is mitigated when lumbar MRI images are available.
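
The step "calculate all possible angles between the vertebral endplates, with the largest being the Cobb angle" can be sketched as follows. This is a simplified illustration, assuming each endplate has already been reduced to a 2D direction vector in the coronal plane; how the study derives endplates from the 3D segmentations is not described in the abstract, and the function names are hypothetical.

```python
import math
from itertools import combinations


def line_angle_deg(u, v):
    """Angle in degrees between two undirected lines given as 2D vectors."""
    dot = abs(u[0] * v[0] + u[1] * v[1])  # abs(): line orientation is irrelevant
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(min(1.0, dot / norm)))


def max_endplate_angle(endplates):
    """Largest pairwise angle over all endplates (taken as the Cobb angle)."""
    return max(line_angle_deg(u, v) for u, v in combinations(endplates, 2))


# Hypothetical endplates tilted 0, +10 and -15 degrees from horizontal.
tilts = [0.0, 10.0, -15.0]
endplates = [(math.cos(math.radians(t)), math.sin(math.radians(t))) for t in tilts]
print(round(max_endplate_angle(endplates), 1))  # 25.0
```

Searching over all pairs removes the need to pick the end vertebrae by hand, which is the source of reader disagreement noted in the results.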

8.
Eur Radiol ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38758252

ABSTRACT

INTRODUCTION: This study investigates the performance of a commercially available artificial intelligence (AI) system to identify normal chest radiographs and its potential to reduce radiologist workload. METHODS: Retrospective analysis included consecutive chest radiographs from two medical centers between Oct 1, 2016 and Oct 14, 2016. Exclusions comprised follow-up exams within the inclusion period, bedside radiographs, incomplete images, imported radiographs, and pediatric radiographs. Three chest radiologists categorized findings into normal, clinically irrelevant, clinically relevant, urgent, and critical. A commercial AI system processed all radiographs, scoring 10 chest abnormalities on a 0-100 confidence scale. AI system performance was evaluated using the area under the ROC curve (AUC), assessing the detection of normal radiographs. Sensitivity was calculated for the default and a conservative operating point. The negative predictive value (NPV) for urgent and critical findings, as well as the potential workload reduction, was also calculated. RESULTS: A total of 2603 radiographs were acquired in 2141 unique patients. Post-exclusion, 1670 radiographs were analyzed. Categories included 479 normal, 332 clinically irrelevant, 339 clinically relevant, 501 urgent, and 19 critical findings. The AI system achieved an AUC of 0.92. Sensitivity for normal radiographs was 92% at default and 53% at the conservative operating point. At the conservative operating point, NPV was 98% for urgent and critical findings, and could result in a 15% workload reduction. CONCLUSION: A commercially available AI system effectively identifies normal chest radiographs and holds the potential to lessen radiologists' workload by omitting half of the normal exams from reporting. 
CLINICAL RELEVANCE STATEMENT: The AI system is able to detect half of all normal chest radiographs at a clinically acceptable operating point, thereby potentially reducing the workload for the radiologists by 15%. KEY POINTS: The AI system reached an AUC of 0.92 for the detection of normal chest radiographs. Fifty-three percent of normal chest radiographs were identified with a NPV of 98% for urgent findings. AI can reduce the workload of chest radiography reporting by 15%.
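
The relationship between the reported sensitivity and the 15% workload figure can be checked with simple arithmetic. The sketch below assumes that workload reduction is the share of all analyzed radiographs that are correctly flagged as normal; the abstract does not state the exact formula, so this is an approximation rather than the authors' calculation.

```python
# Counts and rates taken from the abstract.
categories = {
    "normal": 479,
    "clinically irrelevant": 332,
    "clinically relevant": 339,
    "urgent": 501,
    "critical": 19,
}
total = sum(categories.values())  # should equal the 1670 analyzed radiographs
sensitivity_conservative = 0.53   # sensitivity for normal radiographs

# Assumption: only correctly identified normal exams are removed from reporting.
flagged_normal = sensitivity_conservative * categories["normal"]
workload_reduction = flagged_normal / total

print(total, round(100 * workload_reduction))  # 1670 15
```

Roughly 254 of the 479 normal radiographs would be set aside, which is about half of the normal exams and about 15% of the total, consistent with both statements in the abstract.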

9.
Eur Radiol ; 2024 Apr 18.
Article in English | MEDLINE | ID: mdl-38634877

ABSTRACT

OBJECTIVES: To develop and validate an artificial intelligence (AI) system for measuring and detecting signs of carpal instability on conventional radiographs. MATERIALS AND METHODS: Two case-control datasets of hand and wrist radiographs were retrospectively acquired at three hospitals (hospitals A, B, and C). Dataset 1 (2178 radiographs from 1993 patients, hospitals A and B, 2018-2019) was used for developing an AI system for measuring scapholunate (SL) joint distances, SL and capitolunate (CL) angles, and carpal arc interruptions. Dataset 2 (481 radiographs from 217 patients, hospital C, 2017-2021) was used for testing, and with a subsample (174 radiographs from 87 patients), an observer study was conducted to compare its performance to five clinicians. Evaluation metrics included mean absolute error (MAE), sensitivity, and specificity. RESULTS: Dataset 2 included 258 SL distances, 189 SL angles, 191 CL angles, and 217 carpal arc labels obtained from 217 patients (mean age, 51 years ± 23 [standard deviation]; 133 women). The MAE in measuring SL distances, SL angles, and CL angles was respectively 0.65 mm (95%CI: 0.59, 0.72), 7.9 degrees (95%CI: 7.0, 8.9), and 5.9 degrees (95%CI: 5.2, 6.6). The sensitivity and specificity for detecting arc interruptions were 83% (95%CI: 74, 91) and 64% (95%CI: 56, 71). The measurements were largely comparable to those of the clinicians, while arc interruption detections were more accurate than those of most clinicians. CONCLUSION: This study demonstrates that a newly developed automated AI system accurately measures and detects signs of carpal instability on conventional radiographs. CLINICAL RELEVANCE STATEMENT: This system has the potential to improve detections of carpal arc interruptions and could be a promising tool for supporting clinicians in detecting carpal instability.

10.
BMC Oral Health ; 24(1): 387, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532414

ABSTRACT

OBJECTIVE: Panoramic radiographs (PRs) provide a comprehensive view of the oral and maxillofacial region and are used routinely to assess dental and osseous pathologies. Artificial intelligence (AI) can be used to improve the diagnostic accuracy of PRs compared to bitewings and periapical radiographs. This study aimed to evaluate the advantages and challenges of using publicly available datasets in dental AI research, focusing on solving the novel task of predicting tooth segmentations, FDI numbers, and tooth diagnoses, simultaneously. MATERIALS AND METHODS: Datasets from the OdontoAI platform (tooth instance segmentations) and the DENTEX challenge (tooth bounding boxes with associated diagnoses) were combined to develop a two-stage AI model. The first stage implemented tooth instance segmentation with FDI numbering and extracted regions of interest around each tooth segmentation, whereafter the second stage implemented multi-label classification to detect dental caries, impacted teeth, and periapical lesions in PRs. The performance of the automated tooth segmentation algorithm was evaluated using a free-response receiver-operating-characteristics (FROC) curve and mean average precision (mAP) metrics. The diagnostic accuracy of detection and classification of dental pathology was evaluated with ROC curves and F1 and AUC metrics. RESULTS: The two-stage AI model achieved high accuracy in tooth segmentations with a FROC score of 0.988 and a mAP of 0.848. High accuracy was also achieved in the diagnostic classification of impacted teeth (F1 = 0.901, AUC = 0.996), whereas moderate accuracy was achieved in the diagnostic classification of deep caries (F1 = 0.683, AUC = 0.960), early caries (F1 = 0.662, AUC = 0.881), and periapical lesions (F1 = 0.603, AUC = 0.974). The model's performance correlated positively with the quality of annotations in the used public datasets. 
Selected samples from the DENTEX dataset revealed cases of missing (false-negative) and incorrect (false-positive) diagnoses, which negatively influenced the performance of the AI model. CONCLUSIONS: The use and pooling of public datasets in dental AI research can significantly accelerate the development of new AI models and enable fast exploration of novel tasks. However, standardized quality assurance is essential before using the datasets to ensure reliable outcomes and limit potential biases.


Subject(s)
Dental Caries , Tooth, Impacted , Tooth , Humans , Artificial Intelligence , Radiography, Panoramic , Bone and Bones
11.
Radiology ; 308(2): e223308, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37526548

ABSTRACT

Background Prior chest CT provides valuable temporal information (eg, changes in nodule size or appearance) to accurately estimate malignancy risk. Purpose To develop a deep learning (DL) algorithm that uses a current and prior low-dose CT examination to estimate 3-year malignancy risk of pulmonary nodules. Materials and Methods In this retrospective study, the algorithm was trained using National Lung Screening Trial data (collected from 2002 to 2004), wherein patients were imaged at most 2 years apart, and evaluated with two external test sets from the Danish Lung Cancer Screening Trial (DLCST) and the Multicentric Italian Lung Detection Trial (MILD), collected in 2004-2010 and 2005-2014, respectively. Performance was evaluated using area under the receiver operating characteristic curve (AUC) on cancer-enriched subsets with size-matched benign nodules imaged 1 and 2 years apart from DLCST and MILD, respectively. The algorithm was compared with a validated DL algorithm that only processed a single CT examination and the Pan-Canadian Early Lung Cancer Detection Study (PanCan) model. Results The training set included 10 508 nodules (422 malignant) in 4902 trial participants (mean age, 64 years ± 5 [SD]; 2778 men). The size-matched external test sets included 129 nodules (43 malignant) and 126 nodules (42 malignant). The algorithm achieved AUCs of 0.91 (95% CI: 0.85, 0.97) and 0.94 (95% CI: 0.89, 0.98). It significantly outperformed the DL algorithm that only processed a single CT examination (AUC, 0.85 [95% CI: 0.78, 0.92; P = .002]; and AUC, 0.89 [95% CI: 0.84, 0.95; P = .01]) and the PanCan model (AUC, 0.64 [95% CI: 0.53, 0.74; P < .001]; and AUC, 0.63 [95% CI: 0.52, 0.74; P < .001]). Conclusion A DL algorithm using current and prior low-dose CT examinations was more effective at estimating 3-year malignancy risk of pulmonary nodules than established models that only use a single CT examination. Clinical trial registration nos. 
NCT00047385, NCT00496977, NCT02837809 © RSNA, 2023 Supplemental material is available for this article. See also the editorial by Horst and Nishino in this issue.


Subject(s)
Deep Learning , Lung Neoplasms , Multiple Pulmonary Nodules , Male , Humans , Middle Aged , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/pathology , Retrospective Studies , Early Detection of Cancer , Canada , Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/pathology , Tomography, X-Ray Computed/methods
12.
Eur Radiol ; 33(11): 8279-8288, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37338552

ABSTRACT

OBJECTIVE: To study trends in the incidence of reported pulmonary nodules and stage I lung cancer in chest CT. METHODS: We analyzed the trends in the incidence of detected pulmonary nodules and stage I lung cancer in chest CT scans in the period between 2008 and 2019. Imaging metadata and radiology reports from all chest CT studies were collected from two large Dutch hospitals. A natural language processing algorithm was developed to identify studies with any reported pulmonary nodule. RESULTS: Between 2008 and 2019, a total of 74,803 patients underwent 166,688 chest CT examinations at both hospitals combined. During this period, the annual number of chest CT scans increased from 9955 scans in 6845 patients in 2008 to 20,476 scans in 13,286 patients in 2019. The proportion of patients in whom nodules (old or new) were reported increased from 38% (2595/6845) in 2008 to 50% (6654/13,286) in 2019. The proportion of patients in whom significant new nodules (≥ 5 mm) were reported increased from 9% (608/6954) in 2010 to 17% (1660/9883) in 2017. The number of patients with new nodules and corresponding stage I lung cancer diagnosis tripled and their proportion doubled, from 0.4% (26/6954) in 2010 to 0.8% (78/9883) in 2017. CONCLUSION: The identification of incidental pulmonary nodules in chest CT has steadily increased over the past decade and has been accompanied by more stage I lung cancer diagnoses. CLINICAL RELEVANCE STATEMENT: These findings stress the importance of identifying and efficiently managing incidental pulmonary nodules in routine clinical practice. KEY POINTS: • The number of patients who underwent chest CT examinations substantially increased over the past decade, as did the number of patients in whom pulmonary nodules were identified. • The increased use of chest CT and more frequently identified pulmonary nodules were associated with more stage I lung cancer diagnoses.
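
The percentages quoted in the results can be reproduced from the counts given in parentheses. The short check below uses only numbers from the abstract; the helper name `pct` is a hypothetical convenience.

```python
def pct(numerator: int, denominator: int) -> float:
    """Proportion expressed as a percentage."""
    return 100 * numerator / denominator


reported = {
    "any nodule, 2008": (2595, 6845),
    "any nodule, 2019": (6654, 13286),
    "new nodule >= 5 mm, 2010": (608, 6954),
    "new nodule >= 5 mm, 2017": (1660, 9883),
    "new nodule + stage I cancer, 2010": (26, 6954),
    "new nodule + stage I cancer, 2017": (78, 9883),
}
for label, (n, d) in reported.items():
    print(f"{label}: {pct(n, d):.1f}%")

# "Tripled" refers to the counts, "doubled" to the proportions.
count_ratio = 78 / 26
proportion_ratio = pct(78, 9883) / pct(26, 6954)
print(count_ratio, round(proportion_ratio, 1))
```

The counts rise threefold while the proportion rises about twofold because the number of scanned patients also grew between 2010 and 2017.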


Subject(s)
Lung Neoplasms , Multiple Pulmonary Nodules , Solitary Pulmonary Nodule , Humans , Incidence , Solitary Pulmonary Nodule/diagnostic imaging , Solitary Pulmonary Nodule/epidemiology , Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/epidemiology , Tomography, X-Ray Computed/methods , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/epidemiology
13.
Eur Radiol ; 33(3): 1575-1588, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36380195

ABSTRACT

OBJECTIVES: To assess how an artificial intelligence (AI) algorithm performs against five experienced musculoskeletal radiologists in diagnosing scaphoid fractures and whether it aids their diagnosis on conventional multi-view radiographs. METHODS: Four datasets of conventional hand, wrist, and scaphoid radiographs were retrospectively acquired at two hospitals (hospitals A and B). Dataset 1 (12,990 radiographs from 3353 patients, hospital A) and dataset 2 (1117 radiographs from 394 patients, hospital B) were used for training and testing a scaphoid localization and laterality classification component. Dataset 3 (4316 radiographs from 840 patients, hospital A) and dataset 4 (688 radiographs from 209 patients, hospital B) were used for training and testing the fracture detector. The algorithm was compared with the radiologists in an observer study. Evaluation metrics included sensitivity, specificity, positive predictive value (PPV), area under the receiver operating characteristic curve (AUC), Cohen's kappa coefficient (κ), fracture localization precision, and reading time. RESULTS: The algorithm detected scaphoid fractures with a sensitivity of 72%, specificity of 93%, PPV of 81%, and AUC of 0.88. The AUC of the algorithm did not differ from each radiologist (0.87 [radiologists' mean], p ≥ .05). AI assistance improved five out of ten pairs of inter-observer Cohen's κ agreements (p < .05) and reduced reading time in four radiologists (p < .001), but did not improve other metrics in the majority of radiologists (p ≥ .05). CONCLUSIONS: The AI algorithm detects scaphoid fractures on conventional multi-view radiographs at the level of five experienced musculoskeletal radiologists and could significantly shorten their reading time. KEY POINTS: • An artificial intelligence algorithm automatically detects scaphoid fractures on conventional multi-view radiographs at the same level as five experienced musculoskeletal radiologists. 
• There is preliminary evidence that automated scaphoid fracture detection can significantly shorten the reading time of musculoskeletal radiologists.
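
Inter-observer agreement in this study is summarized with Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. A minimal sketch of the standard unweighted formula follows; the function name and the toy ratings are illustrative assumptions, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same cases."""
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in set(rater_a) | set(rater_b)) / n**2
    return (p_o - p_e) / (1 - p_e)


# Toy example: 1 = fracture, 0 = no fracture; raters agree on 3 of 4 cases.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

With five radiologists there are ten distinct reader pairs, which is why the abstract reports improvement in "five out of ten pairs" of kappa values.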


Subject(s)
Deep Learning , Fractures, Bone , Scaphoid Bone , Wrist Injuries , Humans , Fractures, Bone/diagnostic imaging , Wrist , Retrospective Studies , Artificial Intelligence , Scaphoid Bone/diagnostic imaging , Radiologists
14.
BMC Oral Health ; 23(1): 643, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37670290

ABSTRACT

OBJECTIVE: Intra-oral scans and gypsum cast scans (OS) are widely used in orthodontics, prosthetics, implantology, and orthognathic surgery to plan patient-specific treatments, which require teeth segmentations with high accuracy and resolution. Manual teeth segmentation, the gold standard up until now, is time-consuming, tedious, and observer-dependent. This study aims to develop an automated teeth segmentation and labeling system using deep learning. MATERIAL AND METHODS: As a reference, 1750 OS were manually segmented and labeled. A deep-learning approach based on PointCNN and 3D U-net in combination with a rule-based heuristic algorithm and a combinatorial search algorithm was trained and validated on 1400 OS. Subsequently, the trained algorithm was applied to a test set consisting of 350 OS. The intersection over union (IoU), as a measure of accuracy, was calculated to quantify the degree of similarity between the annotated ground truth and the model predictions. RESULTS: The model achieved accurate teeth segmentations with a mean IoU score of 0.915. The FDI labels of the teeth were predicted with a mean accuracy of 0.894. The optical inspection showed excellent position agreements between the automatically and manually segmented teeth components. Minor flaws were mostly seen at the edges. CONCLUSION: The proposed method forms a promising foundation for time-effective and observer-independent teeth segmentation and labeling on intra-oral scans. CLINICAL SIGNIFICANCE: Deep learning may assist clinicians in virtual treatment planning in orthodontics, prosthetics, implantology, and orthognathic surgery. The impact of using such models in clinical practice should be explored.
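
The intersection over union used as the accuracy measure here is the size of the overlap between prediction and ground truth divided by the size of their union. The sketch below represents each tooth segmentation as a set of element indices (for example mesh vertex IDs); that representation and the function name are assumptions for illustration, since the abstract does not specify how IoU was computed on the scans.

```python
def iou(predicted: set, truth: set) -> float:
    """Intersection over union of two index sets (1.0 if both are empty)."""
    union = predicted | truth
    if not union:
        return 1.0
    return len(predicted & truth) / len(union)


# Toy example: two of four distinct indices are shared.
print(iou({1, 2, 3}, {2, 3, 4}))  # 0.5
```

A mean IoU such as the reported 0.915 would then be the average of this value over all segmented teeth or scans, depending on the aggregation the authors chose.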


Subject(s)
Deep Learning , Humans , Algorithms , Calcium Sulfate , Dental Care , Physical Examination
15.
Eur Respir J ; 59(5)2022 05.
Article in English | MEDLINE | ID: mdl-34649976

ABSTRACT

BACKGROUND: A baseline computed tomography (CT) scan for lung cancer (LC) screening may reveal information indicating that certain LC screening participants can be screened less, and instead require dedicated early cardiac and respiratory clinical input. We aimed to develop and validate competing death (CD) risk models using CT information to identify participants with a low LC risk and a high CD risk. METHODS: Participant demographics and quantitative CT measures of LC, cardiovascular disease and chronic obstructive pulmonary disease were considered for deriving a logistic regression model for predicting 5-year CD risk using a sample from the National Lung Screening Trial (n=15 000). Multicentric Italian Lung Detection data were used to perform external validation (n=2287). RESULTS: Our final CD model outperformed an external pre-scan model (CD Risk Assessment Tool) in both the derivation (area under the curve (AUC) 0.744 (95% CI 0.727-0.761) and 0.677 (95% CI 0.658-0.695), respectively) and validation cohorts (AUC 0.744 (95% CI 0.652-0.835) and 0.725 (95% CI 0.633-0.816), respectively). By also taking LC incidence risk into consideration, we suggested a risk threshold where a subgroup (6258/23 096 (27%)) was identified with a number needed to screen to detect one LC of 216 (versus 23 in the remainder of the cohort) and ratio of 5.41 CDs per LC case (versus 0.88). The respective values in the validation cohort subgroup (774/2287 (34%)) were 129 (versus 29) and 1.67 (versus 0.43). CONCLUSIONS: Evaluating both LC and CD risks post-scan may improve the efficiency of LC screening and facilitate the initiation of multidisciplinary trajectories among certain participants.


Subject(s)
Early Detection of Cancer , Lung Neoplasms , Early Detection of Cancer/methods , Humans , Lung , Lung Neoplasms/diagnosis , Mass Screening , Risk Assessment/methods , Tomography, X-Ray Computed/methods
16.
Rheumatology (Oxford) ; 61(7): 2867-2874, 2022 07 06.
Article in English | MEDLINE | ID: mdl-34791065

ABSTRACT

OBJECTIVES: Earlier retrospective studies have suggested a relation between DISH and cardiovascular disease, including myocardial infarction. The present study assessed the association between DISH and incidence of cardiovascular events and mortality in patients with high cardiovascular risk. METHODS: In this prospective cohort study, we included 4624 patients (mean age 58.4 years, 69.6% male) from the Second Manifestations of ARTerial disease cohort. The main end point was major cardiovascular events (MACE: stroke, myocardial infarction and vascular death). Secondary endpoints included all-cause mortality and separate vascular events. Cause-specific proportional hazard models were used to evaluate the risk of DISH on all outcomes, and subdistribution hazard models were used to evaluate the effect of DISH on the cumulative incidence. All models were adjusted for age, sex, body mass index, blood pressure, diabetes, non-HDL cholesterol, packyears, renal function and C-reactive protein. RESULTS: DISH was present in 435 (9.4%) patients. After a median follow-up of 8.7 (IQR 5.0-12.0) years, 864 patients had died and 728 patients developed a MACE event. DISH was associated with an increased cumulative incidence of ischaemic stroke. After adjustment in cause-specific modelling, DISH remained significantly associated with ischaemic stroke (HR 1.55; 95% CI: 1.01, 2.38), but not with MACE (HR 0.99; 95% CI: 0.79, 1.24), myocardial infarction (HR 0.88; 95% CI: 0.59, 1.31), vascular death (HR 0.94; 95% CI: 0.68, 1.27) or all-cause mortality (HR 0.94; 95% CI: 0.77, 1.16). CONCLUSION: The presence of DISH is independently associated with an increased incidence and risk for ischaemic stroke, but not with MACE, myocardial infarction, vascular death or all-cause mortality.


Subject(s)
Brain Ischemia , Cardiovascular Diseases , Hyperostosis, Diffuse Idiopathic Skeletal , Ischemic Stroke , Myocardial Infarction , Stroke , Brain Ischemia/complications , Cardiovascular Diseases/complications , Cardiovascular Diseases/etiology , Female , Heart Disease Risk Factors , Humans , Hyperostosis, Diffuse Idiopathic Skeletal/complications , Male , Middle Aged , Myocardial Infarction/complications , Myocardial Infarction/etiology , Prospective Studies , Retrospective Studies , Risk Factors , Stroke/complications , Stroke/etiology
17.
Pediatr Radiol ; 52(11): 2087-2093, 2022 10.
Article in English | MEDLINE | ID: mdl-34117522

ABSTRACT

Since the introduction of artificial intelligence (AI) in radiology, the promise has been that it will improve health care and reduce costs. Has AI been able to fulfill that promise? We describe six clinical objectives that can be supported by AI: a more efficient workflow, shortened reading time, a reduction of dose and contrast agents, earlier detection of disease, improved diagnostic accuracy and more personalized diagnostics. We provide examples of use cases including the available scientific evidence for its impact based on a hierarchical model of efficacy. We conclude that the market is still maturing and little is known about the contribution of AI to clinical practice. More real-world monitoring of AI in clinical practice is expected to aid in determining the value of AI and making informed decisions on development, procurement and reimbursement.


Subject(s)
Artificial Intelligence , Radiology , Contrast Media , Humans , Outcome Assessment, Health Care , Radiography
18.
Radiology ; 300(2): 438-447, 2021 08.
Article in English | MEDLINE | ID: mdl-34003056

ABSTRACT

Background Accurate estimation of the malignancy risk of pulmonary nodules at chest CT is crucial for optimizing management in lung cancer screening. Purpose To develop and validate a deep learning (DL) algorithm for malignancy risk estimation of pulmonary nodules detected at screening CT. Materials and Methods In this retrospective study, the DL algorithm was developed with 16 077 nodules (1249 malignant) collected between 2002 and 2004 from the National Lung Screening Trial. External validation was performed in the following three cohorts collected between 2004 and 2010 from the Danish Lung Cancer Screening Trial: a full cohort containing all 883 nodules (65 malignant) and two cancer-enriched cohorts with size matching (175 nodules, 59 malignant) and without size matching (177 nodules, 59 malignant) of benign nodules selected at random. Algorithm performance was measured by using the area under the receiver operating characteristic curve (AUC) and compared with that of the Pan-Canadian Early Detection of Lung Cancer (PanCan) model in the full cohort and a group of 11 clinicians composed of four thoracic radiologists, five radiology residents, and two pulmonologists in the cancer-enriched cohorts. Results The DL algorithm significantly outperformed the PanCan model in the full cohort (AUC, 0.93 [95% CI: 0.89, 0.96] vs 0.90 [95% CI: 0.86, 0.93]; P = .046). The algorithm performed comparably to thoracic radiologists in cancer-enriched cohorts with both random benign nodules (AUC, 0.96 [95% CI: 0.93, 0.99] vs 0.90 [95% CI: 0.81, 0.98]; P = .11) and size-matched benign nodules (AUC, 0.86 [95% CI: 0.80, 0.91] vs 0.82 [95% CI: 0.74, 0.89]; P = .26). Conclusion The deep learning algorithm showed excellent performance, comparable to thoracic radiologists, for malignancy risk estimation of pulmonary nodules detected at screening CT.
This algorithm has the potential to provide reliable and reproducible malignancy risk scores for clinicians, which may help optimize management in lung cancer screening. © RSNA, 2021 Online supplemental material is available for this article. See also the editorial by Tammemägi in this issue.


Subject(s)
Deep Learning , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Humans , Lung Neoplasms/pathology , Mass Screening , Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/pathology , Radiation Dosage , Retrospective Studies , Risk Assessment , Solitary Pulmonary Nodule/diagnostic imaging , Solitary Pulmonary Nodule/pathology
19.
Radiology ; 298(1): E18-E28, 2021 01.
Article in English | MEDLINE | ID: mdl-32729810

ABSTRACT

Background The coronavirus disease 2019 (COVID-19) pandemic has spread across the globe with alarming speed, morbidity, and mortality. Immediate triage of patients with chest infections suspected to be caused by COVID-19 using chest CT may be of assistance when results from definitive viral testing are delayed. Purpose To develop and validate an artificial intelligence (AI) system to score the likelihood and extent of pulmonary COVID-19 on chest CT scans using the COVID-19 Reporting and Data System (CO-RADS) and CT severity scoring systems. Materials and Methods The CO-RADS AI system consists of three deep-learning algorithms that automatically segment the five pulmonary lobes, assign a CO-RADS score for the suspicion of COVID-19, and assign a CT severity score for the degree of parenchymal involvement per lobe. This study retrospectively included patients who underwent a nonenhanced chest CT examination because of clinical suspicion of COVID-19 at two medical centers. The system was trained, validated, and tested with data from one of the centers. Data from the second center served as an external test set. Diagnostic performance and agreement with scores assigned by eight independent observers were measured using receiver operating characteristic analysis, linearly weighted κ values, and classification accuracy. Results A total of 105 patients (mean age, 62 years ± 16 [standard deviation]; 61 men) and 262 patients (mean age, 64 years ± 16; 154 men) were evaluated in the internal and external test sets, respectively. The system discriminated between patients with COVID-19 and those without COVID-19, with areas under the receiver operating characteristic curve of 0.95 (95% CI: 0.91, 0.98) and 0.88 (95% CI: 0.84, 0.93), for the internal and external test sets, respectively. Agreement with the eight human observers was moderate to substantial, with mean linearly weighted κ values of 0.60 ± 0.01 for CO-RADS scores and 0.54 ± 0.01 for CT severity scores. 
Conclusion With high diagnostic performance, the CO-RADS AI system correctly identified patients with COVID-19 using chest CT scans and assigned standardized CO-RADS and CT severity scores that demonstrated good agreement with findings from eight independent observers and generalized well to external data. © RSNA, 2020 Supplemental material is available for this article.
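
The linearly weighted κ used in the abstract above to measure agreement on ordinal scores can be sketched as follows. This is a generic textbook formulation, not the study's code: the function name `linear_weighted_kappa`, the pure-Python layout and the example scores are assumptions. Disagreement between categories i and j is weighted by |i - j| / (k - 1), so a two-step disagreement on an ordinal scale costs twice as much as a one-step disagreement.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with linear disagreement weights for ordinal categories.

    `categories` must list the ordinal levels in order and contain at least two.
    """
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)

    # Observed joint proportions of (rater_a, rater_b) scores.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0.0:
        # Assumed convention: both raters used one identical category throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a five-level ordinal scale from two readers.
scores_ai = [1, 2, 3, 4, 5, 5]
scores_reader = [1, 2, 3, 4, 5, 4]
kappa = linear_weighted_kappa(scores_ai, scores_reader, categories=[1, 2, 3, 4, 5])
```

The abstract reports mean κ values over eight observers; averaging pairwise values of this kind is one way such a mean could be formed, though the exact procedure is not given here.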


Subject(s)
Artificial Intelligence , COVID-19/diagnostic imaging , Severity of Illness Index , Thorax/diagnostic imaging , Tomography, X-Ray Computed , Aged , Data Systems , Female , Humans , Male , Middle Aged , Research Design , Retrospective Studies
20.
Eur Respir J ; 58(3)2021 09.
Article in English | MEDLINE | ID: mdl-33574075

ABSTRACT

OBJECTIVES: Combined assessment of cardiovascular disease (CVD), COPD and lung cancer may improve the effectiveness of lung cancer screening in smokers. The aims were to derive and assess risk models for predicting lung cancer incidence, CVD mortality and COPD mortality by combining quantitative computed tomography (CT) measures from each disease, and to quantify the added predictive benefit of self-reported patient characteristics given the availability of a CT scan. METHODS: A survey model (patient characteristics only), CT model (CT information only) and final model (all variables) were derived for each outcome using parsimonious Cox regression on a sample from the National Lung Screening Trial (n=15 000). Validation was performed using Multicentric Italian Lung Detection data (n=2287). Time-dependent measures of model discrimination and calibration are reported. RESULTS: Age, mean lung density, emphysema score, bronchial wall thickness and aorta calcium volume are variables that contributed to all final models. Nodule features were crucial for lung cancer incidence predictions but did not contribute to CVD and COPD mortality prediction. In the derivation cohort, the lung cancer incidence CT model had a 5-year area under the receiver operating characteristic curve of 82.5% (95% CI 80.9-84.0%), significantly inferior to that of the final model (84.0%, 82.6-85.5%). However, the addition of patient characteristics did not improve the lung cancer incidence model performance in the validation cohort (CT model 80.1%, 74.2-86.0%; final model 79.9%, 73.9-85.8%). Similarly, the final CVD mortality model outperformed the other two models in the derivation cohort (survey model 74.9%, 72.7-77.1%; CT model 76.3%, 74.1-78.5%; final model 79.1%, 77.0-81.2%), but not the validation cohort (survey model 74.8%, 62.2-87.5%; CT model 72.1%, 61.1-83.2%; final model 72.2%, 60.4-84.0%). 
Combining patient characteristics and CT measures provided the largest increase in accuracy for the COPD mortality final model (92.3%, 90.1-94.5%) compared to either other model individually (survey model 87.5%, 84.3-90.6%; CT model 87.9%, 84.8-91.0%), but no external validation was performed due to a very low event frequency. CONCLUSIONS: CT measures of CVD and COPD provide small but reproducible improvements to nodule-based lung cancer risk prediction accuracy from 3 years onwards. Self-reported patient characteristics may not be of added predictive value when CT information is available.


Subject(s)
Early Detection of Cancer , Lung Neoplasms , Biomarkers , Humans , Lung/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed