1.
Eur Radiol ; 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39299953

ABSTRACT

OBJECTIVES: The assessment of lumbar central canal stenosis (LCCS) is crucial for diagnosing and planning treatment for patients with low back pain and neurogenic pain. However, manual assessment methods are time-consuming and variable, and require axial MRIs. The aim of this study was to develop and validate an AI-based model that automatically classifies LCCS using sagittal T2-weighted MRIs. METHODS: A pre-existing 3D AI algorithm was utilized to segment the spinal canal and intervertebral discs (IVDs), enabling quantitative measurements at each IVD level. Four musculoskeletal radiologists graded 683 IVD levels from 186 LCCS patients using the 4-class Lee grading system. A second consensus reading was conducted by readers 1 and 2, which, along with automatic measurements, formed the training dataset for a multiclass (grade 0-3) and binary (grade 0-1 vs. 2-3) random forest classifier with tenfold cross-validation. RESULTS: The multiclass model achieved a Cohen's weighted kappa of 0.86 (95% CI: 0.82-0.90), comparable to readers 3 and 4 with 0.85 (95% CI: 0.80-0.89) and 0.73 (95% CI: 0.68-0.79), respectively. The binary model demonstrated an AUC of 0.98 (95% CI: 0.97-0.99), sensitivity of 93% (95% CI: 91-96%), and specificity of 91% (95% CI: 87-95%). In comparison, readers 3 and 4 achieved specificities of 98% and 99% and sensitivities of 74% and 54%, respectively. CONCLUSION: Both the multiclass and binary models, while only using sagittal MR images, perform on par with experienced radiologists who also had access to axial sequences. This underscores the potential of this novel algorithm in enhancing diagnostic accuracy and efficiency in medical imaging. KEY POINTS: Question How can the classification of lumbar central canal stenosis (LCCS) be made more efficient? Findings Multiclass and binary AI models, using only sagittal MR images, performed on par with experienced radiologists who also had access to axial sequences.
Clinical relevance Our AI algorithm accurately classifies LCCS from sagittal MRI, matching experienced radiologists. This study offers a promising tool for automated LCCS assessment from sagittal T2 MRI, potentially reducing the reliance on additional axial imaging.
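The agreement statistic reported above can be sketched in a few lines. This is a generic illustration of Cohen's weighted kappa, not the study's code; the abstract does not state the weighting scheme, so both linear and quadratic weights are supported:

```python
from collections import Counter

def weighted_kappa(a, b, n_classes=4, weight="linear"):
    """Cohen's weighted kappa for two raters' ordinal grades in 0..n_classes-1."""
    n = len(a)
    obs = Counter(zip(a, b))          # observed joint grade counts
    pa, pb = Counter(a), Counter(b)   # marginal grade counts per rater
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            if weight == "linear":
                w = abs(i - j) / (n_classes - 1)
            else:  # quadratic
                w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * obs.get((i, j), 0) / n
            den += w * (pa.get(i, 0) / n) * (pb.get(j, 0) / n)
    return 1.0 - num / den

print(weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3]))  # perfect agreement -> 1.0
```

With four ordered grades (0-3) as in the Lee system, disagreements between distant grades are penalized more than adjacent-grade disagreements.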

2.
Clin Infect Dis ; 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39190813

ABSTRACT

BACKGROUND: To improve tuberculosis case-finding, rapid, non-sputum triage tests need to be developed according to the World Health Organization target product profile (TPP) (>90% sensitivity, >70% specificity). We prospectively evaluated and compared artificial intelligence-based, computer-aided detection software, CAD4TBv7, and C-reactive protein assay (CRP) as triage tests at health facilities in Lesotho and South Africa. METHODS: Adults (≥18 years) presenting with ≥1 of the 4 cardinal tuberculosis symptoms were consecutively recruited between February 2021 and April 2022. After informed consent, each participant underwent a digital chest X-ray for CAD4TBv7 and a CRP test. Participants provided 1 sputum sample for Xpert MTB/RIF Ultra and Xpert MTB/RIF and 1 for liquid culture. Additionally, an expert radiologist read the chest X-rays via teleradiology. For primary analysis, a composite microbiological reference standard (ie, positive culture or Xpert Ultra) was used. RESULTS: We enrolled 1392 participants; 48% were people with HIV and 24% had previous tuberculosis. The receiver operating characteristic curve for CAD4TBv7 and CRP showed an area under the curve of .87 (95% CI: .84-.91) and .80 (95% CI: .76-.84), respectively. At thresholds corresponding to 90% sensitivity, specificity was 68.2% (95% CI: 65.4-71.0%) and 38.2% (95% CI: 35.3-41.1%) for CAD4TBv7 and CRP, respectively. CAD4TBv7 detected tuberculosis as well as an expert radiologist. CAD4TBv7 almost met the TPP criteria for tuberculosis triage. CONCLUSIONS: CAD4TBv7 is accurate as a triage test for patients with tuberculosis symptoms from areas with a high tuberculosis and HIV burden. The role of CRP in tuberculosis triage requires further research. CLINICAL TRIALS REGISTRATION: Clinicaltrials.gov identifier: NCT04666311.
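The TPP comparison above fixes sensitivity at 90% and reads off the resulting specificity. A toy sketch of that operating-point selection follows; the scores and labels are hypothetical, not the study's data:

```python
def specificity_at_sensitivity(scores, labels, target=0.90):
    """Highest threshold reaching the target sensitivity; returns its specificity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    for t in sorted(set(scores), reverse=True):   # score >= t means "triage positive"
        sensitivity = sum(s >= t for s in pos) / len(pos)
        if sensitivity >= target:
            return sum(s < t for s in neg) / len(neg)
    return 0.0  # even the lowest threshold misses the target

# hypothetical CAD-style scores (0-100) with reference labels (1 = tuberculosis)
scores = [95, 80, 70, 60, 40, 30, 20, 10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(specificity_at_sensitivity(scores, labels, target=0.75))  # -> 1.0
```

In the study, this procedure at 90% sensitivity yields 68.2% specificity for CAD4TBv7, just short of the >70% TPP requirement.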

3.
Med Image Anal ; 97: 103286, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39111266

ABSTRACT

We present a novel graph-based approach for labeling the anatomical branches of a given airway tree segmentation. The proposed method formulates airway labeling as a branch classification problem in the airway tree graph, where branch features are extracted using convolutional neural networks and enriched using graph neural networks. Our graph neural network is structure-aware by having each node aggregate information from its local neighbors and position-aware by encoding node positions in the graph. We evaluated the proposed method on 220 airway trees from subjects with various severity stages of Chronic Obstructive Pulmonary Disease (COPD). The results demonstrate that our approach is computationally efficient and significantly improves branch classification performance compared with the baseline method. The overall average accuracy of our method reaches 91.18% for labeling 18 segmental airway branches, compared to 83.83% obtained by the standard CNN method and 87.37% obtained by the existing method. Furthermore, a reader study on an additional set of 40 subjects shows that our algorithm performs comparably to human experts in labeling segmental airways. We published our source code at https://github.com/DIAGNijmegen/spgnn. The proposed algorithm is also publicly available at https://grand-challenge.org/algorithms/airway-anatomical-labeling/.
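The structure-aware aggregation described above can be illustrated with a single message-passing step. This is a generic sketch (mean aggregation over neighbors), not the authors' SPGNN architecture:

```python
def gnn_layer(features, adjacency):
    """One message-passing step: each node averages its own and its
    neighbours' feature vectors (structure-aware aggregation)."""
    out = []
    for i, fi in enumerate(features):
        msgs = [features[j] for j in adjacency[i]] + [fi]
        out.append([sum(vals) / len(msgs) for vals in zip(*msgs)])
    return out

# a 3-node path graph standing in for a tiny branch graph
features = [[0.0], [1.0], [2.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
print(gnn_layer(features, adjacency))  # -> [[0.5], [1.0], [1.5]]
```

Stacking such layers lets each branch's representation absorb information from progressively larger neighborhoods of the airway tree; position-awareness would additionally concatenate a positional encoding to each node's features.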


Subject(s)
Algorithms , Neural Networks, Computer , Pulmonary Disease, Chronic Obstructive , Humans , Pulmonary Disease, Chronic Obstructive/diagnostic imaging , Tomography, X-Ray Computed/methods
4.
Med Image Anal ; 97: 103259, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38959721

ABSTRACT

Deep learning classification models for medical image analysis often perform well on data from scanners that were used to acquire the training data. However, when these models are applied to data from different vendors, their performance tends to drop substantially. Artifacts that only occur within scans from specific scanners are major causes of this poor generalizability. We aimed to enhance the reliability of deep learning classification models using a novel method called Uncertainty-Based Instance eXclusion (UBIX). UBIX is an inference-time module that can be employed in multiple-instance learning (MIL) settings. MIL is a paradigm in which instances (generally crops or slices) of a bag (generally an image) contribute towards a bag-level output. Instead of assuming equal contribution of all instances to the bag-level output, UBIX detects instances corrupted due to local artifacts on-the-fly using uncertainty estimation, reducing or fully ignoring their contributions before MIL pooling. In our experiments, instances are 2D slices and bags are volumetric images, but alternative definitions are also possible. Although UBIX is generally applicable to diverse classification tasks, we focused on the staging of age-related macular degeneration in optical coherence tomography. Our models were trained on data from a single scanner and tested on external datasets from different vendors, which included vendor-specific artifacts. UBIX showed reliable behavior when applied to images from different vendors containing artifacts, with only a modest decrease in performance (the quadratic weighted kappa (κw) decreased from 0.861 to 0.708), whereas a state-of-the-art 3D neural network without UBIX suffered a severe drop in performance (κw from 0.852 to 0.084) on the same test set. We showed that instances with unseen artifacts can be identified with out-of-distribution (OOD) detection.
UBIX can reduce their contribution to the bag-level predictions, improving reliability without retraining on new data. This potentially increases the applicability of artificial intelligence models to data from other scanners than the ones for which they were developed. The source code for UBIX, including trained model weights, is publicly available through https://github.com/qurAI-amsterdam/ubix-for-reliable-classification.
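The core idea (the hard-exclusion variant of UBIX) can be sketched as follows. The threshold `tau`, mean pooling, and the fallback when all instances are rejected are illustrative assumptions, not the paper's exact configuration:

```python
def ubix_pool(instance_scores, instance_uncertainties, tau=0.5):
    """Mean-pool instance (slice) scores into a bag (volume) score,
    excluding instances whose estimated uncertainty exceeds tau."""
    kept = [s for s, u in zip(instance_scores, instance_uncertainties) if u <= tau]
    if not kept:              # everything rejected: fall back to plain pooling
        kept = list(instance_scores)
    return sum(kept) / len(kept)

# an artifact-corrupted slice (uncertainty 0.9) is ignored before pooling
print(ubix_pool([0.9, 0.1], [0.2, 0.9]))  # -> 0.9
```

The soft variant described in the paper would down-weight uncertain instances rather than drop them outright; either way, no retraining is required because the module acts purely at inference time.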


Subject(s)
Deep Learning , Tomography, Optical Coherence , Tomography, Optical Coherence/methods , Humans , Uncertainty , Reproducibility of Results , Artifacts , Image Processing, Computer-Assisted/methods , Macular Degeneration/diagnostic imaging , Algorithms
5.
Lancet Oncol ; 25(7): 879-887, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38876123

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems can potentially aid the diagnostic pathway of prostate cancer by alleviating the increasing workload, preventing overdiagnosis, and reducing the dependence on experienced radiologists. We aimed to investigate the performance of AI systems at detecting clinically significant prostate cancer on MRI in comparison with radiologists using the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS 2.1) and the standard of care in multidisciplinary routine practice at scale. METHODS: In this international, paired, non-inferiority, confirmatory study, we trained and externally validated an AI system (developed within an international consortium) for detecting Gleason grade group 2 or greater cancers using a retrospective cohort of 10 207 MRI examinations from 9129 patients. Of these examinations, 9207 cases from three centres (11 sites) based in the Netherlands were used for training and tuning, and 1000 cases from four centres (12 sites) based in the Netherlands and Norway were used for testing. In parallel, we facilitated a multireader, multicase observer study with 62 radiologists (45 centres in 20 countries; median 7 [IQR 5-10] years of experience in reading prostate MRI) using PI-RADS (2.1) on 400 paired MRI examinations from the testing cohort. Primary endpoints were the sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC) of the AI system in comparison with that of all readers using PI-RADS (2.1) and in comparison with that of the historical radiology readings made during multidisciplinary routine practice (ie, the standard of care with the aid of patient history and peer consultation). Histopathology and at least 3 years (median 5 [IQR 4-6] years) of follow-up were used to establish the reference standard. 
The statistical analysis plan was prespecified with a primary hypothesis of non-inferiority (considering a margin of 0·05) and a secondary hypothesis of superiority towards the AI system, if non-inferiority was confirmed. This study was registered at ClinicalTrials.gov, NCT05489341. FINDINGS: Of the 10 207 examinations included from Jan 1, 2012, through Dec 31, 2021, 2440 cases had histologically confirmed Gleason grade group 2 or greater prostate cancer. In the subset of 400 testing cases in which the AI system was compared with the radiologists participating in the reader study, the AI system showed a statistically superior and non-inferior AUROC of 0·91 (95% CI 0·87-0·94; p<0·0001), in comparison to the pool of 62 radiologists with an AUROC of 0·86 (0·83-0·89), with a lower boundary of the two-sided 95% Wald CI for the difference in AUROC of 0·02. At the mean PI-RADS 3 or greater operating point of all readers, the AI system detected 6·8% more cases with Gleason grade group 2 or greater cancers at the same specificity (57·7%, 95% CI 51·6-63·3), or 50·4% fewer false-positive results and 20·0% fewer cases with Gleason grade group 1 cancers at the same sensitivity (89·4%, 95% CI 85·3-92·9). In all 1000 testing cases where the AI system was compared with the radiology readings made during multidisciplinary practice, non-inferiority was confirmed, with the AI system showing marginally lower specificity (68·9% [95% CI 65·3-72·4] vs 69·0% [65·5-72·5]) at the same sensitivity (96·1%, 94·0-98·2) as the PI-RADS 3 or greater operating point. The lower boundary of the two-sided 95% Wald CI for the difference in specificity (-0·04) was greater than the non-inferiority margin (-0·05) and a p value below the significance threshold was reached (p<0·001). INTERPRETATION: An AI system was superior to radiologists using PI-RADS (2.1), on average, at detecting clinically significant prostate cancer and comparable to the standard of care.
Such a system shows the potential to be a supportive tool within a primary diagnostic setting, with several associated benefits for patients and radiologists. Prospective validation is needed to test clinical applicability of this system. FUNDING: Health~Holland and EU Horizon 2020.
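The non-inferiority logic above (lower bound of the 95% Wald CI for the specificity difference vs a -0.05 margin) can be sketched numerically. Note that this is the simple unpaired approximation with hypothetical counts; the study used a paired design, so the actual interval differs:

```python
import math

def wald_ci_diff(p1, n1, p2, n2, z=1.96):
    """Two-sided 95% Wald CI for a difference in proportions, p1 - p2."""
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d - z * se, d + z * se

# specificity 68.9% vs 69.0%; non-inferior if the lower bound clears -0.05
lo, hi = wald_ci_diff(0.689, 1000, 0.690, 1000)
print(lo > -0.05)  # True
```

Even though the point estimate of the difference is slightly negative, the lower CI bound stays above the prespecified margin, which is exactly what the abstract's non-inferiority claim rests on.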


Subject(s)
Artificial Intelligence , Magnetic Resonance Imaging , Prostatic Neoplasms , Radiologists , Humans , Male , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Aged , Retrospective Studies , Middle Aged , Neoplasm Grading , Netherlands , ROC Curve
6.
Med Image Anal ; 97: 103230, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38875741

ABSTRACT

Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, where participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, where participants submitted their training methodologies with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve for discerning between severe and non-severe COVID-19 of 0.815. The Final phase solutions of all finalists improved upon their Qualification phase solutions.


Subject(s)
COVID-19 , SARS-CoV-2 , Tomography, X-Ray Computed , Humans , Artificial Intelligence
7.
Clin Chem ; 70(8): 1064-1075, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38906831

ABSTRACT

BACKGROUND: Hemoglobinopathies, the most common inherited blood disorders, are frequently underdiagnosed. Early identification of carriers is important for genetic counseling of couples at risk. The aim of this study was to develop and validate a novel machine learning model on a multicenter data set, covering a wide spectrum of hemoglobinopathies based on routine complete blood count (CBC) testing. METHODS: Hemoglobinopathy test results from 10 322 adults were extracted retrospectively from 8 Dutch laboratories. eXtreme Gradient Boosting (XGB) and logistic regression models were developed to differentiate negative from positive hemoglobinopathy cases, using 7 routine CBC parameters. External validation was conducted on a data set from an independent Dutch laboratory, with an additional external validation on a Spanish data set (n = 2629) specifically for differentiating thalassemia from iron deficiency anemia (IDA). RESULTS: The XGB and logistic regression models achieved an area under the receiver operating characteristic curve (AUROC) of 0.88 and 0.84, respectively, in distinguishing negative from positive hemoglobinopathy cases in the independent external validation set. Subclass analysis showed that the XGB model reached an AUROC of 0.97 for β-thalassemia, 0.98 for α0-thalassemia, 0.95 for homozygous α+-thalassemia, 0.78 for heterozygous α+-thalassemia, and 0.94 for the structural hemoglobin variants Hemoglobin C, Hemoglobin D, and Hemoglobin E. Both models attained AUROCs of 0.95 in differentiating IDA from thalassemia. CONCLUSIONS: Both the XGB and logistic regression models demonstrate high accuracy in predicting a broad range of hemoglobinopathies and are effective in differentiating hemoglobinopathies from IDA. Integration of these models into the laboratory information system facilitates automated hemoglobinopathy detection using routine CBC parameters.
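As a toy illustration of the simpler of the two models, here is a from-scratch logistic regression trained by stochastic gradient descent. The feature vectors are entirely synthetic, standardized "CBC-like" values (e.g. MCV and MCH z-scores); the study's actual 7 CBC parameters, XGB model, and data are not reproduced here:

```python
import math

def train_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression fit by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of log-loss wrt z
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(w, b, xi):
    z = b + sum(wj * xj for wj, xj in zip(w, xi))
    return 1.0 / (1.0 + math.exp(-z))

# synthetic standardized features; low values -> positive in this toy example
X = [[-1.0, -1.0], [1.0, 1.0], [-1.5, -1.2], [0.8, 1.1]]
y = [1, 0, 1, 0]
w, b = train_logreg(X, y)
print(predict(w, b, [-1.2, -1.0]) > 0.5)  # True
```

A gradient-boosted tree ensemble such as XGB captures non-linear interactions between CBC parameters that this linear model cannot, which is consistent with its higher AUROC in the abstract.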


Subject(s)
Hemoglobinopathies , Machine Learning , Humans , Hemoglobinopathies/diagnosis , Hemoglobinopathies/genetics , Hemoglobinopathies/blood , Retrospective Studies , Blood Cell Count , Adult , Female , Male , Logistic Models , ROC Curve
8.
Eur Radiol ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38758252

ABSTRACT

INTRODUCTION: This study investigates the performance of a commercially available artificial intelligence (AI) system to identify normal chest radiographs and its potential to reduce radiologist workload. METHODS: Retrospective analysis included consecutive chest radiographs from two medical centers between Oct 1, 2016 and Oct 14, 2016. Exclusions comprised follow-up exams within the inclusion period, bedside radiographs, incomplete images, imported radiographs, and pediatric radiographs. Three chest radiologists categorized findings into normal, clinically irrelevant, clinically relevant, urgent, and critical. A commercial AI system processed all radiographs, scoring 10 chest abnormalities on a 0-100 confidence scale. AI system performance was evaluated using the area under the ROC curve (AUC), assessing the detection of normal radiographs. Sensitivity was calculated for the default and a conservative operating point. The negative predictive value (NPV) for urgent and critical findings, as well as the potential workload reduction, was also calculated. RESULTS: A total of 2603 radiographs were acquired in 2141 unique patients. Post-exclusion, 1670 radiographs were analyzed. Categories included 479 normal, 332 clinically irrelevant, 339 clinically relevant, 501 urgent, and 19 critical findings. The AI system achieved an AUC of 0.92. Sensitivity for normal radiographs was 92% at default and 53% at the conservative operating point. At the conservative operating point, NPV was 98% for urgent and critical findings, and could result in a 15% workload reduction. CONCLUSION: A commercially available AI system effectively identifies normal chest radiographs and holds the potential to lessen radiologists' workload by omitting half of the normal exams from reporting.
CLINICAL RELEVANCE STATEMENT: The AI system is able to detect half of all normal chest radiographs at a clinically acceptable operating point, thereby potentially reducing the workload for the radiologists by 15%. KEY POINTS: The AI system reached an AUC of 0.92 for the detection of normal chest radiographs. Fifty-three percent of normal chest radiographs were identified with a NPV of 98% for urgent findings. AI can reduce the workload of chest radiography reporting by 15%.
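The workload-reduction and NPV figures above follow from simple counts over the exams the AI flags as normal. A sketch with toy labels (the category strings and flags are illustrative, not the study's data):

```python
def triage_stats(ai_normal_flags, categories):
    """Workload reduction and NPV if AI-flagged 'normal' exams skip reporting.

    ai_normal_flags[i]: AI calls exam i normal; categories[i]: reference label.
    """
    flagged = [c for f, c in zip(ai_normal_flags, categories) if f]
    workload_reduction = len(flagged) / len(categories)
    safe = sum(c not in ("urgent", "critical") for c in flagged)
    npv = safe / len(flagged) if flagged else float("nan")
    return workload_reduction, npv

flags = [True, True, False, False]
cats = ["normal", "urgent", "normal", "critical"]
print(triage_stats(flags, cats))  # -> (0.5, 0.5)
```

In the study, the conservative operating point trades sensitivity for a high NPV (98% for urgent and critical findings), which is what makes the 15% workload reduction clinically defensible.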

9.
Int J Infect Dis ; 145: 107081, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38701914

ABSTRACT

OBJECTIVES: To evaluate diagnostic yield and feasibility of integrating testing for TB and COVID-19 using molecular and radiological screening tools during community-based active case-finding (ACF). METHODS: Community-based participants with presumed TB and/or COVID-19 were recruited using a mobile clinic. Participants underwent simultaneous point-of-care (POC) testing for TB (sputum; Xpert Ultra) and COVID-19 (nasopharyngeal swabs; Xpert SARS-CoV-2). Sputum culture and SARS-CoV-2 RT-PCR served as reference standards. Participants underwent ultra-portable POC chest radiography with computer-aided detection (CAD). TB infectiousness was evaluated using smear microscopy, cough aerosol sampling studies (CASS), and chest radiographic cavity detection. Feasibility of POC testing was evaluated via user-appraisals. RESULTS: Six hundred and one participants were enrolled, with 144/601 (24.0%) reporting symptoms suggestive of TB and/or COVID-19. 16/144 (11.1%) participants tested positive for TB, while 10/144 (6.9%) tested positive for COVID-19 (2/144 [1.4%] had concurrent TB/COVID-19). Seven (7/16 [43.8%]) individuals with TB were probably infectious. Test-specific sensitivity and specificity (95% CI) were: Xpert Ultra 75.0% (42.8-94.5) and 96.9% (92.4-99.2); Xpert SARS-CoV-2 66.7% (22.3-95.7) and 97.1% (92.7-99.2). Area under the curve (AUC) for CAD4TB was 0.90 (0.82-0.97). User appraisals indicated POC Xpert to have 'good' user-friendliness. CONCLUSIONS: Integrating TB/COVID-19 screening during community-based ACF using POC molecular and radiological tools is feasible, has a high diagnostic yield, and can identify probably infectious persons.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , COVID-19/diagnosis , COVID-19/epidemiology , Male , Female , Adult , Middle Aged , Mass Screening/methods , Point-of-Care Testing , Sputum/microbiology , Sputum/virology , Tuberculosis/diagnosis , Tuberculosis/epidemiology , Tuberculosis/diagnostic imaging , Africa, Southern/epidemiology , Sensitivity and Specificity , Feasibility Studies , Tuberculosis, Pulmonary/diagnosis , Tuberculosis, Pulmonary/diagnostic imaging , Tuberculosis, Pulmonary/epidemiology
10.
Eur Radiol ; 34(10): 6600-6613, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38634877

ABSTRACT

OBJECTIVES: To develop and validate an artificial intelligence (AI) system for measuring and detecting signs of carpal instability on conventional radiographs. MATERIALS AND METHODS: Two case-control datasets of hand and wrist radiographs were retrospectively acquired at three hospitals (hospitals A, B, and C). Dataset 1 (2178 radiographs from 1993 patients, hospitals A and B, 2018-2019) was used for developing an AI system for measuring scapholunate (SL) joint distances, SL and capitolunate (CL) angles, and carpal arc interruptions. Dataset 2 (481 radiographs from 217 patients, hospital C, 2017-2021) was used for testing, and with a subsample (174 radiographs from 87 patients), an observer study was conducted to compare its performance to five clinicians. Evaluation metrics included mean absolute error (MAE), sensitivity, and specificity. RESULTS: Dataset 2 included 258 SL distances, 189 SL angles, 191 CL angles, and 217 carpal arc labels obtained from 217 patients (mean age, 51 years ± 23 [standard deviation]; 133 women). The MAE in measuring SL distances, SL angles, and CL angles was respectively 0.65 mm (95%CI: 0.59, 0.72), 7.9 degrees (95%CI: 7.0, 8.9), and 5.9 degrees (95%CI: 5.2, 6.6). The sensitivity and specificity for detecting arc interruptions were 83% (95%CI: 74, 91) and 64% (95%CI: 56, 71). The measurements were largely comparable to those of the clinicians, while arc interruption detections were more accurate than those of most clinicians. CONCLUSION: This study demonstrates that a newly developed automated AI system accurately measures and detects signs of carpal instability on conventional radiographs. CLINICAL RELEVANCE STATEMENT: This system has the potential to improve detections of carpal arc interruptions and could be a promising tool for supporting clinicians in detecting carpal instability.


Subject(s)
Artificial Intelligence , Joint Instability , Humans , Female , Middle Aged , Male , Joint Instability/diagnostic imaging , Retrospective Studies , Sensitivity and Specificity , Case-Control Studies , Adult , Carpal Bones/diagnostic imaging , Radiography/methods , Wrist Joint/diagnostic imaging , Carpal Joints/diagnostic imaging , Aged
11.
Sci Rep ; 14(1): 7136, 2024 03 26.
Article in English | MEDLINE | ID: mdl-38531958

ABSTRACT

Programmed death-ligand 1 (PD-L1) expression is currently used in the clinic to assess eligibility for immune-checkpoint inhibitors via the tumor proportion score (TPS), but its efficacy is limited by high interobserver variability. Multiple papers have presented systems for the automatic quantification of TPS, but none report on the task of determining cell-level PD-L1 expression and often reserve their evaluation to a single PD-L1 monoclonal antibody or clinical center. In this paper, we report on a deep learning algorithm for detecting PD-L1 negative and positive tumor cells at a cellular level and evaluate it on a cell-level reference standard established by six readers on a multi-centric, multi PD-L1 assay dataset. This reference standard also provides for the first time a benchmark for computer vision algorithms. In addition, in line with other papers, we evaluate our algorithm at slide-level by measuring the agreement between the algorithm and six pathologists on TPS quantification. We find a moderately low interobserver agreement at cell level (mean reader-reader F1 score = 0.68), which our algorithm falls slightly below (mean reader-AI F1 score = 0.55), especially for cases from the clinical center not included in the training set. Despite this, we find good AI-pathologist agreement on quantifying TPS compared to the interobserver agreement (mean reader-reader Cohen's kappa = 0.54, 95% CI 0.26-0.81, mean reader-AI kappa = 0.49, 95% CI 0.27-0.72). In conclusion, our deep learning algorithm demonstrates promise in detecting PD-L1 expression at a cellular level and exhibits favorable agreement with pathologists in quantifying the tumor proportion score (TPS). We publicly release our models for use via the Grand-Challenge platform.
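The slide-level TPS itself is a simple ratio over the detected cells; the interesting work is the cell detection upstream. A sketch of the cell-level to slide-level step, where the cell tuples stand in for hypothetical detector output:

```python
def tumor_proportion_score(cells):
    """TPS (%): PD-L1-positive tumor cells over all detected tumor cells.

    cells: iterable of (is_tumor, is_pd_l1_positive) flags per detected cell.
    """
    tumor = [positive for is_tumor, positive in cells if is_tumor]
    if not tumor:
        return None               # no tumor cells detected: TPS undefined
    return 100.0 * sum(tumor) / len(tumor)

# hypothetical detector output: 3 tumor cells (2 PD-L1+), 1 non-tumor cell
cells = [(True, True), (True, False), (True, True), (False, True)]
print(tumor_proportion_score(cells))  # ~66.7%
```

Because TPS aggregates over thousands of cells per slide, cell-level detection errors can partially cancel out, which helps explain how slide-level agreement with pathologists can be favorable despite the lower cell-level F1 reported above.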


Subject(s)
Carcinoma, Non-Small-Cell Lung , Deep Learning , Lung Neoplasms , Humans , Carcinoma, Non-Small-Cell Lung/pathology , Lung Neoplasms/pathology , Pathologists , B7-H1 Antigen/metabolism , Immunohistochemistry , Biomarkers, Tumor/metabolism
12.
Sci Data ; 11(1): 264, 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38431692

ABSTRACT

This paper presents a large publicly available multi-center lumbar spine magnetic resonance imaging (MRI) dataset with reference segmentations of vertebrae, intervertebral discs (IVDs), and spinal canal. The dataset includes 447 sagittal T1 and T2 MRI series from 218 patients with a history of low back pain and was collected from four different hospitals. An iterative data annotation approach was used by training a segmentation algorithm on a small part of the dataset, enabling semi-automatic segmentation of the remaining images. The algorithm provided an initial segmentation, which was subsequently reviewed, manually corrected, and added to the training data. We provide reference performance values for this baseline algorithm and nnU-Net, which performed comparably. Performance values were computed on a sequestered set of 39 studies with 97 series, which were additionally used to set up a continuous segmentation challenge that allows for a fair comparison of different segmentation algorithms. This study may encourage wider collaboration in the field of spine segmentation and improve the diagnostic value of lumbar spine MRI.


Subject(s)
Intervertebral Disc , Lumbar Vertebrae , Humans , Algorithms , Image Processing, Computer-Assisted/methods , Intervertebral Disc/pathology , Lumbar Vertebrae/diagnostic imaging , Magnetic Resonance Imaging/methods , Low Back Pain
14.
BMC Oral Health ; 24(1): 387, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532414

ABSTRACT

OBJECTIVE: Panoramic radiographs (PRs) provide a comprehensive view of the oral and maxillofacial region and are used routinely to assess dental and osseous pathologies. Artificial intelligence (AI) can be used to improve the diagnostic accuracy of PRs compared to bitewings and periapical radiographs. This study aimed to evaluate the advantages and challenges of using publicly available datasets in dental AI research, focusing on solving the novel task of predicting tooth segmentations, FDI numbers, and tooth diagnoses, simultaneously. MATERIALS AND METHODS: Datasets from the OdontoAI platform (tooth instance segmentations) and the DENTEX challenge (tooth bounding boxes with associated diagnoses) were combined to develop a two-stage AI model. The first stage implemented tooth instance segmentation with FDI numbering and extracted regions of interest around each tooth segmentation, whereafter the second stage implemented multi-label classification to detect dental caries, impacted teeth, and periapical lesions in PRs. The performance of the automated tooth segmentation algorithm was evaluated using a free-response receiver-operating-characteristics (FROC) curve and mean average precision (mAP) metrics. The diagnostic accuracy of detection and classification of dental pathology was evaluated with ROC curves and F1 and AUC metrics. RESULTS: The two-stage AI model achieved high accuracy in tooth segmentations with a FROC score of 0.988 and a mAP of 0.848. High accuracy was also achieved in the diagnostic classification of impacted teeth (F1 = 0.901, AUC = 0.996), whereas moderate accuracy was achieved in the diagnostic classification of deep caries (F1 = 0.683, AUC = 0.960), early caries (F1 = 0.662, AUC = 0.881), and periapical lesions (F1 = 0.603, AUC = 0.974). The model's performance correlated positively with the quality of annotations in the used public datasets. 
Selected samples from the DENTEX dataset revealed cases of missing (false-negative) and incorrect (false-positive) diagnoses, which negatively influenced the performance of the AI model. CONCLUSIONS: The use and pooling of public datasets in dental AI research can significantly accelerate the development of new AI models and enable fast exploration of novel tasks. However, standardized quality assurance is essential before using the datasets to ensure reliable outcomes and limit potential biases.


Subject(s)
Dental Caries , Tooth, Impacted , Tooth , Humans , Artificial Intelligence , Radiography, Panoramic , Bone and Bones
15.
Med Phys ; 51(4): 2834-2845, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38329315

ABSTRACT

BACKGROUND: Automated estimation of Pulmonary function test (PFT) results from Computed Tomography (CT) could advance the use of CT in screening, diagnosis, and staging of restrictive pulmonary diseases. Estimating lung function per lobe, which cannot be done with PFTs, would be helpful for risk assessment for pulmonary resection surgery and bronchoscopic lung volume reduction. PURPOSE: To automatically estimate PFT results from CT and furthermore disentangle the individual contribution of pulmonary lobes to a patient's lung function. METHODS: We propose I3Dr, a deep learning architecture for estimating global measures from an image that can also estimate the contributions of individual parts of the image to this global measure. We apply it to estimate the separate contributions of each pulmonary lobe to a patient's total lung function from CT, while requiring only CT scans and patient level lung function measurements for training. I3Dr consists of a lobe-level and a patient-level model. The lobe-level model extracts all anatomical pulmonary lobes from a CT scan and processes them in parallel to produce lobe level lung function estimates that sum up to a patient level estimate. The patient-level model directly estimates patient level lung function from a CT scan and is used to re-scale the output of the lobe-level model to increase performance. After demonstrating the viability of the proposed approach, the I3Dr model is trained and evaluated for PFT result estimation using a large data set of 8 433 CT volumes for training, 1 775 CT volumes for validation, and 1 873 CT volumes for testing. RESULTS: First, we demonstrate the viability of our approach by showing that a model trained with a collection of digit images to estimate their sum implicitly learns to assign correct values to individual digits. 
Next, we show that our models can estimate lobe-level quantities, such as COVID-19 severity scores, pulmonary volume (PV), and functional pulmonary volume (FPV), from CT while only provided with patient-level quantities during training. Lastly, we train and evaluate models for producing spirometry and diffusion capacity of carbon monoxide (DLCO) estimates at the patient and lobe level. For producing Forced Expiratory Volume in one second (FEV1), Forced Vital Capacity (FVC), and DLCO estimates, I3Dr obtains mean absolute errors (MAE) of 0.377 L, 0.297 L, and 2.800 mL/min/mm Hg, respectively. We release the resulting algorithms for lung function estimation to the research community at https://grand-challenge.org/algorithms/lobe-wise-lung-function-estimation/. CONCLUSIONS: I3Dr can estimate global measures from an image, as well as the contributions of individual parts of the image to this global measure. It offers a promising approach for estimating PFT results from CT scans and disentangling the individual contribution of pulmonary lobes to a patient's lung function. The findings presented in this work may advance the use of CT in screening, diagnosis, and staging of restrictive pulmonary diseases as well as in risk assessment for pulmonary resection surgery and bronchoscopic lung volume reduction.
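The core mechanism of the abstract above, per-lobe estimates that sum to a patient-level value and are then re-scaled to match a direct patient-level prediction, can be sketched as follows. This is an illustrative reconstruction under assumed names and shapes (`rescale_lobe_estimates` is hypothetical), not the released I3Dr code:

```python
import numpy as np

def rescale_lobe_estimates(lobe_preds, patient_pred):
    """Re-scale per-lobe lung-function estimates so their sum matches the
    (typically more accurate) direct patient-level estimate, preserving the
    relative contribution of each lobe."""
    lobe_preds = np.asarray(lobe_preds, dtype=float)
    total = lobe_preds.sum()
    if total <= 0:
        raise ValueError("lobe-level estimates must sum to a positive value")
    return lobe_preds * (patient_pred / total)

# Five anatomical lobes; hypothetical FVC contributions in litres (sum 4.0 L),
# re-scaled to a patient-level FVC estimate of 4.4 L.
rescaled = rescale_lobe_estimates([0.9, 0.7, 0.6, 1.0, 0.8], 4.4)
print(rescaled.sum())  # matches the patient-level estimate
```

In the paper's setup only patient-level PFT measurements are available for supervision; a constraint like this keeps the lobe-level outputs consistent with that supervision signal.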


Subject(s)
Lung Diseases , Lung , Humans , Lung/diagnostic imaging , Lung/surgery , Tomography, X-Ray Computed/methods , Vital Capacity , Machine Learning
17.
Nat Methods ; 21(2): 195-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347141

ABSTRACT

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint-a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
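The kind of pitfall Metrics Reloaded guards against can be shown with a toy case: on an imbalanced problem, a carelessly chosen metric such as accuracy can look excellent while the model detects no positives at all. A minimal illustrative sketch (not taken from the paper):

```python
import numpy as np

# 95 negatives, 5 positives; a degenerate model predicts "negative" for everyone.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_true == y_pred)                                   # rewards the majority class
sensitivity = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
specificity = np.sum((y_pred == 0) & (y_true == 0)) / np.sum(y_true == 0)
balanced_accuracy = 0.5 * (sensitivity + specificity)                  # exposes the failure

print(accuracy, sensitivity, balanced_accuracy)  # 0.95 0.0 0.5
```

A problem fingerprint that records the class imbalance and the domain's interest in the positive class would steer the user away from plain accuracy here.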


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Machine Learning , Semantics
18.
Eur Radiol ; 34(9): 5748-5757, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38383922

ABSTRACT

OBJECTIVES: Severity of degenerative scoliosis (DS) is assessed by measuring the Cobb angle on anteroposterior radiographs. However, MR images are often available to study the degenerative spine. This retrospective study aims to develop and evaluate the reliability of a novel automatic method that measures coronal Cobb angles on lumbar MRI in DS patients. MATERIALS AND METHODS: Vertebrae and intervertebral discs were automatically segmented using a 3D AI algorithm, trained on 447 lumbar MRI series. The segmentations were used to calculate all possible angles between the vertebral endplates, with the largest being the Cobb angle. The results were validated with 50 high-resolution sagittal lumbar MRI scans of DS patients, in which three experienced readers measured the Cobb angle. Reliability was determined using the intraclass correlation coefficient (ICC). RESULTS: The ICCs between the readers ranged from 0.90 (95% CI 0.83-0.94) to 0.93 (95% CI 0.88-0.96). The ICC between the maximum angle found by the algorithm and the average manually measured Cobb angles was 0.83 (95% CI 0.71-0.90). In 9 out of the 50 cases (18%), all readers agreed on both vertebral levels for Cobb angle measurement. When using the algorithm to extract the angles at the vertebral levels chosen by the readers, the ICCs ranged from 0.92 (95% CI 0.87-0.96) to 0.97 (95% CI 0.94-0.98). CONCLUSION: The Cobb angle can be accurately measured on MRI using the newly developed algorithm in patients with DS. The readers failed to consistently choose the same vertebral level for Cobb angle measurement, whereas the automatic approach ensures the maximum angle is consistently measured. CLINICAL RELEVANCE STATEMENT: Our AI-based algorithm offers reliable Cobb angle measurement on routine MRI for degenerative scoliosis patients, potentially reducing the reliance on conventional radiographs, ensuring consistent assessments, and therefore improving patient care.
KEY POINTS: • While often available, MR images are rarely utilized to determine the severity of degenerative scoliosis. • The presented MRI Cobb angle algorithm is more reliable than human readers in patients with degenerative scoliosis. • Radiographic imaging for Cobb angle measurement can be avoided when lumbar MR images are available.
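The measurement strategy described in the abstract above, computing all possible angles between vertebral endplates and taking the largest as the Cobb angle, can be sketched as follows. The endplate tilt values and the function name are illustrative assumptions, not the authors' implementation (which derives endplate orientations from 3D segmentations):

```python
import numpy as np

def max_cobb_angle(endplate_tilts_deg):
    """Given the coronal tilt of each vertebral endplate (degrees),
    return the largest angle between any pair of endplates (the Cobb
    angle) and the pair of endplate indices that produced it."""
    a = np.asarray(endplate_tilts_deg, dtype=float)
    diff = np.abs(a[:, None] - a[None, :])          # all pairwise angle differences
    i, j = np.unravel_index(np.argmax(diff), diff.shape)
    return float(diff[i, j]), (int(i), int(j))

# Hypothetical endplate tilts for a short lumbar segment
angle, pair = max_cobb_angle([2.0, -1.5, 6.0, -9.5, 3.0])
print(angle, pair)  # 15.5 (2, 3)
```

Taking the maximum over all endplate pairs is what makes the automatic measurement reproducible: unlike the human readers, it never depends on which vertebral levels were chosen.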


Subject(s)
Algorithms , Lumbar Vertebrae , Magnetic Resonance Imaging , Scoliosis , Humans , Scoliosis/diagnostic imaging , Magnetic Resonance Imaging/methods , Female , Male , Lumbar Vertebrae/diagnostic imaging , Reproducibility of Results , Retrospective Studies , Aged , Middle Aged , Aged, 80 and over , Artificial Intelligence , Adult , Imaging, Three-Dimensional/methods
19.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.


Subject(s)
Artificial Intelligence
20.
ERJ Open Res ; 10(1)2024 Jan.
Article in English | MEDLINE | ID: mdl-38196890

ABSTRACT

Objectives: Use of computer-aided detection (CAD) software is recommended to improve tuberculosis screening and triage, but threshold determination is challenging if reference testing has not been performed in all individuals. We aimed to determine such thresholds through secondary analysis of the 2019 Lesotho national tuberculosis prevalence survey. Methods: Symptom screening and chest radiographs were performed in participants aged ≥15 years; those symptomatic or with abnormal chest radiographs provided samples for Xpert MTB/RIF and culture testing. Chest radiographs were processed using CAD4TB version 7. We used six methodological approaches to deal with participants who did not have bacteriological test results to estimate pulmonary tuberculosis prevalence and assess diagnostic accuracy. Results: Among 17 070 participants, 5214 (31%) had their tuberculosis status determined; 142 had tuberculosis. Prevalence estimates varied between methodological approaches (0.83-2.72%). Using multiple imputation to estimate tuberculosis status for those eligible but not tested, and assuming those not eligible for testing were negative, a CAD4TBv7 threshold of 13 had a sensitivity of 89.7% (95% CI 84.6-94.8) and a specificity of 74.2% (73.6-74.9), close to World Health Organization (WHO) target product profile criteria. Assuming all those not tested were negative produced similar results. Conclusions: This is the first study to evaluate CAD4TB in a community screening context employing a range of approaches to account for unknown tuberculosis status. The assumption that those not tested are negative - regardless of testing eligibility status - was robust. As threshold determination must be context specific, our analytically straightforward approach should be adopted to leverage prevalence surveys for CAD threshold determination in other settings with a comparable proportion of eligible but not tested participants.
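The threshold evaluation described above, computing sensitivity and specificity of a CAD score cut-off against a tuberculosis status that is partly assumed or imputed for untested participants, can be sketched as follows. The function and data are illustrative assumptions, not the survey's analysis code:

```python
import numpy as np

def sens_spec_at_threshold(scores, labels, threshold):
    """Sensitivity and specificity of a CAD score at a given cut-off.
    labels: 1 = bacteriologically confirmed TB, 0 = negative (including
    participants assumed or imputed negative when not tested)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = scores >= threshold                      # screen-positive at this cut-off
    tp = np.sum(pred & (labels == 1))
    fn = np.sum(~pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0))
    fp = np.sum(pred & (labels == 0))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical CAD4TB-style scores and (partly assumed) TB status
sens, spec = sens_spec_at_threshold([5, 20, 8, 30, 15], [0, 1, 0, 1, 0], 13)
print(sens, spec)
```

In the survey analysis this calculation would be repeated over candidate thresholds (and over imputations of the unknown statuses) to find the cut-off that best approaches the WHO target product profile criteria.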
