Results 1 - 20 of 75
1.
Mod Pathol ; 37(8): 100531, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38830407

ABSTRACT

Histopathological assessment of esophageal biopsies is a key part of the management of patients with Barrett esophagus (BE), but it is prone to observer variability, and reliable diagnostic methods are needed. Artificial intelligence (AI) is emerging as a powerful tool for aided diagnosis but often relies on abstract test and validation sets, while its real-world behavior remains unknown. In this study, we developed a 2-stage AI system for histopathological assessment of BE-related dysplasia using deep learning to enhance the efficiency and accuracy of the pathology workflow. The AI system was developed and trained on 290 whole-slide images (WSIs) that were annotated at glandular and tissue levels. The system was designed to identify individual glands, grade dysplasia, and assign a WSI-level diagnosis. The proposed method was evaluated by comparing the performance of our AI system with that of a large international and heterogeneous group of 55 gastrointestinal pathologists assessing 55 digitized biopsies spanning the complete spectrum of BE-related dysplasia. The AI system correctly graded 76.4% of the WSIs, surpassing the performance of 53 out of the 55 participating pathologists. Furthermore, receiver-operating characteristic analysis showed that the system discriminated the absence of dysplasia (nondysplastic BE) from the presence of any dysplasia with an area under the curve of 0.94 and a sensitivity of 0.92 at a specificity of 0.94. These findings demonstrate that this AI system has the potential to assist pathologists in the assessment of BE-related dysplasia. The system's outputs could provide a reliable and consistent secondary diagnosis in challenging cases or be used for triaging low-risk nondysplastic biopsies, thereby reducing the workload of pathologists and increasing throughput.
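A minimal sketch of the slide-level evaluation described above (not the authors' code): given per-WSI probabilities of "any dysplasia", compute the ROC area under the curve and read off the sensitivity at a chosen specificity. The labels and scores below are illustrative placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # 1 = any dysplasia, 0 = nondysplastic BE
y_score = np.array([0.1, 0.3, 0.8, 0.9, 0.2, 0.7, 0.4, 0.6])   # per-slide model probabilities (placeholders)

auc = roc_auc_score(y_true, y_score)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
target_specificity = 0.94
# specificity = 1 - FPR; pick the operating point whose specificity is closest to the target
idx = np.argmin(np.abs((1 - fpr) - target_specificity))
print(f"AUC={auc:.2f}, sensitivity={tpr[idx]:.2f} at specificity={1 - fpr[idx]:.2f}")
```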

2.
Radiology ; 298(1): E18-E28, 2021 01.
Article in English | MEDLINE | ID: mdl-32729810

ABSTRACT

Background The coronavirus disease 2019 (COVID-19) pandemic has spread across the globe with alarming speed, morbidity, and mortality. Immediate triage of patients with chest infections suspected to be caused by COVID-19 using chest CT may be of assistance when results from definitive viral testing are delayed. Purpose To develop and validate an artificial intelligence (AI) system to score the likelihood and extent of pulmonary COVID-19 on chest CT scans using the COVID-19 Reporting and Data System (CO-RADS) and CT severity scoring systems. Materials and Methods The CO-RADS AI system consists of three deep-learning algorithms that automatically segment the five pulmonary lobes, assign a CO-RADS score for the suspicion of COVID-19, and assign a CT severity score for the degree of parenchymal involvement per lobe. This study retrospectively included patients who underwent a nonenhanced chest CT examination because of clinical suspicion of COVID-19 at two medical centers. The system was trained, validated, and tested with data from one of the centers. Data from the second center served as an external test set. Diagnostic performance and agreement with scores assigned by eight independent observers were measured using receiver operating characteristic analysis, linearly weighted κ values, and classification accuracy. Results A total of 105 patients (mean age, 62 years ± 16 [standard deviation]; 61 men) and 262 patients (mean age, 64 years ± 16; 154 men) were evaluated in the internal and external test sets, respectively. The system discriminated between patients with COVID-19 and those without COVID-19, with areas under the receiver operating characteristic curve of 0.95 (95% CI: 0.91, 0.98) and 0.88 (95% CI: 0.84, 0.93), for the internal and external test sets, respectively. Agreement with the eight human observers was moderate to substantial, with mean linearly weighted κ values of 0.60 ± 0.01 for CO-RADS scores and 0.54 ± 0.01 for CT severity scores. Conclusion With high diagnostic performance, the CO-RADS AI system correctly identified patients with COVID-19 using chest CT scans and assigned standardized CO-RADS and CT severity scores that demonstrated good agreement with findings from eight independent observers and generalized well to external data. © RSNA, 2020 Supplemental material is available for this article.
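The agreement metric reported above can be illustrated with a hedged sketch: a linearly weighted Cohen's kappa between AI-assigned and observer-assigned ordinal CO-RADS categories (1-5). The scores below are made up for demonstration.

```python
from sklearn.metrics import cohen_kappa_score

ai_scores =       [1, 2, 5, 4, 3, 5, 1, 2, 4, 3]   # CO-RADS categories from the AI system (placeholders)
observer_scores = [1, 3, 5, 4, 3, 4, 2, 2, 5, 3]   # categories from one human observer (placeholders)

# Linear weighting penalizes disagreements proportionally to their ordinal distance
kappa = cohen_kappa_score(ai_scores, observer_scores, weights="linear")
print(f"linearly weighted kappa = {kappa:.2f}")
```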


Subject(s)
Artificial Intelligence , COVID-19/diagnostic imaging , Severity of Illness Index , Thorax/diagnostic imaging , Tomography, X-Ray Computed , Aged , Data Systems , Female , Humans , Male , Middle Aged , Research Design , Retrospective Studies
3.
Ophthalmology ; 127(8): 1086-1096, 2020 08.
Article in English | MEDLINE | ID: mdl-32197912

ABSTRACT

PURPOSE: To develop and validate a deep learning model for the automatic segmentation of geographic atrophy (GA) using color fundus images (CFIs) and its application to study the growth rate of GA. DESIGN: Prospective, multicenter, natural history study with up to 15 years of follow-up. PARTICIPANTS: Four hundred nine CFIs of 238 eyes with GA from the Rotterdam Study (RS) and Blue Mountain Eye Study (BMES) for model development, and 3589 CFIs of 376 eyes from the Age-Related Eye Disease Study (AREDS) for analysis of GA growth rate. METHODS: A deep learning model based on an ensemble of encoder-decoder architectures was implemented and optimized for the segmentation of GA in CFIs. Four experienced graders delineated, in consensus, GA in CFIs from the RS and BMES. These manual delineations were used to evaluate the segmentation model using 5-fold cross-validation. The model was applied further to CFIs from the AREDS to study the growth rate of GA. Linear regression analysis was used to study associations between structural biomarkers at baseline and the GA growth rate. A general estimate of the progression of GA area over time was made by combining growth rates of all eyes with GA from the AREDS set. MAIN OUTCOME MEASURES: Automatically segmented GA and GA growth rate. RESULTS: The model obtained an average Dice coefficient of 0.72±0.26 on the BMES and RS set while comparing the automatically segmented GA area with the graders' manual delineations. An intraclass correlation coefficient of 0.83 was reached between the automatically estimated GA area and the graders' consensus measures. Nine automatically calculated structural biomarkers (area, filled area, convex area, convex solidity, eccentricity, roundness, foveal involvement, perimeter, and circularity) were significantly associated with growth rate. Combining all growth rates indicated that GA area grows quadratically up to an area of approximately 12 mm2, after which growth rate stabilizes or decreases. CONCLUSIONS: The deep learning model allowed for fully automatic and robust segmentation of GA on CFIs. These segmentations can be used to extract structural characteristics of GA that predict its growth rate.
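As a hedged sketch of the overlap metric used above to compare automatic GA segmentations with the graders' consensus delineations, the snippet below computes a Dice coefficient on binary masks; the masks here are random placeholders, not study data.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, ref: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return (2.0 * intersection + eps) / (pred.sum() + ref.sum() + eps)

rng = np.random.default_rng(0)
auto_mask = rng.random((512, 512)) > 0.5    # automatic GA segmentation (placeholder)
manual_mask = rng.random((512, 512)) > 0.5  # graders' consensus delineation (placeholder)
print(f"Dice = {dice_coefficient(auto_mask, manual_mask):.2f}")
```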


Subject(s)
Deep Learning , Fluorescein Angiography/methods , Forecasting , Geographic Atrophy/diagnosis , Retina/pathology , Aged , Disease Progression , Female , Follow-Up Studies , Fundus Oculi , Humans , Male , Prospective Studies , Severity of Illness Index
4.
Retina ; 40(8): 1565-1573, 2020 Aug.
Article in English | MEDLINE | ID: mdl-31356496

ABSTRACT

PURPOSE: To investigate hyperreflective foci (HF) on spectral-domain optical coherence tomography in patients with Type 1 diabetes mellitus across different stages of diabetic retinopathy (DR) and diabetic macular edema (DME) and to study clinical and morphological characteristics associated with HF. METHODS: Spectral-domain optical coherence tomography scans and color fundus photographs were obtained from 260 patients. Spectral-domain optical coherence tomography scans were graded for the number of HF and other morphological characteristics. The distribution of HF across different stages of DR and DME severity was studied. Linear mixed-model analysis was used to study associations between the number of HF and clinical and morphological parameters. RESULTS: Higher numbers of HF were found in patients with either stage of DME versus patients without DME (P < 0.001). A trend was observed between increasing numbers of HF and DR severity, although significance was only reached for moderate nonproliferative DR (P = 0.001) and proliferative DR (P = 0.019). Higher numbers of HF were associated with longer diabetes duration (P = 0.029), lower high-density lipoprotein cholesterol (P = 0.005), and the presence of microalbuminuria (P = 0.005). In addition, HF were associated with morphological characteristics on spectral-domain optical coherence tomography, including central retinal thickness (P = 0.004), cysts (P < 0.001), subretinal fluid (P = 0.001), and disruption of the external limiting membrane (P = 0.018). CONCLUSION: The number of HF was associated with different stages of DR and DME severity. The associations between HF and clinical and morphological characteristics can be of use in further studies evaluating the role of HF as a biomarker for disease progression and treatment response.
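A sketch, not the study's analysis code, of the kind of linear mixed model named above: the HF count is regressed on clinical covariates with a random intercept per patient (two eyes per patient). All data below are simulated placeholders and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_patients, eyes = 60, 2
df = pd.DataFrame({
    "patient": np.repeat(np.arange(n_patients), eyes),
    "hf_count": rng.poisson(8, n_patients * eyes),                       # number of hyperreflective foci
    "diabetes_duration": np.repeat(rng.uniform(1, 40, n_patients), eyes),  # years (patient-level)
    "hdl": np.repeat(rng.normal(1.4, 0.3, n_patients), eyes),              # HDL cholesterol (patient-level)
    "crt": rng.normal(300, 40, n_patients * eyes),                         # central retinal thickness (eye-level)
})

# Random intercept per patient accounts for the two eyes of the same person
model = smf.mixedlm("hf_count ~ diabetes_duration + hdl + crt", df, groups=df["patient"])
print(model.fit().summary())
```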


Subject(s)
Diabetes Mellitus, Type 1/complications , Diabetic Retinopathy/etiology , Macular Edema/etiology , Photography , Retina/pathology , Tomography, Optical Coherence , Adult , Aged , Diabetic Retinopathy/classification , Diabetic Retinopathy/diagnostic imaging , Female , Humans , Macular Edema/classification , Macular Edema/diagnostic imaging , Male , Middle Aged , Retina/diagnostic imaging , Slit Lamp Microscopy , Visual Acuity/physiology
5.
Ophthalmology ; 126(12): 1712-1721, 2019 12.
Article in English | MEDLINE | ID: mdl-31522899

ABSTRACT

PURPOSE: To investigate intersibling phenotypic concordance in Stargardt disease (STGD1). DESIGN: Retrospective cohort study. PARTICIPANTS: Siblings with genetically confirmed STGD1 and at least 1 available fundus autofluorescence (FAF) image of both eyes. METHODS: We compared age at onset within families, matched disease duration to investigate differences in best-corrected visual acuity (BCVA), and compared the survival time for reaching severe visual impairment (<20/200 Snellen or >1.0 logarithm of the minimum angle of resolution [logMAR]). Central retinal atrophy area was quantified independently by 2 experienced graders using semiautomated software and compared between siblings. Both graders performed qualitative assessment of FAF and spectral-domain (SD) OCT images to identify phenotypic differences. MAIN OUTCOME MEASURES: Differences in age at onset, disease duration-matched BCVA, time to severe visual impairment development, FAF atrophy area, FAF patterns, and genotypes. RESULTS: Substantial differences in age at onset were present in 5 of 17 families, ranging from 13 to 39 years. Median BCVA at baseline was 0.60 logMAR (range, -0.20 to 2.30 logMAR; Snellen equivalent, 20/80 [range, 20/12-hand movements]) in the right eye and 0.50 logMAR (range, -0.20 to 2.30 logMAR; Snellen equivalent, 20/63 [range, 20/12-hand movements]) in the left eye. Disease duration-matched BCVA was investigated in 12 of 17 families, and the median difference was 0.41 logMAR (range, 0.00-1.10 logMAR) for the right eye and 0.41 logMAR (range, 0.00-1.08 logMAR) for the left eye. We observed notable differences in time to severe visual impairment development in 7 families, ranging from 1 to 29 years. Median central retinal atrophy area was 11.38 mm2 in the right eye (range, 1.98-44.78 mm2) and 10.59 mm2 in the left eye (range, 1.61-40.59 mm2) and highly comparable between siblings. Similarly, qualitative FAF and SD OCT phenotypes were highly comparable between siblings. CONCLUSIONS: Phenotypic discordance between siblings with STGD1 carrying the same ABCA4 variants is a prevalent phenomenon. Although the FAF phenotypes are highly comparable between siblings, functional outcomes differ substantially. This complicates both sibling-based prognosis and genotype-phenotype correlations and has important implications for patient care and management.
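For readers less familiar with the acuity units used above, a small illustrative conversion between Snellen notation and logMAR (logMAR = log10 of the Snellen denominator over numerator) reproduces the thresholds quoted in the abstract; this is a generic helper, not part of the study.

```python
import math

def snellen_to_logmar(numerator: float, denominator: float) -> float:
    """Convert a Snellen fraction (e.g. 20/200) to logMAR."""
    return math.log10(denominator / numerator)

print(snellen_to_logmar(20, 200))  # 1.0 -> the severe visual impairment cutoff
print(snellen_to_logmar(20, 80))   # ~0.60, the reported median baseline BCVA, right eye
print(snellen_to_logmar(20, 63))   # ~0.50, the reported median baseline BCVA, left eye
```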


Subject(s)
Siblings , Stargardt Disease/genetics , Stargardt Disease/pathology , ATP-Binding Cassette Transporters/genetics , Adolescent , Adult , Age of Onset , Child , Child, Preschool , Electroretinography , Female , Fluorescein Angiography , Follow-Up Studies , Genetic Association Studies , Humans , Male , Middle Aged , Phenotype , Retrospective Studies , Tomography, Optical Coherence , Vision Disorders/pathology , Visual Acuity/physiology , Young Adult
6.
Ophthalmologica ; 241(2): 61-72, 2019.
Article in English | MEDLINE | ID: mdl-30153664

ABSTRACT

PURPOSE: Currently, no outcome measures are clinically validated and accepted as clinical endpoints by regulatory agencies for drug development in intermediate age-related macular degeneration (iAMD). The MACUSTAR Consortium, a public-private research group funded by the European Innovative Medicines Initiative, intends to close this gap. PROCEDURES: Development of a study protocol and statistical analysis plan, including predictive modelling of multimodal endpoints, based on a review of the literature and expert consensus. RESULTS: This observational study consists of a cross-sectional and a longitudinal part. Functional outcome measures assessed under low contrast and low luminance have the potential to detect progression of visual deficit within iAMD and to late AMD. Structural outcome measures will be multimodal and investigate topographical relationships with function. Current patient-reported outcome measures (PROMs) are not acceptable to regulators and may not capture the functional deficit specific to iAMD with the needed precision, justifying development of novel PROMs for iAMD. The total sample size will be n = 750, consisting mainly of subjects with iAMD (n = 600). CONCLUSIONS: As clinical endpoints currently accepted by regulators cannot detect functional loss or patient-relevant impact in iAMD, we will clinically validate novel candidate endpoints for iAMD.


Subject(s)
Disease Management , Fluorescein Angiography/methods , Macular Degeneration/diagnosis , Patient Reported Outcome Measures , Retina/diagnostic imaging , Tomography, Optical Coherence/methods , Visual Acuity , Fundus Oculi , Humans , Macular Degeneration/physiopathology , Retina/physiopathology
7.
Behav Brain Funct ; 12(1): 2, 2016 Jan 08.
Article in English | MEDLINE | ID: mdl-26746237

ABSTRACT

BACKGROUND: Attention deficit hyperactivity disorder (ADHD) has a strong genetic component. This study aimed to test the association of 34 polymorphisms with ADHD symptomatology, considering the role of clinical subtypes and sex, in a Spanish population. METHODS: A cohort of 290 ADHD patients and 340 controls aged 6-18 years was included in a case-control study, stratified by sex and ADHD subtype. Multivariate logistic regression was used to detect the combined effects of multiple variants. RESULTS: After correcting for multiple testing, we found several significant associations between the polymorphisms and ADHD (corrected p value ≤0.05): (1) SLC6A4 and LPHN3 were associated in the total population; (2) SLC6A2, SLC6A3, SLC6A4 and LPHN3 were associated in the combined subtype; and (3) LPHN3 was associated in the male sample. Multivariable logistic regression was used to estimate the influence of these variables for the total sample, the combined and inattentive subtypes, and the female and male samples, revealing that these factors explained 8.5, 14.6, 2.6, 16.5 and 8.5% of the variance, respectively. CONCLUSIONS: We report evidence of the genetic contribution of common variants to the ADHD phenotype in four genes, with the LPHN3 gene playing a particularly important role. Future studies should investigate the contribution of genetic variants to the risk of ADHD considering their role in specific sexes or subtypes, as doing so may produce more predictable and robust models.
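A hedged sketch of the general analysis pattern described above (not the study's code): per-variant case-control logistic regressions with multiple-testing correction, followed by a multivariable model over the surviving variants. Genotypes and case status below are simulated, and the 0/1/2 minor-allele coding is an assumption for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
n, n_snps = 630, 34                                 # 290 cases + 340 controls, 34 polymorphisms
genotypes = rng.integers(0, 3, size=(n, n_snps))    # simulated 0/1/2 minor-allele counts
status = rng.integers(0, 2, size=n)                 # 1 = ADHD case, 0 = control (simulated)

# Per-variant association tests
pvals = []
for j in range(n_snps):
    X = sm.add_constant(genotypes[:, j].astype(float))
    pvals.append(sm.Logit(status, X).fit(disp=0).pvalues[1])

# Benjamini-Hochberg correction across the 34 tests
rejected, p_corrected, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("variants significant after correction:", np.flatnonzero(rejected))

# Multivariable logistic model combining the selected variants (if any survive correction)
if rejected.any():
    X = sm.add_constant(genotypes[:, rejected].astype(float))
    print(sm.Logit(status, X).fit(disp=0).summary())
```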


Subject(s)
Attention Deficit Disorder with Hyperactivity/genetics , Adolescent , Case-Control Studies , Child , Cohort Studies , Dopamine Plasma Membrane Transport Proteins/genetics , Female , Genetic Association Studies/methods , Genetic Predisposition to Disease , Humans , Male , Multivariate Analysis , Norepinephrine Plasma Membrane Transport Proteins/genetics , Polymorphism, Single Nucleotide , Receptors, G-Protein-Coupled/genetics , Receptors, Peptide/genetics , Serotonin Plasma Membrane Transport Proteins/genetics
8.
Radiology ; 277(3): 863-71, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26020438

ABSTRACT

PURPOSE: To examine the factors that affect inter- and intraobserver agreement for pulmonary nodule type classification on low-radiation-dose computed tomographic (CT) images, and their potential effect on patient management. MATERIALS AND METHODS: Nodules (n = 160) were randomly selected from the Dutch-Belgian Lung Cancer Screening Trial cohort, with equal numbers of nodule types and similar sizes. Nodules were scored by eight radiologists by using morphologic categories proposed by the Fleischner Society guidelines for management of pulmonary nodules as solid, part solid with a solid component smaller than 5 mm, part solid with a solid component 5 mm or larger, or pure ground glass. Inter- and intraobserver agreement was analyzed by using Cohen κ statistics. Multivariate analysis of variance was performed to assess the effect of nodule characteristics and image quality on observer disagreement. Effect on nodule management was estimated by differentiating CT follow-up for ground-glass nodules, solid nodules 8 mm or smaller, and part-solid nodules smaller than 5 mm from immediate diagnostic work-up for solid nodules larger than 8 mm and part-solid nodules 5 mm or greater. RESULTS: Pair-wise inter- and intraobserver agreement was moderate (mean κ, 0.51 [95% confidence interval, 0.30, 0.68] and 0.57 [95% confidence interval, 0.47, 0.71]). Categorization as part-solid nodules and location in the upper lobe significantly reduced observer agreement (P = .012 and P < .001, respectively). By considering all possible reading pairs (28 possible combinations of observer pairs × 160 nodules = 4480 possible agreements or disagreements), a discordant nodule classification was found in 36.4% (1630 of 4480), related to presence or size of a solid component in 88.7% (1446 of 1630). Two-thirds of these discrepant readings (1061 of 1630) would have potentially resulted in different nodule management. CONCLUSION: There is moderate inter- and intraobserver agreement for nodule classification by using current recommendations for low-radiation-dose CT examinations of the chest. Discrepancies in nodule categorization were mainly caused by disagreement on the size and presence of a solid component, which may lead to different management in the majority of cases with such discrepancies. (©) RSNA, 2015.
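To illustrate the pairwise agreement analysis above (8 observers give 28 observer pairs over 160 nodules), the sketch below computes the mean pairwise Cohen's kappa; the nodule-type readings are simulated and the category labels are shorthand, not study data.

```python
import itertools
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(3)
categories = ["solid", "part-solid<5mm", "part-solid>=5mm", "ground-glass"]
readings = rng.choice(categories, size=(8, 160))   # 8 observers x 160 nodules (simulated)

# Cohen's kappa for every one of the 28 observer pairs
pair_kappas = [cohen_kappa_score(readings[a], readings[b])
               for a, b in itertools.combinations(range(8), 2)]
print(f"{len(pair_kappas)} pairs, mean kappa = {np.mean(pair_kappas):.2f}")
```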


Subject(s)
Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/therapy , Tomography, X-Ray Computed , Humans , Observer Variation
9.
Magn Reson Imaging ; 107: 33-46, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38184093

ABSTRACT

Acquiring fully-sampled MRI k-space data is time-consuming, and collecting accelerated data can reduce the acquisition time. Employing 2D Cartesian-rectilinear subsampling schemes is a conventional approach for accelerated acquisitions; however, this often results in imprecise reconstructions, even with the use of Deep Learning (DL), especially at high acceleration factors. Non-rectilinear or non-Cartesian trajectories can be implemented in MRI scanners as alternative subsampling options. This work investigates the impact of the k-space subsampling scheme on the quality of reconstructed accelerated MRI measurements produced by trained DL models. The Recurrent Variational Network (RecurrentVarNet) was used as the DL-based MRI-reconstruction architecture. Cartesian, fully-sampled multi-coil k-space measurements from three datasets were retrospectively subsampled with different accelerations using eight distinct subsampling schemes: four Cartesian-rectilinear, two Cartesian non-rectilinear, and two non-Cartesian. Experiments were conducted in two frameworks: scheme-specific, where a distinct model was trained and evaluated for each dataset-subsampling scheme pair, and multi-scheme, where for each dataset a single model was trained on data randomly subsampled by any of the eight schemes and evaluated on data subsampled by all schemes. In both frameworks, RecurrentVarNets trained and evaluated on non-rectilinearly subsampled data demonstrated superior performance, particularly for high accelerations. In the multi-scheme setting, reconstruction performance on rectilinearly subsampled data improved when compared to the scheme-specific experiments. Our findings demonstrate the potential for using DL-based methods, trained on non-rectilinearly subsampled measurements, to optimize scan time and image quality.
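A minimal sketch of the retrospective subsampling step described above, under simplifying assumptions (single-coil, 2D, Cartesian-rectilinear scheme, zero-filled reconstruction); the actual study used multi-coil data and a trained RecurrentVarNet, so this is only meant to show how a k-space line mask is applied.

```python
import numpy as np

rng = np.random.default_rng(4)
image = rng.normal(size=(256, 256))                 # stand-in for a fully sampled image
kspace = np.fft.fftshift(np.fft.fft2(image))        # fully sampled Cartesian k-space

acceleration, center_fraction = 4, 0.08
n_lines = kspace.shape[0]
mask = np.zeros(n_lines, dtype=bool)
mask[rng.choice(n_lines, n_lines // acceleration, replace=False)] = True   # random phase-encoding lines
center = int(center_fraction * n_lines)
mask[n_lines // 2 - center // 2 : n_lines // 2 + center // 2] = True       # keep the low-frequency band

subsampled = kspace * mask[:, None]                 # zero out unsampled rows
zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(subsampled)))
print("effective acceleration:", n_lines / mask.sum())
```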


Subject(s)
Algorithms , Magnetic Resonance Imaging , Retrospective Studies , Magnetic Resonance Imaging/methods , Radionuclide Imaging , Phantoms, Imaging , Image Processing, Computer-Assisted/methods
10.
Med Image Anal ; 97: 103259, 2024 Jun 27.
Article in English | MEDLINE | ID: mdl-38959721

ABSTRACT

Deep learning classification models for medical image analysis often perform well on data from scanners that were used to acquire the training data. However, when these models are applied to data from different vendors, their performance tends to drop substantially. Artifacts that only occur within scans from specific scanners are major causes of this poor generalizability. We aimed to enhance the reliability of deep learning classification models using a novel method called Uncertainty-Based Instance eXclusion (UBIX). UBIX is an inference-time module that can be employed in multiple-instance learning (MIL) settings. MIL is a paradigm in which instances (generally crops or slices) of a bag (generally an image) contribute towards a bag-level output. Instead of assuming equal contribution of all instances to the bag-level output, UBIX detects instances corrupted due to local artifacts on-the-fly using uncertainty estimation, reducing or fully ignoring their contributions before MIL pooling. In our experiments, instances are 2D slices and bags are volumetric images, but alternative definitions are also possible. Although UBIX is generally applicable to diverse classification tasks, we focused on the staging of age-related macular degeneration in optical coherence tomography. Our models were trained on data from a single scanner and tested on external datasets from different vendors, which included vendor-specific artifacts. UBIX showed reliable behavior, with a slight decrease in performance (a decrease of the quadratic weighted kappa (κw) from 0.861 to 0.708), when applied to images from different vendors containing artifacts; while a state-of-the-art 3D neural network without UBIX suffered from a significant detriment of performance (κw from 0.852 to 0.084) on the same test set. We showed that instances with unseen artifacts can be identified with OOD detection. UBIX can reduce their contribution to the bag-level predictions, improving reliability without retraining on new data. This potentially increases the applicability of artificial intelligence models to data from other scanners than the ones for which they were developed. The source code for UBIX, including trained model weights, is publicly available through https://github.com/qurAI-amsterdam/ubix-for-reliable-classification.
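The pooling idea can be sketched in a few lines. The snippet below is a conceptual illustration in the spirit of the described approach, not the released implementation (see the linked repository for the actual code): instance-level predictions whose estimated uncertainty exceeds a threshold are excluded before bag-level pooling.

```python
import numpy as np

def uncertainty_gated_pooling(instance_probs, instance_uncertainty, threshold=0.5):
    """instance_probs: (n_instances, n_classes); instance_uncertainty: (n_instances,)."""
    keep = instance_uncertainty <= threshold
    if not keep.any():                         # fall back to all instances if none survive
        keep = np.ones_like(keep, dtype=bool)
    return instance_probs[keep].mean(axis=0)   # mean pooling over the retained slices

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]])   # per-slice class probabilities (placeholders)
uncert = np.array([0.1, 0.2, 0.9])                        # high value = likely artifact-corrupted slice
print(uncertainty_gated_pooling(probs, uncert))
```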

11.
IEEE Trans Med Imaging ; PP, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38530714

ABSTRACT

Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms to augment training data and hence improve the performance of the detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of the synthetically generated nodule training images on the detection algorithm performance.

12.
IEEE Trans Med Imaging ; 43(1): 542-557, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37713220

ABSTRACT

The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios due to the presence of out-of-distribution and low-quality images. To address this issue, we propose the Artificial Intelligence for Robust Glaucoma Screening (AIROGS) challenge. This challenge includes a large dataset of around 113,000 images from about 60,000 patients and 500 different screening centers, and encourages the development of algorithms that are robust to ungradable and unexpected input data. We evaluated solutions from 14 teams in this paper and found that the best teams performed similarly to a set of 20 expert ophthalmologists and optometrists. The highest-scoring team achieved an area under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99) for detecting ungradable images on-the-fly. Additionally, many of the algorithms showed robust performance when tested on three other publicly available datasets. These results demonstrate the feasibility of robust AI-enabled glaucoma screening.


Subject(s)
Artificial Intelligence , Glaucoma , Humans , Glaucoma/diagnostic imaging , Fundus Oculi , Diagnostic Techniques, Ophthalmological , Algorithms
13.
Ophthalmol Sci ; 3(3): 100300, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37113471

ABSTRACT

Purpose: Significant visual impairment due to glaucoma is largely caused by the disease being detected too late. Objective: To build a labeled data set for training artificial intelligence (AI) algorithms for glaucoma screening by fundus photography, to assess the accuracy of the graders, and to characterize the features of all eyes with referable glaucoma (RG). Design: Cross-sectional study. Subjects: Color fundus photographs (CFPs) of 113 893 eyes of 60 357 individuals were obtained from EyePACS, California, United States, from a population screening program for diabetic retinopathy. Methods: Carefully selected graders (ophthalmologists and optometrists) graded the images. To qualify, they had to pass the European Optic Disc Assessment Trial optic disc assessment with ≥ 85% accuracy and 92% specificity. Of 90 candidates, 30 passed. Each image of the EyePACS set was then scored by varying random pairs of graders as "RG," "no referable glaucoma (NRG)," or "ungradable (UG)." In case of disagreement, a glaucoma specialist made the final grading. Referable glaucoma was scored if visual field damage was expected. In case of RG, graders were instructed to mark up to 10 relevant glaucomatous features. Main Outcome Measures: Qualitative features in eyes with RG. Results: The performance of each grader was monitored; if the sensitivity and specificity dropped below 80% and 95%, respectively (the final grade served as reference), they exited the study and their gradings were redone by other graders. In all, 20 graders qualified; their mean sensitivity and specificity (standard deviation [SD]) were 85.6% (5.7) and 96.1% (2.8), respectively. The 2 graders agreed on 92.45% of the images (Gwet's AC2, expressing the inter-rater reliability, was 0.917). Of all gradings, the sensitivity and specificity (95% confidence interval) were 86.0 (85.2-86.7)% and 96.4 (96.3-96.5)%, respectively. Of all gradable eyes (n = 111 183; 97.62%), the prevalence of RG was 4.38%. The most common features of RG were the appearance of the neuroretinal rim (NRR) inferiorly and superiorly. Conclusions: A large data set of CFPs of sufficient quality to develop AI screening solutions for glaucoma was assembled. The most common features of RG were the appearance of the NRR inferiorly and superiorly. Disc hemorrhages were a rare feature of RG. Financial Disclosures: Proprietary or commercial disclosure may be found after the references.

14.
IEEE J Biomed Health Inform ; 27(11): 5483-5494, 2023 11.
Article in English | MEDLINE | ID: mdl-37682646

ABSTRACT

Retinal Optical Coherence Tomography (OCT) allows the non-invasive direct observation of the central nervous system, enabling the measurement and extraction of biomarkers from neural tissue that can be helpful in the assessment of ocular, systemic and Neurological Disorders (ND). Deep learning models can be trained to segment the retinal layers for biomarker extraction. However, the onset of ND can have an impact on the neural tissue, which can lead to degraded performance of models not exposed to images displaying signs of disease during training. We present a fully automatic approach for retinal layer segmentation in multiple neurodegenerative disorder scenarios, using an annotated dataset of patients with the most prevalent NDs: Alzheimer's disease, Parkinson's disease, multiple sclerosis and essential tremor, along with healthy control patients. Furthermore, we present a two-part, comprehensive study on the effects of ND on the performance of these models. The results show that images of healthy patients may not be sufficient for the robust training of automated segmentation models intended for the analysis of ND patients, and that using images representative of different NDs can increase model performance. These results indicate that the presence or absence of ND patients in datasets should be taken into account when training deep learning models for retinal layer segmentation, and that the proposed approach can provide a valuable tool for robust and reliable diagnosis in multiple ND scenarios.


Subject(s)
Multiple Sclerosis , Parkinson Disease , Humans , Retina , Tomography, Optical Coherence/methods
15.
Comput Biol Med ; 167: 107602, 2023 12.
Article in English | MEDLINE | ID: mdl-37925906

ABSTRACT

Accurate prediction of fetal weight at birth is essential for effective perinatal care, particularly in the context of antenatal management, which involves determining the timing and mode of delivery. The current standard of care involves performing a prenatal ultrasound 24 hours prior to delivery. However, this task presents challenges as it requires acquiring high-quality images, which becomes difficult during advanced pregnancy due to the lack of amniotic fluid. In this paper, we present a novel method that automatically predicts fetal birth weight by using fetal ultrasound video scans and clinical data. Our proposed method is a Transformer-based approach that combines a Residual Transformer Module with a Dynamic Affine Feature Map Transform. This method leverages tabular clinical data to evaluate 2D+t spatio-temporal features in fetal ultrasound video scans. Development and evaluation were carried out on a clinical set comprising 582 2D fetal ultrasound videos and clinical records of pregnancies from 194 patients, acquired less than 24 hours before delivery. Our results show that our method outperforms several state-of-the-art automatic methods and estimates fetal birth weight with an accuracy comparable to human experts. Hence, automatic measurements obtained by our method can reduce the risk of errors inherent in manual measurements. Observer studies suggest that our approach may be used as an aid for less experienced clinicians to predict fetal birth weight before delivery, optimizing perinatal care regardless of the available expertise.
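The general idea of conditioning video features on tabular clinical data can be sketched as a learned affine (FiLM-like) modulation of the feature maps. The module below is a generic illustration under that assumption, not the paper's Dynamic Affine Feature Map Transform; shapes and names are hypothetical.

```python
import torch
import torch.nn as nn

class AffineFeatureModulation(nn.Module):
    """Scale and shift spatio-temporal feature maps conditioned on tabular data."""
    def __init__(self, n_tabular: int, n_channels: int):
        super().__init__()
        self.to_scale = nn.Linear(n_tabular, n_channels)
        self.to_shift = nn.Linear(n_tabular, n_channels)

    def forward(self, feat: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, T, H, W) video features; tab: (B, n_tabular) clinical variables
        scale = self.to_scale(tab)[:, :, None, None, None]
        shift = self.to_shift(tab)[:, :, None, None, None]
        return feat * (1 + scale) + shift

feat = torch.randn(2, 64, 8, 16, 16)   # placeholder 2D+t features (batch, channels, t, h, w)
tab = torch.randn(2, 5)                # placeholder tabular inputs, e.g. gestational age
print(AffineFeatureModulation(5, 64)(feat, tab).shape)
```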


Subject(s)
Fetal Weight , Ultrasonography, Prenatal , Infant, Newborn , Pregnancy , Humans , Female , Birth Weight , Ultrasonography, Prenatal/methods , Biometry
16.
Am J Obstet Gynecol MFM ; 5(12): 101182, 2023 12.
Article in English | MEDLINE | ID: mdl-37821009

ABSTRACT

BACKGROUND: Fetal weight is currently estimated from fetal biometry parameters using heuristic mathematical formulas. Fetal biometry requires measurements of the fetal head, abdomen, and femur. However, this examination is prone to inter- and intraobserver variability because of factors such as the experience of the operator, image quality, maternal characteristics, or fetal movements. Our study tested the hypothesis that a deep learning method can estimate fetal weight based on a video scan of the fetal abdomen and gestational age with similar performance to the full biometry-based estimations provided by clinical experts. OBJECTIVE: This study aimed to develop and test a deep learning method to automatically estimate fetal weight from fetal abdominal ultrasound video scans. STUDY DESIGN: A dataset of 900 routine fetal ultrasound examinations was used. Among those examinations, 800 retrospective ultrasound video scans of the fetal abdomen from 700 pregnant women between 15 6/7 and 41 0/7 weeks of gestation were used to train the deep learning model. After the training phase, the model was evaluated on an external prospectively acquired test set of 100 scans from 100 pregnant women between 16 2/7 and 38 0/7 weeks of gestation. The deep learning model was trained to directly estimate fetal weight from ultrasound video scans of the fetal abdomen. The deep learning estimations were compared with manual measurements made on the test set by 6 human readers with varying levels of expertise. Human readers used the 3 standard measurements made on the standard planes of the head, abdomen, and femur and a heuristic formula to estimate fetal weight. The Bland-Altman analysis, mean absolute percentage error, and intraclass correlation coefficient were used to evaluate the performance and robustness of the deep learning method and were compared with human readers. RESULTS: Bland-Altman analysis did not show systematic deviations between readers and deep learning. The mean and standard deviation of the mean absolute percentage error between the 6 human readers and the deep learning approach was 3.75%±2.00%. Excluding junior readers (residents), the mean absolute percentage error between the 4 experts and the deep learning approach was 2.59%±1.11%. The intraclass correlation coefficients reflected excellent reliability and varied between 0.9761 and 0.9865. CONCLUSION: This study reports the use of deep learning to estimate fetal weight using only ultrasound video of the fetal abdomen from fetal biometry scans. Our experiments demonstrated similar performance of human measurements and deep learning on prospectively acquired test data. Deep learning is a promising approach to directly estimate fetal weight using ultrasound video scans of the fetal abdomen.
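An illustrative computation of two of the agreement statistics named above, with made-up numbers rather than study data: the Bland-Altman bias and limits of agreement, and the mean absolute percentage error between estimated and reference birth weights.

```python
import numpy as np

reference = np.array([3200, 2850, 3600, 3100, 2400, 4050])   # reference birth weights in grams (placeholders)
estimated = np.array([3150, 2950, 3500, 3180, 2480, 3900])   # model or reader estimates (placeholders)

diff = estimated - reference
bias = diff.mean()                      # Bland-Altman bias
loa = 1.96 * diff.std(ddof=1)           # half-width of the 95% limits of agreement
mape = np.mean(np.abs(diff) / reference) * 100

print(f"bias = {bias:.0f} g, 95% limits of agreement = [{bias - loa:.0f}, {bias + loa:.0f}] g")
print(f"MAPE = {mape:.2f}%")
```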


Subject(s)
Deep Learning , Fetal Weight , Pregnancy , Female , Humans , Retrospective Studies , Reproducibility of Results , Abdomen/diagnostic imaging
17.
IEEE Trans Artif Intell ; 3(2): 129-138, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35582210

ABSTRACT

Amidst the ongoing pandemic, the assessment of computed tomography (CT) images for COVID-19 presence can exceed the workload capacity of radiologists. Several studies addressed this issue by automating COVID-19 classification and grading from CT images with convolutional neural networks (CNNs). Many of these studies reported initial results of algorithms that were assembled from commonly used components. However, the choice of the components of these algorithms was often pragmatic rather than systematic and systems were not compared to each other across papers in a fair manner. We systematically investigated the effectiveness of using 3-D CNNs instead of 2-D CNNs for seven commonly used architectures, including DenseNet, Inception, and ResNet variants. For the architecture that performed best, we furthermore investigated the effect of initializing the network with pretrained weights, providing automatically computed lesion maps as additional network input, and predicting a continuous instead of a categorical output. A 3-D DenseNet-201 with these components achieved an area under the receiver operating characteristic curve of 0.930 on our test set of 105 CT scans and an AUC of 0.919 on a publicly available set of 742 CT scans, a substantial improvement in comparison with a previously published 2-D CNN. This article provides insights into the performance benefits of various components for COVID-19 classification and grading systems. We have created a challenge on grand-challenge.org to allow for a fair comparison between the results of this and future research.

18.
Prog Retin Eye Res ; 90: 101034, 2022 09.
Article in English | MEDLINE | ID: mdl-34902546

ABSTRACT

An increasing number of artificial intelligence (AI) systems are being proposed in ophthalmology, motivated by the variety and amount of clinical and imaging data, as well as their potential benefits at the different stages of patient care. Despite achieving close or even superior performance to that of experts, there is a critical gap between development and integration of AI systems in ophthalmic practice. This work focuses on the importance of trustworthy AI to close that gap. We identify the main aspects or challenges that need to be considered along the AI design pipeline so as to generate systems that meet the requirements to be deemed trustworthy, including those concerning accuracy, resiliency, reliability, safety, and accountability. We elaborate on mechanisms and considerations to address those aspects or challenges, and define the roles and responsibilities of the different stakeholders involved in AI for ophthalmic care, i.e., AI developers, reading centers, healthcare providers, healthcare institutions, ophthalmological societies and working groups or committees, patients, regulatory bodies, and payers. Generating trustworthy AI is not the responsibility of a sole stakeholder. There is a pressing need for a collaborative approach in which the different stakeholders are represented along the AI design pipeline, from the definition of the intended use to post-market surveillance after regulatory approval. This work contributes to establishing such multi-stakeholder interaction and the main action points to be taken so that the potential benefits of AI reach real-world ophthalmic settings.


Subject(s)
Artificial Intelligence , Ophthalmology , Delivery of Health Care , Humans , Reproducibility of Results
19.
Transl Vis Sci Technol ; 11(12): 3, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36458946

ABSTRACT

Purpose: The purpose of this study was to develop and validate a deep learning (DL) framework for the detection and quantification of reticular pseudodrusen (RPD) and drusen on optical coherence tomography (OCT) scans. Methods: A DL framework was developed consisting of a classification model and an out-of-distribution (OOD) detection model for the identification of ungradable scans; a classification model to identify scans with drusen or RPD; and an image segmentation model to independently segment lesions as RPD or drusen. Data were obtained from 1284 participants in the UK Biobank (UKBB) with a self-reported diagnosis of age-related macular degeneration (AMD) and 250 UKBB controls. Drusen and RPD were manually delineated by five retina specialists. The main outcome measures were sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), kappa, accuracy, intraclass correlation coefficient (ICC), and free-response receiver operating characteristic (FROC) curves. Results: The classification models performed strongly at their respective tasks (0.95, 0.93, and 0.99 AUC, respectively, for the ungradable scans classifier, the OOD model, and the drusen and RPD classification models). The mean ICC for the drusen and RPD area versus graders was 0.74 and 0.61, respectively, compared with 0.69 and 0.68 for intergrader agreement. FROC curves showed that the model's sensitivity was close to human performance. Conclusions: The models achieved high classification and segmentation performance, similar to human performance. Translational Relevance: Application of this robust framework will further our understanding of RPD as a separate entity from drusen in both research and clinical settings.


Subject(s)
Deep Learning , Macular Degeneration , Retinal Drusen , Humans , Tomography, Optical Coherence , Retinal Drusen/diagnostic imaging , Retina , Macular Degeneration/diagnostic imaging
20.
Med Image Anal ; 73: 102141, 2021 10.
Article in English | MEDLINE | ID: mdl-34246850

ABSTRACT

Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure. In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as surrogate model, to craft adversarial examples that are then transferred to the target model. We consider this to be the most realistic scenario for MedIA systems. Firstly, we study the effect of weight initialization (pre-training on ImageNet or random initialization) on the transferability of adversarial attacks from the surrogate model to the target model, i.e., how effective attacks crafted using the surrogate model are on the target model. Secondly, we study the influence of differences in development (training and validation) data between target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks. Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate's architectures are different: the larger the performance gain using pre-training, the larger the transferability. Differences in the development data between target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by difference in the model architecture. We believe these factors should be considered when developing security-critical MedIA systems planned to be deployed in clinical practice. We recommend avoiding using only standard components, such as pre-trained architectures and publicly available datasets, as well as disclosure of design specifications, in addition to using adversarial defense methods. When evaluating the vulnerability of MedIA systems to adversarial attacks, various attack scenarios and target-surrogate differences should be simulated to achieve realistic robustness estimates. The code and all trained models used in our experiments are publicly available.
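A simplified toy sketch of the black-box transfer setting studied above: an FGSM perturbation is crafted with a surrogate model and then applied to a separate target model. Both models below are tiny stand-ins with random weights, not the paper's networks, and the image is a random placeholder.

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))

surrogate, target = make_model(), make_model()
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 3, 32, 32, requires_grad=True)   # stand-in image crop
y = torch.tensor([1])                              # assumed true label

# FGSM on the surrogate: one signed-gradient step of size epsilon
loss = loss_fn(surrogate(x), y)
loss.backward()
epsilon = 4 / 255
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

# Transfer: does the perturbation crafted on the surrogate also change the target's prediction?
with torch.no_grad():
    print("target prediction, clean:", target(x).argmax(1).item(),
          "| adversarial:", target(x_adv).argmax(1).item())
```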


Subject(s)
Machine Learning , Neural Networks, Computer , Humans