Results 1 - 20 of 77
1.
Med Image Anal; 99: 103330, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39260033

ABSTRACT

Twin-to-Twin Transfusion Syndrome (TTTS) is a rare condition that affects about 15% of monochorionic pregnancies, in which identical twins share a single placenta. Fetoscopic laser photocoagulation (FLP) is the standard treatment for TTTS and significantly improves the survival of the fetuses. The aim of FLP is to identify abnormal connections between blood vessels and to laser-ablate them in order to equalize the blood supply to both fetuses. However, fetoscopic surgery is challenging due to limited visibility, a narrow field of view, and significant variability among patients and domains. To enhance the visualization of placental vessels during surgery, we propose TTTSNet, a network architecture designed for real-time and accurate placental vessel segmentation. Our architecture incorporates a novel channel attention module and a multi-scale feature fusion module to precisely segment tiny placental vessels. To address the challenges posed by FLP-specific fiberscope and amniotic-sac artifacts, we employed novel data augmentation techniques. These techniques simulate various artifacts, including laser-pointer spots, amniotic-sac particles, and structural and optical-fiber artifacts. By incorporating these simulated artifacts during training, our network demonstrated robust generalizability. We trained TTTSNet on a publicly available dataset of 2060 video frames from 18 independent fetoscopic procedures and evaluated it on a multi-center external dataset of 24 in-vivo procedures with a total of 2348 video frames. Our method achieved significant performance improvements over state-of-the-art methods, with a mean Intersection over Union of 78.26% for all placental vessels and 73.35% for a subset of tiny placental vessels. Moreover, it achieved 172 and 152 frames per second on an A100 GPU and a Clara AGX, respectively, potentially opening the door to real-time application during surgical procedures. The code is publicly available at https://github.com/SanoScience/TTTSNet.
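
The augmentation code itself lives in the linked repository; purely as an illustration of the artifact-simulation idea described above, the sketch below overlays a synthetic laser-pointer glare spot on a frame. The function and its parameters are hypothetical, not taken from TTTSNet.

```python
# Illustrative sketch (not the authors' code): simulate a laser-pointer
# glare spot on a fetoscopy frame, one of the artifact types described above.
import numpy as np

def add_laser_pointer(frame: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Overlay a bright Gaussian blob at a random location (hypothetical augmentation)."""
    h, w = frame.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    sigma = rng.uniform(3.0, 10.0)                 # spot radius in pixels
    ys, xs = np.mgrid[0:h, 0:w]
    blob = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    out = frame.astype(np.float32) + 255.0 * blob[..., None]  # additive glare
    return np.clip(out, 0, 255).astype(frame.dtype)

rng = np.random.default_rng(0)
augmented = add_laser_pointer(np.zeros((256, 256, 3), np.uint8), rng)
```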

2.
Transl Vis Sci Technol; 13(9): 11, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39235402

ABSTRACT

Purpose: The purpose of this study was to develop a deep learning algorithm for detecting and quantifying incomplete retinal pigment epithelium and outer retinal atrophy (iRORA) and complete retinal pigment epithelium and outer retinal atrophy (cRORA) in optical coherence tomography (OCT) that generalizes well to data from different devices, and to validate it in an intermediate age-related macular degeneration (iAMD) cohort. Methods: The algorithm comprised a domain adaptation (DA) model, promoting generalization across devices, and a segmentation model for detecting granular biomarkers defining iRORA/cRORA, which are combined into iRORA/cRORA segmentations. Manual annotations of iRORA/cRORA in OCTs from different devices in the MACUSTAR study (168 patients with iAMD) were compared to the algorithm's output. Eye-level classification metrics included sensitivity, specificity, and quadratic weighted Cohen's κ score (κw). Segmentation performance was assessed quantitatively using Bland-Altman plots and qualitatively. Results: For ZEISS OCTs, sensitivity and specificity for iRORA/cRORA classification were 38.5% and 93.1%, respectively, and 60.0% and 96.4% for cRORA. For Spectralis OCTs, these were 84.0% and 93.7% for iRORA/cRORA, and 62.5% and 97.4% for cRORA. The κw scores for 3-way classification (none, iRORA, and cRORA) were 0.37 and 0.73 for ZEISS and Spectralis, respectively. Removing DA reduced κw from 0.73 to 0.63 for Spectralis. Conclusions: The DA-enabled iRORA/cRORA segmentation algorithm showed superior consistency compared to human annotations, and good generalization across OCT devices. Translational Relevance: This algorithm may help toward precise and automated tracking of iAMD-related lesion changes, which is crucial in clinical settings and multicenter longitudinal studies on iAMD.
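
As a minimal sketch of the eye-level agreement metric quoted above, the quadratic weighted Cohen's κ can be computed with scikit-learn; the labels below are invented for illustration.

```python
# Minimal sketch of the 3-way agreement metric reported above
# (quadratic weighted Cohen's kappa); labels and values are made up.
from sklearn.metrics import cohen_kappa_score

# 0 = none, 1 = iRORA, 2 = cRORA (hypothetical per-eye labels)
reference = [0, 0, 1, 2, 2, 1, 0, 2]
predicted = [0, 1, 1, 2, 1, 1, 0, 2]
kappa_w = cohen_kappa_score(reference, predicted, weights="quadratic")
print(f"quadratic weighted kappa: {kappa_w:.2f}")
```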


Subject(s)
Deep Learning , Macular Degeneration , Retinal Pigment Epithelium , Tomography, Optical Coherence , Humans , Tomography, Optical Coherence/methods , Retinal Pigment Epithelium/pathology , Retinal Pigment Epithelium/diagnostic imaging , Female , Macular Degeneration/pathology , Macular Degeneration/diagnosis , Macular Degeneration/diagnostic imaging , Male , Aged , Atrophy/pathology , Algorithms , Aged, 80 and over
3.
Med Image Anal; 97: 103259, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38959721

ABSTRACT

Deep learning classification models for medical image analysis often perform well on data from the scanners that were used to acquire the training data. However, when these models are applied to data from different vendors, their performance tends to drop substantially. Artifacts that only occur within scans from specific scanners are major causes of this poor generalizability. We aimed to enhance the reliability of deep learning classification models using a novel method called Uncertainty-Based Instance eXclusion (UBIX). UBIX is an inference-time module that can be employed in multiple-instance learning (MIL) settings. MIL is a paradigm in which instances (generally crops or slices) of a bag (generally an image) contribute towards a bag-level output. Instead of assuming that all instances contribute equally to the bag-level output, UBIX detects instances corrupted by local artifacts on the fly using uncertainty estimation and reduces or fully ignores their contributions before MIL pooling. In our experiments, instances are 2D slices and bags are volumetric images, but alternative definitions are also possible. Although UBIX is generally applicable to diverse classification tasks, we focused on the staging of age-related macular degeneration in optical coherence tomography. Our models were trained on data from a single scanner and tested on external datasets from different vendors, which included vendor-specific artifacts. UBIX showed reliable behavior, with only a slight decrease in performance (quadratic weighted kappa (κw) dropping from 0.861 to 0.708), when applied to images from different vendors containing artifacts, whereas a state-of-the-art 3D neural network without UBIX suffered a severe performance drop (κw from 0.852 to 0.084) on the same test set. We showed that instances with unseen artifacts can be identified with out-of-distribution (OOD) detection. UBIX can reduce their contribution to the bag-level predictions, improving reliability without retraining on new data. This potentially increases the applicability of artificial intelligence models to data from scanners other than the ones for which they were developed. The source code for UBIX, including trained model weights, is publicly available at https://github.com/qurAI-amsterdam/ubix-for-reliable-classification.
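
The exact UBIX procedure is in the linked repository; the sketch below only illustrates the general idea of excluding uncertain instances before MIL pooling, using normalized predictive entropy and a hard threshold as assumed stand-ins for the paper's uncertainty measure and exclusion rule.

```python
# Schematic numpy sketch of uncertainty-based instance exclusion before
# MIL pooling (see the UBIX repository for the actual method). The entropy
# uncertainty measure and hard threshold here are illustrative choices.
import numpy as np

def ubix_style_pool(inst_probs: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """inst_probs: (n_instances, n_classes) per-slice softmax outputs."""
    eps = 1e-12
    entropy = -(inst_probs * np.log(inst_probs + eps)).sum(axis=1)
    entropy /= np.log(inst_probs.shape[1])          # normalize to [0, 1]
    keep = entropy < threshold                      # exclude uncertain slices
    if not keep.any():                              # fall back to all instances
        keep[:] = True
    return inst_probs[keep].mean(axis=0)            # bag-level prediction

probs = np.array([[0.90, 0.05, 0.05],   # confident slice
                  [0.34, 0.33, 0.33]])  # near-uniform (artifact-corrupted?) slice
print(ubix_style_pool(probs))
```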


Subject(s)
Deep Learning , Tomography, Optical Coherence , Tomography, Optical Coherence/methods , Humans , Uncertainty , Reproducibility of Results , Artifacts , Image Processing, Computer-Assisted/methods , Macular Degeneration/diagnostic imaging , Algorithms
4.
Mod Pathol; 37(8): 100531, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38830407

ABSTRACT

Histopathological assessment of esophageal biopsies is a key part of the management of patients with Barrett esophagus (BE), but it is prone to observer variability, and reliable diagnostic methods are needed. Artificial intelligence (AI) is emerging as a powerful tool for aided diagnosis but is often evaluated on abstract test and validation sets, leaving real-world behavior unknown. In this study, we developed a 2-stage AI system for histopathological assessment of BE-related dysplasia using deep learning to enhance the efficiency and accuracy of the pathology workflow. The AI system was developed and trained on 290 whole-slide images (WSIs) that were annotated at glandular and tissue levels. The system was designed to identify individual glands, grade dysplasia, and assign a WSI-level diagnosis. The proposed method was evaluated by comparing the performance of our AI system with that of a large, international, and heterogeneous group of 55 gastrointestinal pathologists assessing 55 digitized biopsies spanning the complete spectrum of BE-related dysplasia. The AI system correctly graded 76.4% of the WSIs, surpassing the performance of 53 of the 55 participating pathologists. Furthermore, receiver operating characteristic analysis showed that the system discriminated the absence (nondysplastic BE) from the presence of any dysplasia with an area under the curve of 0.94 and a sensitivity of 0.92 at a specificity of 0.94. These findings demonstrate that this AI system has the potential to assist pathologists in the assessment of BE-related dysplasia. The system's outputs could provide a reliable and consistent secondary diagnosis in challenging cases or be used for triaging low-risk nondysplastic biopsies, thereby reducing the workload of pathologists and increasing throughput.
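
As a hedged illustration of how an operating point such as "sensitivity 0.92 at specificity 0.94" is read off a receiver operating characteristic curve, the snippet below uses scikit-learn on synthetic scores; it is not the study's evaluation code.

```python
# Sketch of reading an operating point off a ROC curve; scores are synthetic.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])       # 1 = any dysplasia
y_score = np.array([0.1, 0.3, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
target_specificity = 0.94
ok = (1 - fpr) >= target_specificity               # points meeting the spec
sens_at_spec = tpr[ok].max() if ok.any() else 0.0
print(f"AUC={roc_auc_score(y_true, y_score):.2f}, "
      f"sensitivity at >= {target_specificity} specificity: {sens_at_spec:.2f}")
```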


Subject(s)
Adenocarcinoma , Barrett Esophagus , Deep Learning , Esophageal Neoplasms , Precancerous Conditions , Humans , Esophageal Neoplasms/pathology , Adenocarcinoma/pathology , Barrett Esophagus/pathology , Precancerous Conditions/pathology , Image Interpretation, Computer-Assisted , Biopsy
5.
IEEE Trans Med Imaging; 43(8): 2839-2853, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38530714

ABSTRACT

Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can achieve high performance in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows research progress and prevents the benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms for augmenting training data and thereby improving the performance of the detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of the synthetically generated nodule training images on detection algorithm performance.


Subject(s)
Algorithms , Lung Neoplasms , Radiographic Image Interpretation, Computer-Assisted , Radiography, Thoracic , Solitary Pulmonary Nodule , Humans , Lung Neoplasms/diagnostic imaging , Radiography, Thoracic/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Solitary Pulmonary Nodule/diagnostic imaging , Lung/diagnostic imaging , Deep Learning
6.
Magn Reson Imaging; 107: 33-46, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38184093

ABSTRACT

Acquiring fully-sampled MRI k-space data is time-consuming, and collecting accelerated data can reduce the acquisition time. Employing 2D Cartesian-rectilinear subsampling schemes is a conventional approach for accelerated acquisitions; however, this often results in imprecise reconstructions, even with the use of Deep Learning (DL), especially at high acceleration factors. Non-rectilinear or non-Cartesian trajectories can be implemented in MRI scanners as alternative subsampling options. This work investigates the impact of the k-space subsampling scheme on the quality of reconstructed accelerated MRI measurements produced by trained DL models. The Recurrent Variational Network (RecurrentVarNet) was used as the DL-based MRI-reconstruction architecture. Cartesian, fully-sampled multi-coil k-space measurements from three datasets were retrospectively subsampled with different accelerations using eight distinct subsampling schemes: four Cartesian-rectilinear, two Cartesian non-rectilinear, and two non-Cartesian. Experiments were conducted in two frameworks: scheme-specific, where a distinct model was trained and evaluated for each dataset-subsampling scheme pair, and multi-scheme, where for each dataset a single model was trained on data randomly subsampled by any of the eight schemes and evaluated on data subsampled by all schemes. In both frameworks, RecurrentVarNets trained and evaluated on non-rectilinearly subsampled data demonstrated superior performance, particularly for high accelerations. In the multi-scheme setting, reconstruction performance on rectilinearly subsampled data improved when compared to the scheme-specific experiments. Our findings demonstrate the potential for using DL-based methods, trained on non-rectilinearly subsampled measurements, to optimize scan time and image quality.
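
As a rough sketch of the retrospective subsampling described above, the snippet below builds a Cartesian-rectilinear mask (a fully sampled center plus random phase-encode lines) and applies it to k-space; the mask parameters are illustrative assumptions, not the paper's exact schemes.

```python
# Minimal numpy sketch of retrospective Cartesian-rectilinear subsampling:
# keep a block of central (autocalibration) lines plus random phase-encode
# lines; the exact mask designs in the paper differ.
import numpy as np

def rectilinear_mask(n_lines: int, acceleration: int = 4,
                     center_fraction: float = 0.08,
                     rng=np.random.default_rng(0)) -> np.ndarray:
    mask = np.zeros(n_lines, dtype=bool)
    n_center = int(center_fraction * n_lines)
    start = (n_lines - n_center) // 2
    mask[start:start + n_center] = True            # fully sampled center
    n_extra = max(n_lines // acceleration - n_center, 0)
    mask[rng.choice(np.flatnonzero(~mask), n_extra, replace=False)] = True
    return mask

kspace = np.fft.fft2(np.random.rand(256, 256))     # stand-in "acquired" k-space
mask = rectilinear_mask(kspace.shape[0])
undersampled = kspace * mask[:, None]              # zero out skipped lines
zero_filled = np.abs(np.fft.ifft2(undersampled))   # aliased baseline recon
```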


Subject(s)
Algorithms , Magnetic Resonance Imaging , Retrospective Studies , Magnetic Resonance Imaging/methods , Radionuclide Imaging , Phantoms, Imaging , Image Processing, Computer-Assisted/methods
7.
IEEE Trans Med Imaging; 43(1): 542-557, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37713220

ABSTRACT

The early detection of glaucoma is essential in preventing visual impairment. Artificial intelligence (AI) can be used to analyze color fundus photographs (CFPs) in a cost-effective manner, making glaucoma screening more accessible. While AI models for glaucoma screening from CFPs have shown promising results in laboratory settings, their performance decreases significantly in real-world scenarios due to the presence of out-of-distribution and low-quality images. To address this issue, we propose the Artificial Intelligence for Robust Glaucoma Screening (AIROGS) challenge. This challenge includes a large dataset of around 113,000 images from about 60,000 patients and 500 different screening centers, and encourages the development of algorithms that are robust to ungradable and unexpected input data. We evaluated solutions from 14 teams in this paper and found that the best teams performed similarly to a set of 20 expert ophthalmologists and optometrists. The highest-scoring team achieved an area under the receiver operating characteristic curve of 0.99 (95% CI: 0.98-0.99) for detecting ungradable images on-the-fly. Additionally, many of the algorithms showed robust performance when tested on three other publicly available datasets. These results demonstrate the feasibility of robust AI-enabled glaucoma screening.
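
One simple way to flag ungradable or unexpected inputs on the fly, sketched below, is to threshold a confidence proxy such as the maximum softmax probability; the participating teams used a variety of (often more sophisticated) approaches, so this is only an assumed baseline.

```python
# Illustrative sketch: flag likely ungradable/out-of-distribution inputs by
# thresholding the maximum softmax probability; not a challenge submission.
import numpy as np

def flag_ungradable(softmax_probs: np.ndarray, tau: float = 0.6) -> np.ndarray:
    """softmax_probs: (n_images, n_classes). Returns True where confidence < tau."""
    return softmax_probs.max(axis=1) < tau

probs = np.array([[0.95, 0.05],    # confident -> gradable
                  [0.55, 0.45]])   # low confidence -> flagged ungradable
print(flag_ungradable(probs))      # [False  True]
```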


Subject(s)
Artificial Intelligence , Glaucoma , Humans , Glaucoma/diagnostic imaging , Fundus Oculi , Diagnostic Techniques, Ophthalmological , Algorithms
8.
Comput Biol Med; 167: 107602, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37925906

ABSTRACT

Accurate prediction of fetal weight at birth is essential for effective perinatal care, particularly in the context of antenatal management, which involves determining the timing and mode of delivery. The current standard of care involves performing a prenatal ultrasound 24 hours prior to delivery. However, this task is challenging because it requires acquiring high-quality images, which becomes difficult in advanced pregnancy due to the lack of amniotic fluid. In this paper, we present a novel method that automatically predicts fetal birth weight from fetal ultrasound video scans and clinical data. The proposed method is based on a Transformer approach that combines a Residual Transformer Module with a Dynamic Affine Feature Map Transform, leveraging tabular clinical data to evaluate 2D+t spatio-temporal features in fetal ultrasound video scans. Development and evaluation were carried out on a clinical set comprising 582 2D fetal ultrasound videos and clinical records from 194 patients, acquired less than 24 hours before delivery. Our results show that the method outperforms several state-of-the-art automatic methods and estimates fetal birth weight with an accuracy comparable to human experts. Hence, automatic measurements obtained by our method can reduce the risk of errors inherent in manual measurements. Observer studies suggest that our approach may be used as an aid for less experienced clinicians to predict fetal birth weight before delivery, optimizing perinatal care regardless of the available expertise.
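
The sketch below shows the general mechanism behind affine feature-map conditioning on tabular data, the family of modules to which the Dynamic Affine Feature Map Transform belongs: clinical variables predict a per-channel scale and shift applied to the video features. Layer sizes and structure are hypothetical, not the paper's module.

```python
# Minimal PyTorch sketch of affine feature-map conditioning on tabular data;
# an illustration of the general mechanism, not the paper's exact module.
import torch
import torch.nn as nn

class TabularAffineConditioning(nn.Module):
    def __init__(self, n_tabular: int, n_channels: int):
        super().__init__()
        self.to_scale_shift = nn.Sequential(
            nn.Linear(n_tabular, 64), nn.ReLU(),
            nn.Linear(64, 2 * n_channels),          # per-channel gamma and beta
        )

    def forward(self, feats: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, T, H, W) spatio-temporal features; tab: (B, n_tabular)
        gamma, beta = self.to_scale_shift(tab).chunk(2, dim=1)
        gamma = gamma[:, :, None, None, None]       # broadcast over T, H, W
        beta = beta[:, :, None, None, None]
        return (1 + gamma) * feats + beta           # identity at gamma=beta=0

m = TabularAffineConditioning(n_tabular=5, n_channels=16)
out = m(torch.randn(2, 16, 8, 32, 32), torch.randn(2, 5))
```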


Subject(s)
Fetal Weight , Ultrasonography, Prenatal , Infant, Newborn , Pregnancy , Humans , Female , Birth Weight , Ultrasonography, Prenatal/methods , Biometry
9.
Am J Obstet Gynecol MFM; 5(12): 101182, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37821009

ABSTRACT

BACKGROUND: Fetal weight is currently estimated from fetal biometry parameters using heuristic mathematical formulas. Fetal biometry requires measurements of the fetal head, abdomen, and femur. However, this examination is prone to inter- and intraobserver variability because of factors such as the experience of the operator, image quality, maternal characteristics, and fetal movements. Our study tested the hypothesis that a deep learning method can estimate fetal weight based on a video scan of the fetal abdomen and gestational age with performance similar to the full biometry-based estimations provided by clinical experts. OBJECTIVE: This study aimed to develop and test a deep learning method to automatically estimate fetal weight from fetal abdominal ultrasound video scans. STUDY DESIGN: A dataset of 900 routine fetal ultrasound examinations was used. Among those examinations, 800 retrospective ultrasound video scans of the fetal abdomen from 700 pregnant women between 15 6/7 and 41 0/7 weeks of gestation were used to train the deep learning model. After the training phase, the model was evaluated on an external, prospectively acquired test set of 100 scans from 100 pregnant women between 16 2/7 and 38 0/7 weeks of gestation. The deep learning model was trained to directly estimate fetal weight from ultrasound video scans of the fetal abdomen. The deep learning estimations were compared with manual measurements on the test set made by 6 human readers with varying levels of expertise. Human readers used the 3 standard measurements made on the standard planes of the head, abdomen, and femur and a heuristic formula to estimate fetal weight. Bland-Altman analysis, mean absolute percentage error, and the intraclass correlation coefficient were used to evaluate the performance and robustness of the deep learning method in comparison with the human readers. RESULTS: Bland-Altman analysis did not show systematic deviations between readers and deep learning. The mean and standard deviation of the mean absolute percentage error between the 6 human readers and the deep learning approach was 3.75%±2.00%. Excluding junior readers (residents), the mean absolute percentage error between the 4 experts and the deep learning approach was 2.59%±1.11%. The intraclass correlation coefficients reflected excellent reliability and varied between 0.9761 and 0.9865. CONCLUSION: This study reports the use of deep learning to estimate fetal weight using only ultrasound video of the fetal abdomen from fetal biometry scans. Our experiments demonstrated similar performance of human measurements and deep learning on prospectively acquired test data. Deep learning is a promising approach to directly estimate fetal weight from ultrasound video scans of the fetal abdomen.
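
As a minimal sketch of the headline agreement statistic above, the mean absolute percentage error between estimated and actual birth weight can be computed as follows; the weights are invented for illustration.

```python
# Small numpy sketch of the mean absolute percentage error (MAPE)
# between estimated and actual birth weight; values are made up.
import numpy as np

actual = np.array([3200.0, 2850.0, 3600.0])        # grams
estimated = np.array([3100.0, 2950.0, 3500.0])
mape = np.mean(np.abs(estimated - actual) / actual) * 100
print(f"MAPE: {mape:.2f}%")
```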


Subject(s)
Deep Learning , Fetal Weight , Pregnancy , Female , Humans , Retrospective Studies , Reproducibility of Results , Abdomen/diagnostic imaging
10.
IEEE J Biomed Health Inform; 27(11): 5483-5494, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37682646

ABSTRACT

Retinal Optical Coherence Tomography (OCT) allows non-invasive direct observation of the central nervous system, enabling the measurement and extraction of biomarkers from neural tissue that can be helpful in the assessment of ocular, systemic, and neurological disorders (ND). Deep learning models can be trained to segment the retinal layers for biomarker extraction. However, the onset of ND can affect the neural tissue, which can degrade the performance of models not exposed to images displaying signs of disease during training. We present a fully automatic approach for retinal layer segmentation in multiple neurodegenerative disorder scenarios, using an annotated dataset of patients with the most prevalent NDs: Alzheimer's disease, Parkinson's disease, multiple sclerosis, and essential tremor, along with healthy controls. Furthermore, we present a two-part, comprehensive study on the effects of ND on the performance of these models. The results show that images of healthy patients may not be sufficient for the robust training of automated segmentation models intended for the analysis of ND patients, and that using images representative of different NDs can increase model performance. These results indicate that the presence or absence of patients with ND in a dataset should be taken into account when training deep learning models for retinal layer segmentation, and that the proposed approach can provide a valuable tool for robust and reliable diagnosis in multiple ND scenarios.


Subject(s)
Multiple Sclerosis , Parkinson Disease , Humans , Retina , Tomography, Optical Coherence/methods
11.
Ophthalmol Sci; 3(3): 100300, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37113471

ABSTRACT

Purpose: Significant visual impairment due to glaucoma is largely caused by the disease being detected too late. Objective: To build a labeled data set for training artificial intelligence (AI) algorithms for glaucoma screening by fundus photography, to assess the accuracy of the graders, and to characterize the features of all eyes with referable glaucoma (RG). Design: Cross-sectional study. Subjects: Color fundus photographs (CFPs) of 113 893 eyes of 60 357 individuals were obtained from EyePACS, California, United States, from a population screening program for diabetic retinopathy. Methods: Carefully selected graders (ophthalmologists and optometrists) graded the images. To qualify, they had to pass the European Optic Disc Assessment Trial optic disc assessment with ≥ 85% accuracy and 92% specificity. Of 90 candidates, 30 passed. Each image of the EyePACS set was then scored by varying random pairs of graders as "RG," "no referable glaucoma (NRG)," or "ungradable (UG)." In case of disagreement, a glaucoma specialist made the final grading. Referable glaucoma was scored if visual field damage was expected. In case of RG, graders were instructed to mark up to 10 relevant glaucomatous features. Main Outcome Measures: Qualitative features in eyes with RG. Results: The performance of each grader was monitored; if sensitivity and specificity dropped below 80% and 95%, respectively (the final grade served as reference), they exited the study and their gradings were redone by other graders. In all, 20 graders qualified; their mean sensitivity and specificity (standard deviation [SD]) were 85.6% (5.7) and 96.1% (2.8), respectively. The 2 graders agreed on 92.45% of the images (Gwet's AC2, expressing the inter-rater reliability, was 0.917). Over all gradings, the sensitivity and specificity (95% confidence interval) were 86.0 (85.2-86.7)% and 96.4 (96.3-96.5)%, respectively. Of all gradable eyes (n = 111 183; 97.62%), the prevalence of RG was 4.38%. The most common features of RG were the appearance of the neuroretinal rim (NRR) inferiorly and superiorly. Conclusions: A large data set of CFPs of sufficient quality to develop AI screening solutions for glaucoma was assembled. The most common features of RG were the appearance of the NRR inferiorly and superiorly. Disc hemorrhages were a rare feature of RG. Financial Disclosures: Proprietary or commercial disclosure may be found after the references.

12.
Transl Vis Sci Technol; 11(12): 3, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36458946

ABSTRACT

Purpose: The purpose of this study was to develop and validate a deep learning (DL) framework for the detection and quantification of reticular pseudodrusen (RPD) and drusen on optical coherence tomography (OCT) scans. Methods: A DL framework was developed consisting of a classification model and an out-of-distribution (OOD) detection model for the identification of ungradable scans; a classification model to identify scans with drusen or RPD; and an image segmentation model to independently segment lesions as RPD or drusen. Data were obtained from 1284 participants in the UK Biobank (UKBB) with a self-reported diagnosis of age-related macular degeneration (AMD) and 250 UKBB controls. Drusen and RPD were manually delineated by five retina specialists. The main outcome measures were sensitivity, specificity, area under the receiver operating characteristic (ROC) curve (AUC), kappa, accuracy, intraclass correlation coefficient (ICC), and free-response receiver operating characteristic (FROC) curves. Results: The classification models performed strongly at their respective tasks (0.95, 0.93, and 0.99 AUC, respectively, for the ungradable scans classifier, the OOD model, and the drusen and RPD classification models). The mean ICC for the drusen and RPD area versus graders was 0.74 and 0.61, respectively, compared with 0.69 and 0.68 for intergrader agreement. FROC curves showed that the model's sensitivity was close to human performance. Conclusions: The models achieved high classification and segmentation performance, similar to human performance. Translational Relevance: Application of this robust framework will further our understanding of RPD as a separate entity from drusen in both research and clinical settings.


Subject(s)
Deep Learning , Macular Degeneration , Retinal Drusen , Humans , Tomography, Optical Coherence , Retinal Drusen/diagnostic imaging , Retina , Macular Degeneration/diagnostic imaging
13.
IEEE Trans Artif Intell; 3(2): 129-138, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35582210

ABSTRACT

Amidst the ongoing pandemic, the assessment of computed tomography (CT) images for COVID-19 presence can exceed the workload capacity of radiologists. Several studies have addressed this issue by automating COVID-19 classification and grading from CT images with convolutional neural networks (CNNs). Many of these studies reported initial results of algorithms that were assembled from commonly used components, but the choice of those components was often pragmatic rather than systematic, and systems were not compared fairly across papers. We systematically investigated the effectiveness of using 3-D CNNs instead of 2-D CNNs for seven commonly used architectures, including DenseNet, Inception, and ResNet variants. For the best-performing architecture, we furthermore investigated the effect of initializing the network with pretrained weights, providing automatically computed lesion maps as additional network input, and predicting a continuous instead of a categorical output. A 3-D DenseNet-201 with these components achieved an area under the receiver operating characteristic curve (AUC) of 0.930 on our test set of 105 CT scans and an AUC of 0.919 on a publicly available set of 742 CT scans, a substantial improvement over a previously published 2-D CNN. This article provides insights into the performance benefits of various components for COVID-19 classification and grading systems. We have created a challenge on grand-challenge.org to allow for a fair comparison between the results of this and future research.
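
To illustrate the 2-D-versus-3-D distinction studied above, the toy classifier below uses volumetric convolutions whose kernels also span the slice dimension; it is a didactic sketch, not the paper's 3-D DenseNet-201.

```python
# Toy PyTorch sketch of a volumetric (3D) classifier with a continuous
# severity output; far smaller than the 3-D DenseNet-201 used in the paper.
import torch
import torch.nn as nn

tiny3d = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1),    # kernels span slices too
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),                       # pool over D, H, W
    nn.Flatten(),
    nn.Linear(16, 1),                              # continuous severity output
)
severity = tiny3d(torch.randn(2, 1, 32, 64, 64))   # (batch, ch, D, H, W)
```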

14.
Prog Retin Eye Res; 90: 101034, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34902546

ABSTRACT

An increasing number of artificial intelligence (AI) systems are being proposed in ophthalmology, motivated by the variety and amount of clinical and imaging data, as well as by their potential benefits at the different stages of patient care. Despite achieving performance close to or even superior to that of experts, there is a critical gap between the development and the integration of AI systems in ophthalmic practice. This work focuses on the importance of trustworthy AI in closing that gap. We identify the main aspects or challenges that need to be considered along the AI design pipeline so as to generate systems that meet the requirements to be deemed trustworthy, including those concerning accuracy, resiliency, reliability, safety, and accountability. We elaborate on mechanisms and considerations to address those aspects or challenges, and define the roles and responsibilities of the different stakeholders involved in AI for ophthalmic care, i.e., AI developers, reading centers, healthcare providers, healthcare institutions, ophthalmological societies and working groups or committees, patients, regulatory bodies, and payers. Generating trustworthy AI is not the responsibility of a sole stakeholder. There is a pressing need for a collaborative approach in which the different stakeholders are represented along the AI design pipeline, from the definition of the intended use to post-market surveillance after regulatory approval. This work contributes to establishing such multi-stakeholder interaction and the main action points to be taken so that the potential benefits of AI reach real-world ophthalmic settings.


Subject(s)
Artificial Intelligence , Ophthalmology , Delivery of Health Care , Humans , Reproducibility of Results
15.
Med Image Anal; 73: 102141, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34246850

ABSTRACT

Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure. In this paper, we study previously unexplored factors affecting the adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as the surrogate model, to craft adversarial examples that are then transferred to the target model. We consider this the most realistic scenario for MedIA systems. Firstly, we study the effect of weight initialization (pre-training on ImageNet or random initialization) on the transferability of adversarial attacks from the surrogate model to the target model, i.e., how effective attacks crafted using the surrogate model are on the target model. Secondly, we study the influence of differences in development (training and validation) data between target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks. Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate architectures are different: the larger the performance gain from pre-training, the larger the transferability. Differences in the development data between target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by differences in model architecture. We believe these factors should be considered when developing security-critical MedIA systems intended for deployment in clinical practice. We recommend avoiding the use of only standard components, such as pre-trained architectures and publicly available datasets, avoiding disclosure of design specifications, and using adversarial defense methods. When evaluating the vulnerability of MedIA systems to adversarial attacks, various attack scenarios and target-surrogate differences should be simulated to achieve realistic robustness estimates. The code and all trained models used in our experiments are publicly available.
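
A minimal sketch of the black-box setting described above: craft a fast gradient sign method (FGSM) perturbation on a surrogate model and test whether it transfers to the target. FGSM, the models, and epsilon are illustrative placeholders; the paper tunes its perturbations for transferability versus perceptibility rather than using this exact recipe.

```python
# Hedged PyTorch sketch of a transfer attack: craft an FGSM perturbation on
# a surrogate model, then apply it to the target model. Models, data, and
# epsilon are placeholders, not the paper's setup.
import torch
import torch.nn.functional as F

def fgsm_on_surrogate(surrogate, x, y, epsilon=2.0 / 255):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(surrogate(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

# Transferability check: does the target model change its prediction?
# x_adv = fgsm_on_surrogate(surrogate_model, images, labels)
# transferred = (target_model(x_adv).argmax(1) != labels).float().mean()
```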


Subject(s)
Machine Learning , Neural Networks, Computer , Humans
16.
JAMA Ophthalmol; 139(7): 743-750, 2021 Jul 01.
Article in English | MEDLINE | ID: mdl-34014262

ABSTRACT

IMPORTANCE: Treatments for geographic atrophy (GA), a late stage of age-related macular degeneration (AMD), are currently under development. Understanding the natural course is needed for optimal trial design. Although enlargement rates of GA and visual acuity (VA) in the short term are known from clinical studies, knowledge of enlargement in the long term, life expectancy, and visual course is lacking. OBJECTIVE: To determine long-term enlargement of GA. DESIGN, SETTING, AND PARTICIPANTS: In this study, participant data were collected from 4 population-based cohort studies, with up to 25 years of follow-up and eye examinations at 5-year intervals: the Rotterdam Study cohorts 1, 2, and 3 and the Blue Mountains Eye Study. Data were collected from 1990 to 2015, and data were analyzed from January 2019 to November 2020. MAIN OUTCOMES AND MEASURES: Area of GA was measured pixel by pixel using all available imaging. Area enlargement and enlargement of the square root-transformed area, time until GA reached the central fovea, and time until death were assessed; best-corrected VA, smoking status, macular lesions according to the Three Continent AMD Consortium classification (a modified version of the Wisconsin age-related maculopathy grading system), and AMD genetic variants were covariates in Spearman, Pearson, or Mann-Whitney analyses. RESULTS: Of 171 included patients, 106 (62.0%) were female, and the mean (SD) age at inclusion was 82.6 (7.1) years. A total of 147 of 242 eyes with GA (60.7%) were newly diagnosed in our study. The mean area of GA at first presentation was 3.74 mm2 (95% CI, 3.11-4.67). Enlargement rate varied widely between persons (0.02 to 4.05 mm2 per year), with a mean of 1.09 mm2 per year (95% CI, 0.89-1.30). Stage of AMD in the other eye was correlated with GA enlargement (Spearman ρ = 0.34; P = .01). Foveal involvement was already present in incident GA in 55 of 147 eyes (37.4%); 23 of 42 eyes (55%) developed this after a mean (range) period of 5.6 (3-12) years, and foveal involvement did not develop before death in 11 of 42 eyes (26%). After first diagnosis, 121 of 171 patients with GA (70.8%) died after a mean (SD) period of 6.4 (5.4) years. Visual acuity was impaired (less than 20/63) in 47 of 107 patients (43.9%) at the last visit before death. CONCLUSIONS AND RELEVANCE: In this study, enlargement of GA appeared to be highly variable in the general population. More than one-third of incident GA was foveal at first presentation; those with extrafoveal GA developed foveal GA after a mean of 5.6 years. Future intervention trials should focus on recruiting those patients who have a high chance of severe visual decline within their life expectancy.
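
A small sketch of the square root-transformed enlargement rate used above, which expresses growth in mm per year and reduces its dependence on baseline lesion size; the numbers are invented.

```python
# Illustrative numpy sketch of raw vs. square root-transformed GA
# enlargement rates; areas and follow-up time are made up.
import numpy as np

area_t0, area_t5, years = 3.74, 9.2, 5.0            # mm^2, mm^2, follow-up years
rate_area = (area_t5 - area_t0) / years              # mm^2 per year
rate_sqrt = (np.sqrt(area_t5) - np.sqrt(area_t0)) / years  # mm per year
print(f"{rate_area:.2f} mm^2/yr, {rate_sqrt:.2f} mm/yr (sqrt-transformed)")
```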


Subject(s)
Geographic Atrophy , Macular Degeneration , Death , Female , Fluorescein Angiography , Geographic Atrophy/diagnosis , Humans , Macular Degeneration/diagnosis , Male , Prospective Studies , Visual Acuity
17.
Am J Ophthalmol; 226: 1-12, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33422464

ABSTRACT

PURPOSE: We sought to develop and validate a deep learning model for segmentation of 13 features associated with neovascular and atrophic age-related macular degeneration (AMD). DESIGN: Development and validation of a deep-learning model for feature segmentation. METHODS: Data for model development were obtained from 307 optical coherence tomography volumes. Eight experienced graders manually delineated all abnormalities in 2712 B-scans. A deep neural network was trained with these data to perform voxel-level segmentation of the 13 most common abnormalities (features). For evaluation, 112 B-scans from 112 patients with a diagnosis of neovascular AMD were annotated by 4 independent observers. The main outcome measures were Dice score, intraclass correlation coefficient, and free-response receiver operating characteristic curve. RESULTS: On 11 of 13 features, the model obtained a mean Dice score of 0.63 ± 0.15, compared with 0.61 ± 0.17 for the observers. The mean intraclass correlation coefficient for the model was 0.66 ± 0.22, compared with 0.62 ± 0.21 for the observers. Two features were not evaluated quantitatively because of a lack of data. Free-response receiver operating characteristic analysis demonstrated that the model achieved sensitivity similar to or higher than that of the observers at matched numbers of false positives. CONCLUSIONS: The quality of the automatic segmentation matches that of experienced graders for most features, exceeding human performance for some features. The quantified parameters provided by the model can be used in the current clinical routine and open possibilities for further research into treatment response outside clinical trials.
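
As a minimal sketch of the overlap metric reported above, the Dice score between a predicted and a reference binary mask can be computed as follows; the toy arrays are illustrative.

```python
# Minimal sketch of the per-feature Dice overlap score on binary masks.
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    pred, ref = pred.astype(bool), ref.astype(bool)
    return 2 * np.logical_and(pred, ref).sum() / (pred.sum() + ref.sum() + eps)

print(dice(np.array([[1, 1, 0]]), np.array([[1, 0, 0]])))  # ~0.67
```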


Subject(s)
Choroidal Neovascularization/diagnostic imaging , Deep Learning , Geographic Atrophy/diagnostic imaging , Retinal Drusen/diagnostic imaging , Wet Macular Degeneration/diagnostic imaging , Aged , Aged, 80 and over , Angiogenesis Inhibitors/therapeutic use , Choroidal Neovascularization/drug therapy , Choroidal Neovascularization/physiopathology , Female , Geographic Atrophy/drug therapy , Geographic Atrophy/physiopathology , Humans , Intravitreal Injections , Male , Middle Aged , Models, Statistical , Neural Networks, Computer , ROC Curve , Ranibizumab/therapeutic use , Receptors, Vascular Endothelial Growth Factor/therapeutic use , Recombinant Fusion Proteins/therapeutic use , Retinal Drusen/drug therapy , Retinal Drusen/physiopathology , Sensitivity and Specificity , Tomography, Optical Coherence , Vascular Endothelial Growth Factor A/antagonists & inhibitors , Visual Acuity/physiology , Wet Macular Degeneration/drug therapy , Wet Macular Degeneration/physiopathology
18.
Radiology; 298(1): E18-E28, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32729810

ABSTRACT

Background The coronavirus disease 2019 (COVID-19) pandemic has spread across the globe with alarming speed, morbidity, and mortality. Immediate triage of patients with chest infections suspected to be caused by COVID-19 using chest CT may be of assistance when results from definitive viral testing are delayed. Purpose To develop and validate an artificial intelligence (AI) system to score the likelihood and extent of pulmonary COVID-19 on chest CT scans using the COVID-19 Reporting and Data System (CO-RADS) and CT severity scoring systems. Materials and Methods The CO-RADS AI system consists of three deep-learning algorithms that automatically segment the five pulmonary lobes, assign a CO-RADS score for the suspicion of COVID-19, and assign a CT severity score for the degree of parenchymal involvement per lobe. This study retrospectively included patients who underwent a nonenhanced chest CT examination because of clinical suspicion of COVID-19 at two medical centers. The system was trained, validated, and tested with data from one of the centers. Data from the second center served as an external test set. Diagnostic performance and agreement with scores assigned by eight independent observers were measured using receiver operating characteristic analysis, linearly weighted κ values, and classification accuracy. Results A total of 105 patients (mean age, 62 years ± 16 [standard deviation]; 61 men) and 262 patients (mean age, 64 years ± 16; 154 men) were evaluated in the internal and external test sets, respectively. The system discriminated between patients with COVID-19 and those without COVID-19, with areas under the receiver operating characteristic curve of 0.95 (95% CI: 0.91, 0.98) and 0.88 (95% CI: 0.84, 0.93), for the internal and external test sets, respectively. Agreement with the eight human observers was moderate to substantial, with mean linearly weighted κ values of 0.60 ± 0.01 for CO-RADS scores and 0.54 ± 0.01 for CT severity scores. Conclusion With high diagnostic performance, the CO-RADS AI system correctly identified patients with COVID-19 using chest CT scans and assigned standardized CO-RADS and CT severity scores that demonstrated good agreement with findings from eight independent observers and generalized well to external data. © RSNA, 2020 Supplemental material is available for this article.


Subject(s)
Artificial Intelligence , COVID-19/diagnostic imaging , Severity of Illness Index , Thorax/diagnostic imaging , Tomography, X-Ray Computed , Aged , Data Systems , Female , Humans , Male , Middle Aged , Research Design , Retrospective Studies
19.
Ned Tijdschr Geneeskd; 164, 2020 Sep 17.
Article in Dutch | MEDLINE | ID: mdl-33331711

ABSTRACT

Technological developments in ophthalmic imaging and artificial intelligence (AI) create new possibilities for diagnostics in eye care. AI has already been applied in ophthalmic diabetes care: AI systems currently detect diabetic retinopathy in general practice with high sensitivity and specificity. AI systems for the screening, monitoring, and treatment of age-related macular degeneration and glaucoma are promising and still under development. AI algorithms, however, only perform tasks for which they have been specifically trained, and they depend strongly on the data and the reference standard used to train the system to identify a certain abnormality or disease. How the data and the gold standard were established influences the performance of the algorithm. Furthermore, the interpretability of deep learning algorithms remains an open issue. By highlighting the image areas that were critical to the algorithm's decision, users can gain more insight into how algorithms arrive at a particular result.


Subject(s)
Artificial Intelligence , Diabetic Retinopathy/diagnosis , Glaucoma/diagnosis , Macular Degeneration/diagnosis , Mass Screening/methods , Algorithms , Diagnostic Imaging , General Practice , Humans , Sensitivity and Specificity
20.
IEEE Trans Med Imaging; 39(11): 3499-3511, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32746093

ABSTRACT

Interpretability of deep learning (DL) systems is gaining attention in medical imaging as a way to increase experts' trust in the obtained predictions and facilitate their integration in clinical settings. We propose a deep visualization method that generates interpretability for DL classification tasks in medical imaging by means of visual evidence augmentation. The proposed method iteratively unveils abnormalities based on the prediction of a classifier trained only with image-level labels. For each image, initial visual evidence of the prediction is extracted with a given visual attribution technique. This provides localization of abnormalities, which are then removed through selective inpainting. We iteratively apply this procedure until the system considers the image normal. This yields augmented visual evidence, including less discriminative lesions that were not detected at first but should be considered for the final diagnosis. We apply the method to the grading of two retinal diseases in color fundus images: diabetic retinopathy (DR) and age-related macular degeneration (AMD). We evaluate the generated visual evidence and the performance of weakly supervised localization of different types of DR and AMD abnormalities, both qualitatively and quantitatively. We show that the augmented visual evidence of the predictions highlights the biomarkers considered by experts for diagnosis and improves the final localization performance. It results in a relative increase of 11.2 ± 2.0% in per-image sensitivity, averaged at 10 false positives per image, across different classification tasks, visual attribution techniques, and network architectures. This makes the proposed method a useful tool for exhaustive visual support of DL classifiers in medical imaging.
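
A schematic, pseudocode-style sketch of the iterative loop described above; classifier, attribution, and inpaint are placeholders for the trained classifier, the visual attribution technique, and the selective inpainting step named in the text.

```python
# Schematic sketch of iterative visual evidence augmentation; the three
# callables are placeholders, not a specific implementation.
def augmented_visual_evidence(image, classifier, attribution, inpaint,
                              max_iters=10):
    evidence = []
    for _ in range(max_iters):
        if classifier(image) == "normal":            # stop once deemed normal
            break
        mask = attribution(image)                    # localize current evidence
        evidence.append(mask)
        image = inpaint(image, mask)                 # remove found lesions
    return evidence                                  # includes subtler lesions
```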


Subject(s)
Diabetic Retinopathy , Macular Degeneration , Retinal Diseases , Algorithms , Diabetic Retinopathy/diagnostic imaging , Fundus Oculi , Humans , Macular Degeneration/diagnostic imaging