Results 1-20 of 34
1.
JAMA Ophthalmol ; 142(4): 327-335, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38451496

ABSTRACT

Importance: Retinopathy of prematurity (ROP) is a leading cause of blindness in children, with significant disparities in outcomes between high-income and low-income countries, due in part to insufficient access to ROP screening. Objective: To evaluate how well autonomous artificial intelligence (AI)-based ROP screening can detect more-than-mild ROP (mtmROP) and type 1 ROP. Design, Setting, and Participants: This diagnostic study evaluated the performance of an AI algorithm, trained and calibrated using 2530 examinations from 843 infants in the Imaging and Informatics in Retinopathy of Prematurity (i-ROP) study, on 2 external datasets (6245 examinations from 1545 infants in the Stanford University Network for Diagnosis of ROP [SUNDROP] and 5635 examinations from 2699 infants in the Aravind Eye Care Systems [AECS] telemedicine programs). Data were taken from 11 and 48 neonatal care units in the US and India, respectively. Data were collected from January 2012 to July 2021, and data were analyzed from July to December 2023. Exposures: An image processing pipeline was created using deep learning to autonomously identify mtmROP and type 1 ROP in eye examinations performed via telemedicine. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUROC) as well as sensitivity and specificity for detection of mtmROP and type 1 ROP at the eye examination and patient levels. Results: The prevalences of mtmROP and type 1 ROP were 5.9% (91 of 1545) and 1.2% (18 of 1545), respectively, in the SUNDROP dataset and 6.2% (168 of 2699) and 2.5% (68 of 2699) in the AECS dataset. Examination-level AUROCs for mtmROP and type 1 ROP were 0.896 and 0.985, respectively, in the SUNDROP dataset and 0.920 and 0.982 in the AECS dataset. 
At the cross-sectional examination level, mtmROP detection had high sensitivity (SUNDROP: mtmROP, 83.5%; 95% CI, 76.6-87.7; type 1 ROP, 82.2%; 95% CI, 81.2-83.1; AECS: mtmROP, 80.8%; 95% CI, 76.2-84.9; type 1 ROP, 87.8%; 95% CI, 86.8-88.7). At the patient level, all infants who developed type 1 ROP screened positive (SUNDROP: 100%; 95% CI, 81.4-100; AECS: 100%; 95% CI, 94.7-100) prior to diagnosis. Conclusions and Relevance: Where and when ROP telemedicine programs can be implemented, autonomous ROP screening may be an effective force multiplier for secondary prevention of ROP.
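
The patient-level result counts an infant as screening positive if any examination before diagnosis was flagged. A minimal Python sketch of that aggregation, assuming exam-level binary flags are already available and already restricted to pre-diagnosis examinations; the function names and input format are hypothetical, not the study's pipeline:

```python
from collections import defaultdict


def patient_level_positive(exam_flags):
    """Aggregate exam-level screening flags into one flag per infant.

    exam_flags: iterable of (infant_id, exam_positive) pairs (hypothetical
    format), assumed to contain only exams performed before diagnosis.
    An infant screens positive if ANY of its exams was flagged.
    """
    result = defaultdict(bool)
    for infant_id, positive in exam_flags:
        result[infant_id] = result[infant_id] or bool(positive)
    return dict(result)


def sensitivity(flags, truth):
    """Fraction of infants with disease (truth[id] is True) that screened positive."""
    diseased = [i for i, has_disease in truth.items() if has_disease]
    hits = sum(1 for i in diseased if flags.get(i, False))
    return hits / len(diseased)


flags = patient_level_positive([("a", False), ("a", True), ("b", False), ("c", True)])
print(flags)  # {'a': True, 'b': False, 'c': True}
```

Aggregating with "any" favors sensitivity over specificity, which matches the screening goal of not missing type 1 ROP.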


Subject(s)
Retinopathy of Prematurity , Infant, Newborn , Infant , Child , Humans , Retinopathy of Prematurity/diagnosis , Artificial Intelligence , Cross-Sectional Studies , Gestational Age , Infant, Premature
2.
Ophthalmol Sci ; 4(2): 100417, 2024.
Article in English | MEDLINE | ID: mdl-38059124

ABSTRACT

Purpose: Retinopathy of prematurity (ROP) is one of the leading causes of blindness in children. Although the role of oxygen in the pathophysiology of ROP is well established, a precise understanding of the dynamic relationship between oxygen exposure and ROP incidence and severity is lacking. The purpose of this study was to evaluate the correlation between time-dependent oxygen variables and the onset of ROP. Design: Retrospective cohort study. Participants: Two hundred thirty infants who were born at a single academic center and met the inclusion criteria were included. Most infants were born between January 2011 and October 2022. Methods: Patient data with sufficient time-dependent oxygen data were extracted from electronic health records (EHRs). Clinical outcomes for ROP were recorded as none/mild or moderate/severe (defined as type II or worse). Mixed-effects linear models were used to compare the 2 groups in terms of dynamic oxygen variables, such as the daily average and the coefficient of variation (COV) of the fraction of inspired oxygen (FiO2). Support vector machine (SVM) and long short-term memory (LSTM)-based multimodal models were trained with fivefold cross-validation to predict which infants would develop moderate/severe ROP. Gestational age (GA), birth weight, and time-dependent oxygen variables were used to develop predictive models. Main Outcome Measures: Model cross-validation performance was evaluated by computing the mean area under the receiver operating characteristic curve (AUROC), precision, recall, and F1 score. Results: We found that both the daily average and COV of FiO2 were associated with more severe ROP (adjusted P < 0.001). With fivefold cross-validation, the multimodal LSTM models had higher performance than the best static models (SVM using GA and 3 average FiO2 features) and SVM models trained on GA alone (mean AUROC = 0.89 ± 0.04 vs. 0.86 ± 0.05 vs. 0.83 ± 0.04). 
Conclusions: The development of severe ROP might be influenced not only by oxygen exposure but also by its fluctuation, which provides direction for future study of pathophysiological factors associated with severe ROP development. Additionally, we demonstrated that multimodal neural networks can be a method to extract useful information from time-series data, which may be a valuable methodology for the investigation of other diseases using EHR data. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
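
The two dynamic oxygen variables, daily average FiO2 and its coefficient of variation (standard deviation divided by mean), can be sketched in Python as follows. This assumes FiO2 readings arrive as (day, value) pairs and uses the population standard deviation; the abstract does not state the sampling interval or which standard deviation was used, so both are assumptions:

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_fio2_features(records):
    """Compute per-day FiO2 features from (day_index, fio2) readings.

    Returns {day: (daily_mean, cov)} where cov = population std / mean.
    The record format and the choice of pstdev are assumptions.
    """
    by_day = defaultdict(list)
    for day, fio2 in records:
        by_day[day].append(fio2)
    features = {}
    for day, values in sorted(by_day.items()):
        m = mean(values)
        cov = pstdev(values) / m if m > 0 else float("nan")
        features[day] = (m, cov)
    return features


# Day 0: constant room air; day 1: fluctuating supplemental oxygen.
print(daily_fio2_features([(0, 0.21), (0, 0.21), (1, 0.30), (1, 0.50)]))
```

Because COV is scale-free, it separates "how much oxygen" (the mean) from "how unstable the oxygen was" (the fluctuation the conclusions point to).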

3.
Neuro Oncol ; 2023 Dec 09.
Article in English | MEDLINE | ID: mdl-38070147

ABSTRACT

BACKGROUND: We recently conducted a phase 2 trial (NCT028865685) evaluating intracranial efficacy of pembrolizumab for brain metastases (BM) of diverse histologies. Our study met its primary efficacy endpoint and illustrates that pembrolizumab exerts promising activity in a select group of patients with BM. Given the importance of aberrant vasculature in mediating immunosuppression, we explored the relationship between immune checkpoint inhibitor (ICI) efficacy and vascular architecture in the hope of identifying potential mechanisms of intracranial ICI response or resistance for BM. METHODS: Using Vessel Architectural Imaging (VAI), a histologically validated quantitative metric for in vivo tumor vascular physiology, we analyzed dual echo DSC/DCE MRI for 44 patients on trial. Tumor and peri-tumor cerebral blood volume/flow, vessel size, arterial- and venous-dominance, and vascular permeability were measured before and after treatment with pembrolizumab. RESULTS: BM that progressed on ICI were characterized by a highly aberrant vasculature dominated by large-caliber vessels. In contrast, ICI-responsive BM possessed a more structurally balanced vasculature consisting of both small and large vessels, and there was a trend towards a decrease in under-perfused tissue, suggesting a reversal of the negative effects of hypoxia. In the peri-tumor region, development of smaller blood vessels, consistent with neo-angiogenesis, was associated with tumor growth before radiographic evidence of contrast enhancement on anatomical MRI. CONCLUSIONS: This study, one of the largest functional imaging studies for BM, suggests that vascular architecture is linked with ICI efficacy. Studies identifying modulators of vascular architecture, and effects on immune activity, are warranted and may inform future combination treatments.

4.
medRxiv ; 2023 Aug 23.
Article in English | MEDLINE | ID: mdl-37662422

ABSTRACT

Heritability of common eye diseases and ocular traits is relatively high. Here, we develop an automated algorithm to detect genetic relatedness from color fundus photographs (FPs). We estimated the degree of shared ancestry amongst individuals in the UK Biobank using KING software. A convolutional Siamese neural network-based algorithm was trained to output a measure of genetic relatedness using 7224 pairs (3612 related and 3612 unrelated) of FPs. The model achieved high performance for prediction of genetic relatedness; when computed Euclidean distances were used to determine probability of relatedness, the area under the receiver operating characteristic curve (AUROC) for identifying related FPs reached 0.926. We performed external validation of our model using FPs from the LIFE-Adult study and achieved an AUROC of 0.69. An occlusion map indicates that the optic nerve and its surrounding area may be the most predictive of genetic relatedness. We demonstrate that genetic relatedness can be captured from FP features. This approach may be used to uncover novel biomarkers for common ocular diseases.
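
A Siamese network maps each photograph to an embedding, and pairs with a smaller Euclidean distance are treated as more likely related. The abstract does not say how distances were converted to probabilities; because AUROC depends only on ranking, the minimal Python sketch below simply uses negative distance as the score. The embeddings shown are made-up toy vectors:

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative),
    counting ties as 0.5. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy embedding pairs; label 1 = related, 0 = unrelated (hypothetical data).
pairs = [((0.0, 0.0), (0.0, 1.0)), ((0.0, 0.0), (3.0, 4.0)),
         ((1.0, 1.0), (1.0, 1.0)), ((0.0, 0.0), (6.0, 8.0))]
labels = [1, 0, 1, 0]
scores = [-euclidean(a, b) for a, b in pairs]  # closer pair -> higher score
print(auroc(scores, labels))
```

Any monotone mapping from distance to probability (for example a fitted logistic curve) would give the same AUROC as the negative distance.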

5.
bioRxiv ; 2023 Aug 28.
Article in English | MEDLINE | ID: mdl-37693537

ABSTRACT

Structurally and functionally aberrant vasculature is a hallmark of tumor angiogenesis and treatment resistance. Given the synergistic link between aberrant tumor vasculature and immunosuppression, we analyzed perfusion MRI for 44 patients with brain metastases (BM) undergoing treatment with pembrolizumab. To date, vascular-immune communication, or the relationship between immune checkpoint inhibitor (ICI) efficacy and vascular architecture, has not been well-characterized in human imaging studies. We found that ICI-responsive BM possessed a structurally balanced vascular makeup, which was linked to improved vascular efficiency and an immune-stimulatory microenvironment. In contrast, ICI-resistant BM were characterized by a lack of immune cell infiltration and a highly aberrant vasculature dominated by large-caliber vessels. Peri-tumor region analysis revealed early functional changes predictive of ICI resistance before radiographic evidence on conventional MRI. This study was one of the largest functional imaging studies for BM and establishes a foundation for functional studies that illuminate the mechanisms linking patterns of vascular architecture with immunosuppression, as targeting these aspects of cancer biology may serve as the basis for future combination treatments.

6.
Clin Ophthalmol ; 17: 1525-1530, 2023.
Article in English | MEDLINE | ID: mdl-37284059

ABSTRACT

There has been a recent surge in the number of publications centered on the use of artificial intelligence (AI) to diagnose various systemic diseases. The Food and Drug Administration has approved several algorithms for use in clinical practice. In ophthalmology, most advances in AI relate to diabetic retinopathy, which is a disease process with agreed-upon diagnostic and classification criteria. However, this is not the case for glaucoma, which is a relatively complex disease without agreed-upon diagnostic criteria. Moreover, currently available public datasets that focus on glaucoma have inconsistent label quality, further complicating attempts at training AI algorithms efficiently. In this perspective paper, we discuss specific details related to developing AI models for glaucoma and suggest potential steps to overcome current limitations.

7.
JAMA Ophthalmol ; 141(6): 582-588, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37166816

ABSTRACT

Importance: Retinopathy of prematurity (ROP) telemedicine screening programs have been found to be effective, but they rely on widefield digital fundus imaging (WDFI) cameras, which are expensive, making them less accessible in low- to middle-income countries. Cheaper, smartphone-based fundus imaging (SBFI) systems have been described, but these have a narrower field of view (FOV) and have not been tested in a real-world, operational telemedicine setting. Objective: To assess the efficacy of SBFI systems compared with WDFI when used by technicians for ROP screening with both artificial intelligence (AI) and human graders. Design, Setting, and Participants: This prospective cross-sectional comparison study took place in a single-center ROP teleophthalmology program in India from January 2021 to April 2022. Premature infants who met standard ROP screening criteria and were enrolled in the teleophthalmology screening program were included. Those who had already been treated for ROP were excluded. Exposures: All participants had images taken with WDFI and with 1 of 2 SBFI devices, the Make-In-India (MII) Retcam or the Keeler Monocular Indirect Ophthalmoscope (MIO). Two masked readers evaluated zone, stage, plus, and vascular severity scores (VSS, from 1-9) in all images. Smartphone images were then stratified by patient into training (70%), validation (10%), and test (20%) data sets and used to train a ResNet18 deep learning architecture for binary classification of normal vs preplus or plus disease, which was then used for patient-level predictions of referral-warranted (RW)- and treatment-requiring (TR)-ROP. Main Outcomes and Measures: Sensitivity and specificity of detection of RW-ROP and TR-ROP by both human graders and an AI system, and area under the receiver operating characteristic curve (AUC) of grader-assigned VSS. Sensitivity and specificity were compared between the 2 SBFI systems using Pearson χ2 testing. 
Results: A total of 156 infants (312 eyes; mean [SD] gestational age, 33.0 [3.0] weeks; 75 [48%] female) were included with paired examinations. Sensitivity and specificity were not found to be statistically different between the 2 SBFI systems. Human graders using SBFI detected TR-ROP with a sensitivity of 100% and specificity of 83.49%. The AUCs with grader-assigned VSS only were 0.95 (95% CI, 0.91-0.99) and 0.96 (95% CI, 0.93-0.99) for RW-ROP and TR-ROP, respectively. For the AI system, sensitivity for detecting TR-ROP was 100% with a specificity of 58.6%, and sensitivity for RW-ROP was 80.0% with a specificity of 59.3%. Conclusions and Relevance: In this cross-sectional study, 2 different SBFI systems used by technicians in an ROP screening program were highly sensitive for TR-ROP. SBFI systems with AI may be a cost-effective method to improve the global capacity for ROP screening.
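
Splitting by patient means every image from a given infant lands in exactly one of the training, validation, or test sets, so the model is never tested on an infant it saw during training. A minimal Python sketch, assuming the 70/10/20 fractions are applied to patients rather than images and that a seeded shuffle is acceptable; the function name is hypothetical and the study may also have balanced disease labels across splits:

```python
import random


def split_by_patient(image_patient_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Return a 'train'/'val'/'test' label for each image such that all
    images from the same patient share one split (no patient leakage)."""
    patients = sorted(set(image_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    split_of = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            split_of[pid] = "train"
        elif i < n_train + n_val:
            split_of[pid] = "val"
        else:
            split_of[pid] = "test"
    return [split_of[pid] for pid in image_patient_ids]


# 10 hypothetical infants, 3 smartphone images each.
ids = [f"p{i}" for i in range(10) for _ in range(3)]
print(split_by_patient(ids)[:6])
```

Splitting images independently instead would let near-duplicate views of the same eye appear on both sides of the split and inflate test performance.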


Subject(s)
Ophthalmology , Retinopathy of Prematurity , Telemedicine , Infant, Newborn , Infant , Humans , Female , Adult , Male , Cross-Sectional Studies , Retinopathy of Prematurity/diagnosis , Prospective Studies , Smartphone , Artificial Intelligence , Telemedicine/methods , Infant, Premature , Gestational Age , Sensitivity and Specificity , Ophthalmoscopy/methods
8.
JAMA Ophthalmol ; 141(6): 543-552, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37140902

ABSTRACT

Importance: Although race is a social construct, it is associated with variations in skin and retinal pigmentation. Image-based medical artificial intelligence (AI) algorithms that use images of these organs have the potential to learn features associated with self-reported race (SRR), which increases the risk of racially biased performance in diagnostic tasks; understanding whether this information can be removed, without affecting the performance of AI algorithms, is critical in reducing the risk of racial bias in medical AI. Objective: To evaluate whether converting color fundus photographs to retinal vessel maps (RVMs) of infants screened for retinopathy of prematurity (ROP) removes the risk for racial bias. Design, Setting, and Participants: The retinal fundus images (RFIs) of neonates with parent-reported Black or White race were collected for this study. A u-net, a convolutional neural network (CNN) that provides precise segmentation for biomedical images, was used to segment the major arteries and veins in RFIs into grayscale RVMs, which were subsequently thresholded, binarized, and/or skeletonized. CNNs were trained with patients' SRR labels on color RFIs, raw RVMs, and thresholded, binarized, or skeletonized RVMs. Study data were analyzed from July 1 to September 28, 2021. Main Outcomes and Measures: Area under the precision-recall curve (AUC-PR) and area under the receiver operating characteristic curve (AUROC) at both the image and eye level for classification of SRR. Results: A total of 4095 RFIs were collected from 245 neonates with parent-reported Black (94 [38.4%]; mean [SD] age, 27.2 [2.3] weeks; 55 majority sex [58.5%]) or White (151 [61.6%]; mean [SD] age, 27.6 [2.3] weeks, 80 majority sex [53.0%]) race. CNNs inferred SRR from RFIs nearly perfectly (image-level AUC-PR, 0.999; 95% CI, 0.999-1.000; infant-level AUC-PR, 1.000; 95% CI, 0.999-1.000). 
Raw RVMs were nearly as informative as color RFIs (image-level AUC-PR, 0.938; 95% CI, 0.926-0.950; infant-level AUC-PR, 0.995; 95% CI, 0.992-0.998). Ultimately, CNNs were able to learn whether RFIs or RVMs were from Black or White infants regardless of whether images contained color, vessel segmentation brightness differences were nullified, or vessel segmentation widths were uniform. Conclusions and Relevance: Results of this diagnostic study suggest that it can be very challenging to remove information relevant to SRR from fundus photographs. As a result, AI algorithms trained on fundus photographs have the potential for biased performance in practice, even if based on biomarkers rather than raw images. Regardless of the methodology used for training AI, evaluating performance in relevant subpopulations is critical.
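
The RVM variants differ in how much intensity information survives: a thresholded map keeps grayscale values above a cutoff, while a binarized map sets every vessel pixel to 1, which nullifies brightness differences. A minimal pure-Python sketch on a toy 2 x 2 map, assuming intensities in [0, 1] and a hypothetical cutoff of 0.5 (the study's threshold is not given); skeletonization, which thins vessels to one-pixel centerlines so widths become uniform, is available for example as `skimage.morphology.skeletonize` in scikit-image and is not reimplemented here:

```python
def threshold_rvm(rvm, t):
    """Zero out pixels below t; keep grayscale intensity elsewhere."""
    return [[p if p >= t else 0.0 for p in row] for row in rvm]


def binarize_rvm(rvm, t):
    """Map each pixel to 1 (vessel) or 0 (background), discarding brightness."""
    return [[1 if p >= t else 0 for p in row] for row in rvm]


rvm = [[0.1, 0.6], [0.9, 0.3]]  # toy grayscale vessel map
print(threshold_rvm(rvm, 0.5))  # [[0.0, 0.6], [0.9, 0.0]]
print(binarize_rvm(rvm, 0.5))   # [[0, 1], [1, 0]]
```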


Subject(s)
Artificial Intelligence , Racism , Infant, Newborn , Infant , Humans , Adult , Retina , Neural Networks, Computer , Algorithms
9.
Ophthalmology ; 130(8): 837-843, 2023 08.
Article in English | MEDLINE | ID: mdl-37030453

ABSTRACT

PURPOSE: Epidemiological changes in retinopathy of prematurity (ROP) depend on neonatal care, neonatal mortality, and the ability to carefully titrate and monitor oxygen. We evaluated whether an artificial intelligence (AI) algorithm for assessing ROP severity in babies can be used to evaluate changes in disease epidemiology in babies from South India over a 5-year period. DESIGN: Retrospective cohort study. PARTICIPANTS: Babies (3093) screened for ROP at neonatal care units (NCUs) across the Aravind Eye Care System (AECS) in South India. METHODS: Images and clinical data were collected as part of routine tele-ROP screening at the AECS in India over 2 time periods: August 2015 to October 2017 and March 2019 to December 2020. All babies in the original cohort were matched 1:3 by birthweight (BW) and gestational age (GA) with babies in the later cohort. We compared the proportion of eyes with moderate (type 2) or treatment-requiring (TR) ROP, and an AI-derived ROP vascular severity score (VSS; from retinal fundus images) at the initial tele-retinal screening examination for all babies in a district, in the 2 time periods. MAIN OUTCOME MEASURES: Differences in the proportions of type 2 or worse and TR-ROP cases, and VSS between time periods. RESULTS: Among BW- and GA-matched babies, the proportion [95% confidence interval {CI}] of babies with type 2 or worse and TR-ROP decreased from 60.9% [53.8%-67.7%] to 17.1% [14.0%-20.5%] (P < 0.001) and from 16.8% [11.9%-22.7%] to 5.1% [3.4%-7.3%] (P < 0.001), respectively, over the 2 time periods. Similarly, the median [interquartile range] VSS in the population decreased from 2.9 [1.2] to 2.4 [1.8] (P < 0.001). CONCLUSIONS: In South India, over a 5-year period, the proportion of babies developing moderate to severe ROP has dropped significantly for babies at similar demographic risk, strongly suggesting improvements in primary prevention of ROP. 
These results suggest that AI-based assessment of ROP severity may be a useful epidemiologic tool to evaluate temporal changes in ROP epidemiology. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found after the references.
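
The 1:3 matching step pairs each baby from the earlier cohort with 3 babies of similar birthweight and gestational age from the later cohort. The abstract does not describe the matching algorithm, so the Python sketch below uses one simple option, greedy nearest-neighbour matching without replacement; the distance scaling (100 g of birthweight weighted like 1 week of GA) and all names are hypothetical choices:

```python
def match_one_to_k(early, late, k=3, bw_scale=100.0, ga_scale=1.0):
    """Greedy 1:k nearest-neighbour matching without replacement.

    early, late: dicts {baby_id: (birth_weight_g, gestational_age_wk)}.
    The scale factors are arbitrary assumptions that put BW and GA on
    comparable footing in the squared distance.
    """
    available = dict(late)
    matches = {}
    for baby_id, (bw, ga) in sorted(early.items()):
        ranked = sorted(
            available,
            key=lambda cid: ((available[cid][0] - bw) / bw_scale) ** 2
            + ((available[cid][1] - ga) / ga_scale) ** 2,
        )
        chosen = ranked[:k]
        if len(chosen) < k:
            raise ValueError("not enough unmatched babies in the later cohort")
        for cid in chosen:
            del available[cid]  # each later-cohort baby is used at most once
        matches[baby_id] = chosen
    return matches


early = {"e1": (1000, 28)}
late = {"l1": (1010, 28), "l2": (1500, 32), "l3": (990, 28),
        "l4": (1000, 29), "l5": (2000, 36)}
print(match_one_to_k(early, late))
```

Greedy matching is order-dependent; optimal matching (for example via a linear assignment solver) would minimize total distance across all babies instead.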


Subject(s)
Retinopathy of Prematurity , Telemedicine , Infant, Newborn , Infant , Humans , Retinopathy of Prematurity/diagnosis , Retinopathy of Prematurity/epidemiology , Retrospective Studies , Artificial Intelligence , Risk Factors , Gestational Age , Birth Weight , Telemedicine/methods , Neonatal Screening/methods
10.
Radiographics ; 43(4): e220107, 2023 04.
Article in English | MEDLINE | ID: mdl-36862082

ABSTRACT

Deep learning (DL) algorithms have shown remarkable potential in automating various tasks in medical imaging and radiologic reporting. However, models trained on small quantities of data or only using data from a single institution often are not generalizable to other institutions, which may have different patient demographics or data acquisition characteristics. Therefore, training DL algorithms using data from multiple institutions is crucial to improving the robustness and generalizability of clinically useful DL models. In the context of medical data, simply pooling data from each institution to a central location to train a model poses several issues such as increased risk to patient privacy, increased costs for data storage and transfer, and regulatory challenges. These challenges of centrally hosting data have motivated the development of distributed machine learning techniques and frameworks for collaborative learning that facilitate the training of DL models without the need to explicitly share private medical data. The authors describe several popular methods for collaborative training and review the main considerations for deploying these models. They also highlight publicly available software frameworks for federated learning and showcase several real-world examples of collaborative learning. The authors conclude by discussing some key challenges and future research directions for distributed DL. They aim to introduce clinicians to the benefits, limitations, and risks of using distributed DL for the development of medical artificial intelligence algorithms. ©RSNA, 2023. Quiz questions for this article are available in the supplemental material.
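
One widely used collaborative-training method is federated averaging (FedAvg): each institution trains locally, and a server combines only the model parameters, weighting each institution by its number of training examples. The minimal Python sketch below shows that single aggregation step with parameters as flat lists of floats; it is a generic illustration, not code from the article or from any specific framework:

```python
def federated_average(client_weights, client_sizes):
    """One FedAvg aggregation step.

    client_weights: list of parameter vectors (lists of floats), one per
    institution, all the same length. client_sizes: local example counts.
    Returns the example-weighted mean of each parameter.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical sites; the second holds 3x as much data.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Only these parameter vectors leave each site; the raw images stay local, which is the privacy argument made above.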


Subject(s)
Deep Learning , Privacy , Humans , Artificial Intelligence , Algorithms , Machine Learning
11.
Radiology ; 307(1): e220715, 2023 04.
Article in English | MEDLINE | ID: mdl-36537895

ABSTRACT

Background Radiomics is the extraction of predefined mathematic features from medical images for the prediction of variables of clinical interest. While some studies report superlative accuracy of radiomic machine learning (ML) models, the published methodology is often incomplete, and the results are rarely validated in external testing data sets. Purpose To characterize the type, prevalence, and statistical impact of methodologic errors present in radiomic ML studies. Materials and Methods Radiomic ML publications were reviewed for the presence of performance-inflating methodologic flaws. Common flaws were subsequently reproduced with randomly generated features interpolated from publicly available radiomic data sets to demonstrate the precarious nature of reported findings. Results In an assessment of radiomic ML publications, the authors uncovered two general categories of data analysis errors: inconsistent partitioning and unproductive feature associations. In simulations, the authors demonstrated that inconsistent partitioning inflates radiomic ML accuracy 1.4-fold relative to unbiased performance and that correcting for flawed methodology results in areas under the receiver operating characteristic curve approaching a value of 0.5 (random chance). With use of randomly generated features, the authors illustrated that unproductive associations between radiomic features and gene sets can imply false causality for biologic phenomena. Conclusion Radiomic machine learning studies may contain methodologic flaws that undermine their validity. This study provides a review template to avoid such flaws. © RSNA, 2022. Supplemental material is available for this article. See also the editorial by Jacobs in this issue.


Subject(s)
Machine Learning , Humans , ROC Curve , Retrospective Studies
12.
Ophthalmol Sci ; 2(4): 100165, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36531583

ABSTRACT

Purpose: To evaluate the performance of a deep learning (DL) algorithm for retinopathy of prematurity (ROP) screening in Nepal and Mongolia. Design: Retrospective analysis of prospectively collected clinical data. Participants: Clinical information and fundus images were obtained from infants in 2 ROP screening programs in Nepal and Mongolia. Methods: Fundus images were obtained using the Forus 3nethra neo (Forus Health) in Nepal and the RetCam Portable (Natus Medical, Inc.) in Mongolia. The overall severity of ROP was determined from the medical record using the International Classification of ROP (ICROP). The presence of plus disease was determined independently in each image using a reference standard diagnosis. The Imaging and Informatics for ROP (i-ROP) DL algorithm was trained on images from the RetCam to classify plus disease and to assign a vascular severity score (VSS) from 1 through 9. Main Outcome Measures: Area under the receiver operating characteristic curve and area under the precision-recall curve for the presence of plus disease or type 1 ROP and association between VSS and ICROP disease category. Results: The prevalence of type 1 ROP was found to be higher in Mongolia (14.0%) than in Nepal (2.2%; P < 0.001) in these data sets. In Mongolia (RetCam images), the area under the receiver operating characteristic curve for examination-level plus disease detection was 0.968, and the area under the precision-recall curve was 0.823. In Nepal (Forus images), these values were 0.999 and 0.993, respectively. The ROP VSS was associated with ICROP classification in both datasets (P < 0.001). At the population level, the median VSS was found to be higher in Mongolia (2.7; interquartile range [IQR], 1.3-5.4) as compared with Nepal (1.9; IQR, 1.2-3.4; P < 0.001). 
Conclusions: These data provide preliminary evidence of the effectiveness of the i-ROP DL algorithm for ROP screening in neonatal populations in Nepal and Mongolia using multiple camera systems and are useful for consideration in future clinical implementation of artificial intelligence-based ROP screening in low- and middle-income countries.
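
The area under the precision-recall curve is reported alongside AUROC because it is more sensitive to performance on the positive class when disease is uncommon, as with the 2.2% type 1 ROP prevalence in Nepal. The abstract does not name the estimator used; a common one is average precision, the mean of the precision values at the rank of each positive case. A minimal Python sketch on toy scores:

```python
def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision).

    Sort by score (descending) and average the precision at the rank of
    each positive. Ties keep input order in this simple sketch; requires
    at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Positives at ranks 1 and 3: (1/1 + 2/3) / 2 = 5/6.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

A random classifier's AUROC stays near 0.5 regardless of prevalence, whereas its average precision falls to roughly the prevalence, which is why the two metrics are read together.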

13.
Ophthalmol Sci ; 2(2): 100126, 2022 Jun.
Article in English | MEDLINE | ID: mdl-36249693

ABSTRACT

Purpose: Developing robust artificial intelligence (AI) models for medical image analysis requires large quantities of diverse, well-chosen data that can prove challenging to collect because of privacy concerns, disease rarity, or diagnostic label quality. Collecting image-based datasets for retinopathy of prematurity (ROP), a potentially blinding disease, suffers from these challenges. Progressively growing generative adversarial networks (PGANs) may help, because they can synthesize highly realistic images that may increase both the size and diversity of medical datasets. Design: Diagnostic validation study of convolutional neural networks (CNNs) for plus disease detection, a component of severe ROP, using synthetic data. Participants: Five thousand eight hundred forty-two retinal fundus images (RFIs) collected from 963 preterm infants. Methods: Retinal vessel maps (RVMs) were segmented from RFIs. PGANs were trained to synthesize RVMs with normal, pre-plus, or plus disease vasculature. Convolutional neural networks were trained, using real or synthetic RVMs, to detect plus disease from 2 real RVM test datasets. Main Outcome Measures: Features of real and synthetic RVMs were evaluated using uniform manifold approximation and projection (UMAP). Similarities were evaluated at the dataset and feature level using Fréchet inception distance and Euclidean distance, respectively. CNN performance was assessed via area under the receiver operating characteristic curve (AUC); AUCs were compared via bootstrapping and DeLong's test for correlated receiver operating characteristic curves. Confusion matrices were compared using McNemar's chi-square test and Cohen's κ value. Results: The CNN trained on synthetic RVMs showed a significantly higher AUC (0.971; P = 0.006 and P = 0.004) and classified plus disease more similarly to a set of 8 international experts (κ = 0.922) than the CNN trained on real RVMs (AUC = 0.934; κ = 0.701). 
Real and synthetic RVMs overlapped, by plus disease diagnosis, on the UMAP manifold, showing that synthetic images spanned the disease severity spectrum. Fréchet inception distance and Euclidean distances suggested that real and synthetic RVMs were more dissimilar to one another than real RVMs were to one another, further suggesting that synthetic RVMs were distinct from the training data with respect to privacy considerations. Conclusions: Synthetic datasets may be useful for training robust medical AI models. Furthermore, PGANs may be able to synthesize realistic data for use without protected health information concerns.
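
Cohen's κ measures agreement between two raters beyond what chance alone would produce: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's label frequencies. A minimal Python sketch comparing a hypothetical CNN's plus-disease labels with a hypothetical expert reference; the labels are made up:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences.
    Undefined (division by zero) if chance agreement p_e equals 1."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


cnn = ["plus", "normal", "normal", "plus"]
expert = ["plus", "normal", "plus", "plus"]
# p_o = 3/4, p_e = (2*3 + 2*1) / 16 = 0.5, so kappa = 0.25 / 0.5 = 0.5
print(cohens_kappa(cnn, expert))
```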

14.
Medicine (Baltimore) ; 101(29): e29587, 2022 Jul 22.
Article in English | MEDLINE | ID: mdl-35866818

ABSTRACT

To tune and test the generalizability of a deep learning-based model for assessment of COVID-19 lung disease severity on chest radiographs (CXRs) from different patient populations. A published convolutional Siamese neural network-based model previously trained on hospitalized patients with COVID-19 was tuned using 250 outpatient CXRs. This model produces a quantitative measure of COVID-19 lung disease severity (pulmonary x-ray severity (PXS) score). The model was evaluated on CXRs from 4 test sets, including 3 from the United States (patients hospitalized at an academic medical center (N = 154), patients hospitalized at a community hospital (N = 113), and outpatients (N = 108)) and 1 from Brazil (patients at an academic medical center emergency department (N = 303)). Radiologists from both countries independently assigned reference standard CXR severity scores, which were correlated with the PXS scores as a measure of model performance (Pearson R). The Uniform Manifold Approximation and Projection (UMAP) technique was used to visualize the neural network results. Tuning the deep learning model with outpatient data showed high model performance in 2 United States hospitalized patient datasets (R = 0.88 and R = 0.90, compared to baseline R = 0.86). Model performance was similar, though slightly lower, when tested on the United States outpatient and Brazil emergency department datasets (R = 0.86 and R = 0.85, respectively). UMAP showed that the model learned disease severity information that generalized across test sets. A deep learning model that extracts a COVID-19 severity score on CXRs showed generalizable performance across multiple populations from 2 continents, including outpatients and hospitalized patients.
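
Model performance here is the Pearson correlation between the model's PXS scores and the radiologists' reference severity scores. A minimal Python sketch of Pearson R, with hypothetical score lists in the usage lines:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


pxs = [1.0, 2.0, 3.0]        # hypothetical model scores
reference = [2.0, 4.0, 6.0]  # hypothetical radiologist scores
print(pearson_r(pxs, reference))  # approximately 1.0
```

Pearson R rewards a linear relationship, so a model whose scores are a rescaled version of the radiologists' scores still achieves R near 1; it does not check that the two scales agree in absolute value.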


Subject(s)
COVID-19 , Deep Learning , COVID-19/diagnostic imaging , Humans , Lung , Radiography, Thoracic/methods , Radiologists
15.
JAMA Ophthalmol ; 140(8): 791-798, 2022 08 01.
Article in English | MEDLINE | ID: mdl-35797036

ABSTRACT

Importance: Retinopathy of prematurity (ROP) is a leading cause of preventable blindness that disproportionately affects children born in low- and middle-income countries (LMICs). In-person and telemedical screening examinations can reduce this risk but are challenging to implement in LMICs owing to the multitude of at-risk infants and lack of trained ophthalmologists. Objective: To implement an ROP risk model using retinal images from a single baseline examination to identify infants who will develop treatment-requiring (TR)-ROP in LMIC telemedicine programs. Design, Setting, and Participants: In this diagnostic study conducted from February 1, 2019, to June 30, 2021, retinal fundus images were collected from infants as part of an Indian ROP telemedicine screening program. An artificial intelligence (AI)-derived vascular severity score (VSS) was obtained from images from the first examination after 30 weeks' postmenstrual age. Using 5-fold cross-validation, logistic regression models were trained on 2 variables (gestational age and VSS) for prediction of TR-ROP. The model was externally validated on test data sets from India, Nepal, and Mongolia. Data were analyzed from October 20, 2021, to April 20, 2022. Main Outcomes and Measures: Primary outcome measures included sensitivity, specificity, positive predictive value, and negative predictive value for predictions of future occurrences of TR-ROP; the number of weeks before clinical diagnosis when a prediction was made; and the potential reduction in number of examinations required. Results: A total of 3760 infants (median [IQR] postmenstrual age, 37 [5] weeks; 1950 male infants [51.9%]) were included in the study. 
The diagnostic model had a sensitivity and specificity, respectively, for each of the data sets as follows: India, 100.0% (95% CI, 87.2%-100.0%) and 63.3% (95% CI, 59.7%-66.8%); Nepal, 100.0% (95% CI, 54.1%-100.0%) and 77.8% (95% CI, 72.9%-82.2%); and Mongolia, 100.0% (95% CI, 93.3%-100.0%) and 45.8% (95% CI, 39.7%-52.1%). With the AI model, infants with TR-ROP were identified a median (IQR) of 2.0 (0-11) weeks before TR-ROP diagnosis in India, 0.5 (0-2.0) weeks before TR-ROP diagnosis in Nepal, and 0 (0-5.0) weeks before TR-ROP diagnosis in Mongolia. If low-risk infants were never screened again, the population could be effectively screened with 45.0% (India, 664/1476), 38.4% (Nepal, 151/393), and 51.3% (Mongolia, 266/519) fewer examinations required. Conclusions and Relevance: Results of this diagnostic study suggest that there were 2 advantages to implementation of this risk model: (1) the number of examinations for low-risk infants could be reduced without missing cases of TR-ROP, and (2) high-risk infants could be identified and closely monitored before development of TR-ROP.
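The outcome measures above reduce to simple arithmetic: sensitivity, specificity, PPV, and NPV come from a 2×2 confusion matrix, and the examination savings are ratios of avoided to total examinations. The sketch below covers only that evaluation arithmetic, assuming hypothetical confusion counts; the gestational age plus VSS logistic regression model is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing risk-model predictions with later TR-ROP diagnoses."""
    tp: int  # predicted high risk, developed TR-ROP
    fp: int  # predicted high risk, did not develop TR-ROP
    tn: int  # predicted low risk, did not develop TR-ROP
    fn: int  # predicted low risk, developed TR-ROP

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def exam_reduction_pct(avoided: int, total: int) -> float:
    """Percentage of examinations saved if low-risk infants are not rescreened."""
    return 100.0 * avoided / total


# Avoided/total examination counts as reported in the abstract above.
for site, avoided, total in [("India", 664, 1476), ("Nepal", 151, 393), ("Mongolia", 266, 519)]:
    print(f"{site}: {exam_reduction_pct(avoided, total):.1f}% fewer examinations")

# Hypothetical counts: no missed cases (fn = 0) gives 100% sensitivity.
example = Confusion(tp=10, fp=5, tn=15, fn=0)
print(example.sensitivity(), example.specificity())
```

The loop prints 45.0%, 38.4%, and 51.3%, matching the reported reductions.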


Subject(s)
Retinopathy of Prematurity , Adult , Artificial Intelligence , Child , Gestational Age , Humans , Infant , Infant, Newborn , Male , Neonatal Screening/methods , Retinopathy of Prematurity/diagnosis , Retinopathy of Prematurity/epidemiology , Retrospective Studies , Risk Factors , Sensitivity and Specificity
16.
IEEE J Biomed Health Inform ; 26(9): 4635-4644, 2022 09.
Article in English | MEDLINE | ID: mdl-35749336

ABSTRACT

Federated learning is an emerging research paradigm that enables collaborative training of deep learning models without sharing patient data. However, data are usually heterogeneous across institutions, which may reduce the performance of models trained using federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG, to overcome the performance drops caused by data heterogeneity in federated learning. Unlike previous federated methods that require complex heuristic training or hyperparameter tuning, SplitAVG leverages simple network split and feature map concatenation strategies to encourage the federated model to train an unbiased estimator of the target data distribution. We compare SplitAVG with seven state-of-the-art federated learning methods, using centrally hosted training data as the baseline, on a suite of both synthetic and real-world federated datasets. We find that the performance of models trained using all the comparison federated learning methods degraded significantly with increasing degrees of data heterogeneity. In contrast, SplitAVG achieves results comparable to the baseline under all heterogeneous settings: on highly heterogeneous data partitions, it achieves 96.2% of the baseline accuracy on a diabetic retinopathy binary classification dataset and 110.4% of the baseline mean absolute error on a bone age prediction dataset. We conclude that SplitAVG can effectively overcome the performance drops from variability in data distributions across institutions. Experimental results also show that SplitAVG can be adapted to different base convolutional neural networks (CNNs) and generalized to various types of medical imaging tasks. The code is publicly available at https://github.com/zm17943/SplitAVG.
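The abstract names two ingredients, a network split and feature map concatenation. Under our reading (an assumption; the released code is the authority on the actual procedure), each institution runs the layers before the split on its own data, the server concatenates the resulting feature maps into one batch and runs the remaining layers, and the institution-side weights are then averaged. The toy sketch below uses hypothetical scalar "layers" and a mean squared error loss purely to show that data flow; it is not the SplitAVG implementation.

```python
from typing import List, Tuple

# (institution-side weight, local inputs, local targets); all values hypothetical.
Institution = Tuple[float, List[float], List[float]]


def institution_forward(weight: float, inputs: List[float]) -> List[float]:
    """Hypothetical institution-side sub-network: a single scalar linear layer."""
    return [weight * x for x in inputs]


def concat(chunks: List[List[float]]) -> List[float]:
    """Concatenate per-institution feature maps (or targets) into one server batch."""
    batch: List[float] = []
    for chunk in chunks:
        batch.extend(chunk)
    return batch


def weighted_average(weights: List[float], sizes: List[int]) -> float:
    """FedAvg-style averaging of institution-side weights by local sample count."""
    return sum(w * n for w, n in zip(weights, sizes)) / sum(sizes)


def split_avg_round(institutions: List[Institution], head: float, lr: float) -> Tuple[float, float]:
    """One toy training round; returns (averaged institution weight, updated head)."""
    features = concat([institution_forward(w, xs) for w, xs, _ in institutions])
    targets = concat([ys for _, _, ys in institutions])
    preds = [head * f for f in features]  # hypothetical server-side sub-network
    n = len(preds)
    # Gradient of the mean squared error with respect to each prediction.
    d_pred = [2.0 * (p - t) / n for p, t in zip(preds, targets)]
    grad_head = sum(d * f for d, f in zip(d_pred, features))
    d_feat = [d * head for d in d_pred]  # gradients sent back to the institutions
    updated, offset = [], 0
    for w, xs, _ in institutions:
        chunk = d_feat[offset:offset + len(xs)]  # this institution's slice of the batch
        offset += len(xs)
        updated.append(w - lr * sum(g * x for g, x in zip(chunk, xs)))
    sizes = [len(xs) for _, xs, _ in institutions]
    return weighted_average(updated, sizes), head - lr * grad_head


sites = [(0.5, [1.0, 2.0], [2.0, 4.0]), (0.5, [3.0], [6.0])]
print(split_avg_round(sites, head=1.0, lr=0.01))
```

Because the server sees one concatenated batch, its layers are updated on a mixture of all institutions' features at once, which is one plausible reason such a design would be less sensitive to heterogeneous local data than averaging separately trained models.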


Subject(s)
Deep Learning , Diagnostic Imaging , Humans , Neural Networks, Computer , Radiography
17.
Ophthalmol Retina ; 6(8): 650-656, 2022 08.
Article in English | MEDLINE | ID: mdl-35304305

ABSTRACT

OBJECTIVE: To utilize a deep learning (DL) model trained via federated learning (FL), a method of collaborative training without sharing patient data, to delineate institutional differences in clinician diagnostic paradigms and disease epidemiology in retinopathy of prematurity (ROP). DESIGN: Evaluation of a diagnostic test or technology. SUBJECTS AND CONTROLS: We included 5245 patients with wide-angle retinal imaging from the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. Images were labeled with the clinical diagnoses of plus disease (plus, preplus, no plus), which were documented in the chart, and a reference standard diagnosis was determined by 3 image-based ROP graders and the clinical diagnosis. METHODS: Demographics (birth weight, gestational age) and clinical diagnoses for all eye examinations were recorded from each institution. Using an FL approach, a DL model for plus disease classification was trained using only the clinical labels. The 3 class probabilities were then converted into a vascular severity score (VSS) for each eye examination, as well as an "institutional VSS," in which the average of the VSS values assigned to patients' higher severity ("worse") eyes at each examination was calculated for each institution. MAIN OUTCOME MEASURES: We compared demographics, clinical diagnoses of plus disease, and institutional VSSs between institutions using the McNemar-Bowker test, 2-proportion Z test, and 1-way analysis of variance with post hoc analysis by the Tukey-Kramer test. Single regression analysis was performed to explore the relationship between demographics and VSSs. RESULTS: We found that the proportion of patients diagnosed with preplus disease varied significantly between institutions (P < 0.001). 
Using the DL-derived VSS trained on the data from all institutions using FL, we observed differences in the institutional VSS and the level of vascular severity diagnosed as no plus (P < 0.001) across institutions. A significant, inverse relationship between the institutional VSS and mean gestational age was found (P = 0.049, adjusted R2 = 0.49). CONCLUSIONS: A DL-derived ROP VSS developed without sharing data between institutions using FL identified differences in the clinical diagnoses of plus disease and overall levels of ROP severity between institutions. Federated learning may represent a method to standardize clinical diagnoses and provide objective measurements of disease for image-based diseases.
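The abstract states only that the 3 class probabilities were converted into a VSS and that the institutional VSS averages the worse-eye VSS at each examination. The sketch below assumes a 1/5/9 weighting of the no plus, preplus, and plus probabilities (giving a 1-9 scale); that weighting and the input layout are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def vascular_severity_score(p_no_plus: float, p_preplus: float, p_plus: float) -> float:
    """Collapse 3 class probabilities into a 1-9 score (assumed 1/5/9 weighting)."""
    if abs(p_no_plus + p_preplus + p_plus - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    return 1.0 * p_no_plus + 5.0 * p_preplus + 9.0 * p_plus


def institutional_vss(exams: Dict[str, List[Tuple[float, float]]]) -> Dict[str, float]:
    """Mean VSS of the worse eye per examination, computed per institution.

    `exams` maps an institution name to (right-eye VSS, left-eye VSS) pairs,
    one pair per examination.
    """
    return {site: mean(max(right, left) for right, left in pairs)
            for site, pairs in exams.items()}


# Hypothetical data for two institutions.
exams = {
    "site_a": [(vascular_severity_score(0.9, 0.1, 0.0), vascular_severity_score(0.6, 0.3, 0.1))],
    "site_b": [(2.0, 3.0), (5.0, 1.0)],
}
print(institutional_vss(exams))
```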


Subject(s)
Ophthalmology , Retinopathy of Prematurity , Gestational Age , Humans , Infant, Newborn , Reproducibility of Results , Retina , Retinopathy of Prematurity/diagnosis , Retinopathy of Prematurity/epidemiology
18.
Ophthalmol Retina ; 6(8): 657-663, 2022 08.
Article in English | MEDLINE | ID: mdl-35296449

ABSTRACT

OBJECTIVE: To compare the performance of deep learning classifiers for the diagnosis of plus disease in retinopathy of prematurity (ROP) trained using 2 methods for developing models on multi-institutional data sets: centralizing data versus federated learning (FL), in which no data leave each institution. DESIGN: Evaluation of a diagnostic test or technology. SUBJECTS: Deep learning models were trained, validated, and tested on 5255 wide-angle retinal images in the neonatal intensive care units of 7 institutions as part of the Imaging and Informatics in ROP study. All images were labeled for the presence of plus, preplus, or no plus disease with a clinical label and a reference standard diagnosis (RSD) determined by 3 image-based ROP graders and the clinical diagnosis. METHODS: We compared the area under the receiver operating characteristic curve (AUROC) for models developed on multi-institutional data, using a central approach initially, followed by FL, and compared locally trained models with both approaches. We compared the model performance (κ) with the label agreement (between clinical and RSD), data set size, and number of plus disease cases in each training cohort using the Spearman correlation coefficient (CC). MAIN OUTCOME MEASURES: Model performance using AUROC and linearly weighted κ. RESULTS: Four experimental settings were compared: FL trained on RSD versus central trained on RSD, FL trained on clinical labels versus central trained on clinical labels, FL trained on RSD versus central trained on clinical labels, and FL trained on clinical labels versus central trained on RSD (P = 0.046, P = 0.126, P = 0.224, and P = 0.0173, respectively). Four of the 7 (57%) models trained on local institutional data performed worse than the FL models.
The model performance for local models was positively correlated with the label agreement (between clinical and RSD labels, CC = 0.389, P = 0.387), total number of plus cases (CC = 0.759, P = 0.047), and overall training set size (CC = 0.924, P = 0.002). CONCLUSIONS: We found that a trained FL model performs comparably to a centralized model, confirming that FL may provide an effective, more feasible solution for interinstitutional learning. Smaller institutions benefit more from collaboration than larger institutions, showing the potential of FL for addressing disparities in resource access.
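Model performance above is partly reported as linearly weighted κ, which penalizes a plus versus no plus disagreement twice as heavily as a disagreement between adjacent classes. A minimal sketch, assuming labels are encoded 0 = no plus, 1 = preplus, 2 = plus (the label lists are hypothetical):

```python
from typing import List, Sequence


def linear_weighted_kappa(a: Sequence[int], b: Sequence[int], k: int = 3) -> float:
    """Linearly weighted Cohen's kappa for ordinal labels 0..k-1.

    Here 0 = no plus, 1 = preplus, 2 = plus; weights grow linearly with |i - j|.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed: List[List[float]] = [[0.0] * k for _ in range(k)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement_obs = disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight
            disagreement_obs += w * observed[i][j]
            disagreement_exp += w * row[i] * col[j] / n  # expected under independence
    if disagreement_exp == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - disagreement_obs / disagreement_exp


# Hypothetical labels: reference standard vs. model predictions.
reference = [0, 0, 1, 2, 2, 1, 0]
predicted = [0, 1, 1, 2, 1, 1, 0]
print(round(linear_weighted_kappa(reference, predicted), 3))
```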


Subject(s)
Ophthalmology , Retinopathy of Prematurity , Diagnostic Imaging , Humans , Infant, Newborn , Ophthalmology/education , ROC Curve , Reproducibility of Results , Retinopathy of Prematurity/diagnosis
19.
Ophthalmology ; 129(7): e69-e76, 2022 07.
Article in English | MEDLINE | ID: mdl-35157950

ABSTRACT

PURPOSE: To validate a vascular severity score as an appropriate output for artificial intelligence (AI) Software as a Medical Device (SaMD) for retinopathy of prematurity (ROP) through comparison with ordinal disease severity labels for stage and plus disease assigned by the International Classification of Retinopathy of Prematurity, Third Edition (ICROP3), committee. DESIGN: Validation study of an AI-based ROP vascular severity score. PARTICIPANTS: A total of 34 ROP experts from the ICROP3 committee. METHODS: Two separate datasets of 30 fundus photographs each for stage (0-5) and plus disease (plus, preplus, neither) were labeled by members of the ICROP3 committee using an open-source platform. Averaging these results produced a continuous label for plus (1-9) and stage (1-3) for each image. Experts were also asked to compare the images with one another in terms of relative severity for plus disease. Each image was also labeled with a vascular severity score from the Imaging and Informatics in ROP deep learning system, which was compared with each grader's diagnostic labels for correlation, as well as with the ophthalmoscopic diagnosis of stage. MAIN OUTCOME MEASURES: Weighted kappa and Pearson correlation coefficients (CCs) were calculated between each pair of grader classification labels for stage and plus disease. The Elo algorithm was also used to convert pairwise comparisons for each expert into an ordered set of images from least to most severe. RESULTS: The mean weighted kappa and CC for all interobserver pairs for plus disease image comparison were 0.67 and 0.88, respectively. The vascular severity score was found to be highly correlated with both the average plus disease classification (CC = 0.90, P < 0.001) and the ophthalmoscopic diagnosis of stage (P < 0.001 by analysis of variance) among all experts.
CONCLUSIONS: The ROP vascular severity score correlates well with the International Classification of Retinopathy of Prematurity committee members' labels for plus disease and stage, which had significant intergrader variability. Generation of a consensus for a validated scoring system for ROP SaMD can facilitate global innovation and regulatory authorization of these technologies.
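The methods above use the Elo algorithm to turn each expert's pairwise severity comparisons into an ordering. A minimal sketch of standard Elo updates, assuming a K-factor of 32 and a starting rating of 1500 (common chess defaults, not values stated in the abstract), with the more severe image treated as the "winner" of each comparison:

```python
from typing import Dict, List, Sequence, Tuple


def elo_order(comparisons: Sequence[Tuple[str, str]], k_factor: float = 32.0,
              initial: float = 1500.0) -> Tuple[List[str], Dict[str, float]]:
    """Order images from least to most severe using Elo updates.

    Each comparison is (more_severe_image, less_severe_image), i.e. (winner, loser).
    """
    ratings: Dict[str, float] = {}
    for winner, loser in comparisons:
        r_win = ratings.setdefault(winner, initial)
        r_lose = ratings.setdefault(loser, initial)
        # Expected score of the winner under the logistic Elo model.
        expected_win = 1.0 / (1.0 + 10 ** ((r_lose - r_win) / 400.0))
        delta = k_factor * (1.0 - expected_win)
        ratings[winner] = r_win + delta
        ratings[loser] = r_lose - delta
    order = sorted(ratings, key=lambda image: ratings[image])
    return order, ratings


# Hypothetical pairwise judgments from one expert.
judgments = [("img3", "img2"), ("img2", "img1"), ("img3", "img1")]
order, ratings = elo_order(judgments)
print(order)
```

Sequential Elo updates depend on the order in which comparisons are processed; averaging over several shuffled passes is one way to reduce that sensitivity.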


Subject(s)
Retinopathy of Prematurity , Artificial Intelligence , Diagnostic Imaging , Gestational Age , Humans , Infant, Newborn , Ophthalmoscopy/methods , Reproducibility of Results , Retinopathy of Prematurity/diagnosis
20.
Article in English | MEDLINE | ID: mdl-36745141

ABSTRACT

Federated learning (FL), in which multiple institutions collaboratively train a machine learning model without sharing data, is becoming popular. Participating institutions might not contribute equally: some contribute more data, some better-quality data, and some more diverse data. To fairly rank the contributions of different institutions, the Shapley value (SV) has emerged as the method of choice. Exact SV computation is prohibitively expensive, especially when there are hundreds of contributors, so existing SV computation techniques use approximations. However, in healthcare, where the number of contributing institutions is unlikely to be very large, computing exact SVs is still exorbitantly expensive, but not impossible. For such settings, we propose an efficient SV computation technique called SaFE (Shapley Value for Federated Learning using Ensembling). We empirically show that SaFE computes values that are close to exact SVs and that it performs better than current SV approximations. This is particularly relevant in medical imaging settings, where heterogeneity across institutions is widespread and fast, accurate data valuation is required to determine the contribution of each participant in multi-institutional collaborative learning.
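Why exact SVs are expensive is easiest to see from the definition: every player's value averages its marginal contribution over all coalitions of the other players, so the utility must be evaluated for all 2^n coalitions. The sketch below computes exact SVs for a hypothetical utility table; in FL each utility evaluation would mean training or ensembling models for that coalition, which is the costly step SaFE targets. SaFE's own ensembling procedure is not reproduced here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def exact_shapley(players: Sequence[str],
                  utility: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (2^(n-1) per player)."""
    n = len(players)
    values: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(len(others) + 1):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical utility: validation accuracy reached by each coalition of institutions.
accuracy = {frozenset(): 0.5, frozenset({"A"}): 0.7, frozenset({"B"}): 0.6,
            frozenset({"A", "B"}): 0.8}
print(exact_shapley(["A", "B"], lambda s: accuracy[s]))
```

The values sum to the utility of the full coalition minus that of the empty one (the efficiency property), which makes them a natural way to split credit for the final model's performance.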
