Results 1 - 5 of 5
1.
J Med Internet Res ; 26: e52506, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39141915

ABSTRACT

BACKGROUND: For medical artificial intelligence (AI) training and validation, human expert labels are considered the gold standard, representing the correct answers or desired outputs for a given data set. These labels serve as a reference or benchmark against which the model's predictions are compared. OBJECTIVE: This study aimed to assess the accuracy of a custom deep learning (DL) algorithm in classifying diabetic retinopathy (DR) and to demonstrate how label errors may affect this assessment in a nationwide DR-screening program. METHODS: Fundus photographs from the Lifeline Express, a nationwide DR-screening program, were analyzed to identify the presence of referable DR using both (1) manual grading by National Health Service England-certified graders and (2) a DL-based DR-screening algorithm with validated good laboratory performance. To assess label accuracy, a random sample of images with disagreement between the DL algorithm and the labels was adjudicated by ophthalmologists masked to the previous grading results. The label error rates in this sample were then used to correct the numbers of negative and positive cases in the entire data set, yielding postcorrection labels. The DL algorithm's performance was evaluated against both pre- and postcorrection labels. RESULTS: The analysis included 736,083 images from 237,824 participants. The DL algorithm exhibited a gap between its real-world and lab-reported performance in this nationwide data set, with a sensitivity increase of 12.5% (from 79.6% to 92.5%, P<.001) and a specificity increase of 6.9% (from 91.6% to 98.5%, P<.001) after label correction. In the random sample, 63.6% (560/880) of negative images and 5.2% (140/2710) of positive images were misclassified in the precorrection human labels. High myopia was the primary reason non-DR images were misclassified as referable DR, while laser spots were predominantly responsible for misclassified referable cases. The estimated label error rate for the entire data set was 1.2%. Label correction was estimated to bring about a 12.5% improvement in the estimated sensitivity of the DL algorithm (P<.001). CONCLUSIONS: Label errors from human image grading, although affecting only a small percentage of images, can significantly affect the performance evaluation of DL algorithms in real-world DR screening.


Subjects
Deep Learning , Diabetic Retinopathy , Diabetic Retinopathy/diagnosis , Humans , Algorithms , Mass Screening/methods , Mass Screening/standards , Female , Male , Middle Aged
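The post-correction bookkeeping described in the abstract above reduces to simple rate arithmetic; a minimal sketch follows, where only the sampled error rates (560/880 and 140/2710) come from the abstract and the `sensitivity` call uses illustrative counts, not the study's:

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases the algorithm flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of truly negative cases the algorithm clears."""
    return tn / (tn + fp)

# Label-error rates observed in the adjudicated disagreement sample (from the abstract):
neg_label_error = 560 / 880    # negative labels that were actually referable DR
pos_label_error = 140 / 2710   # positive labels that were actually non-referable

print(f"negative-label error: {neg_label_error:.1%}")   # 63.6%
print(f"positive-label error: {pos_label_error:.1%}")   # 5.2%

# Correcting labels moves cases between classes, which changes the
# algorithm's apparent sensitivity (illustrative counts):
print(f"{sensitivity(tp=925, fn=75):.1%}")              # 92.5%
```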
2.
Geroscience ; 46(2): 1703-1711, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37733221

ABSTRACT

The concept of biological age has emerged as a measurement that reflects physiological and functional decline with ageing. Here we aimed to develop a deep neural network (DNN) model that predicts biological age from optical coherence tomography (OCT). A total of 84,753 high-quality OCT images from 53,159 individuals in the UK Biobank were included, among which 12,631 3D-OCT images from 8,541 participants without any reported medical conditions at baseline were used to develop an age prediction model. For the remaining 44,618 participants, the OCT age gap, the difference between the OCT-predicted age and chronological age, was calculated for each participant. Cox regression models assessed the association between OCT age gap and mortality. The DNN model predicted age with a mean absolute error of 3.27 years and showed a strong correlation of 0.85 with chronological age. After a median follow-up of 11.0 years (IQR 10.9-11.1 years), 2,429 deaths (5.44%) were recorded. For each 5-year increase in OCT age gap, there was an 8% increased mortality risk (hazard ratio [HR] = 1.08, CI: 1.02-1.13, P = 0.004). Compared with an OCT age gap within ± 4 years, an OCT age gap of less than minus 4 years was associated with a 16% decreased mortality risk (HR = 0.84, CI: 0.75-0.94, P = 0.002), and an OCT age gap of more than 4 years was associated with an 18% increased risk of death (HR = 1.18, CI: 1.02-1.37, P = 0.026). OCT imaging could serve as an ageing biomarker that predicts biological age with high accuracy, and the OCT age gap can be used as a marker of mortality risk.


Subjects
Neural Networks (Computer) , Optical Coherence Tomography , Humans , Optical Coherence Tomography/methods , UK Biobank
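The age-gap quantity and the risk grouping used in the Cox analysis above can be sketched as follows; the ±4-year thresholds and hazard ratios come from the abstract, while the ages themselves are made up for illustration:

```python
import numpy as np

def oct_age_gap(predicted_age, chronological_age):
    """OCT age gap = OCT-predicted biological age minus chronological age."""
    return np.asarray(predicted_age, dtype=float) - np.asarray(chronological_age, dtype=float)

def risk_group(gap):
    """Grouping from the abstract: gaps within +/-4 years are the reference;
    gaps below -4 years carried HR 0.84, gaps above +4 years HR 1.18."""
    if gap < -4:
        return "gap < -4 y (lower risk)"
    if gap > 4:
        return "gap > +4 y (higher risk)"
    return "reference (within +/-4 y)"

gaps = oct_age_gap([62.1, 70.5, 58.3], [68.0, 65.0, 57.0])
print([risk_group(g) for g in gaps])
# The model's reported mean absolute error (3.27 y) is this statistic
# computed over held-out healthy eyes:
print(np.mean(np.abs(gaps)))
```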
3.
Transl Vis Sci Technol ; 12(3): 22, 2023 03 01.
Article in English | MEDLINE | ID: mdl-36947047

ABSTRACT

Purpose: To develop and validate a fully automated program for choroidal structure analysis within a 1500-µm-wide region of interest centered on the fovea (deep learning-based choroidal structure assessment program [DCAP]). Methods: A total of 2162 fovea-centered radial swept-source optical coherence tomography (SS-OCT) B-scans from 162 myopic children with cycloplegic spherical equivalent refraction ranging from -1.00 to -5.00 diopters were collected to develop the DCAP. A Medical Transformer network and a Small Attention U-Net were used to automatically segment the choroid boundaries and the nulla (the deepest point within the fovea). Automatic denoising based on choroidal vessel luminance and binarization were applied to isolate choroidal luminal/stromal areas. To further compare the DCAP with the traditional handcrafted method, the luminal/stromal areas and choroidal vascularity index (CVI) values for 20 OCT images were measured by three graders and the DCAP separately. Intraclass correlation coefficients (ICCs) and limits of agreement were used for agreement analysis. Results: The mean ± SD pixel-wise distances from the predicted choroidal inner boundary, outer boundary, and nulla to the ground truth were 1.40 ± 1.23, 5.40 ± 2.24, and 1.92 ± 1.13 pixels, respectively. The mean times required for choroidal structure analysis were 1.00 second per image for the DCAP and 438.00 ± 75.88, 393.25 ± 78.77, and 410.10 ± 56.03 seconds per image for the three graders, respectively. Agreement between the automatic and manual area measurements was excellent (ICCs > 0.900) but poor for the CVI (ICC = 0.627; 95% confidence interval, 0.279-0.832). Additionally, the DCAP demonstrated better intersession repeatability. Conclusions: The DCAP is faster than manual methods. It also reduced intra-/intergrader and intersession variation to a small extent. Translational Relevance: The DCAP could aid in choroidal structure assessment.


Subjects
Deep Learning , Myopia , Humans , Child , Choroid/diagnostic imaging , Myopia/diagnostic imaging , Optical Coherence Tomography/methods
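The choroidal vascularity index whose agreement is analyzed above is the luminal fraction of the segmented choroid. A minimal sketch on a toy binarized mask follows; the mask encoding (NaN/0/1) is an assumption for illustration, not the DCAP's actual representation:

```python
import numpy as np

def choroidal_vascularity_index(mask):
    """CVI = luminal area / total choroidal area.
    Encoding assumed here: NaN = outside the segmented choroid,
    0.0 = luminal (dark, vessel) pixel, 1.0 = stromal (bright) pixel."""
    inside = ~np.isnan(mask)
    luminal = np.count_nonzero(mask[inside] == 0.0)
    return luminal / np.count_nonzero(inside)

# Toy 3x4 B-scan patch: 6 choroid pixels, 4 of them luminal
patch = np.array([[np.nan, 0.0, 0.0, np.nan],
                  [0.0,    1.0, 1.0, 0.0],
                  [np.nan, np.nan, np.nan, np.nan]])
print(choroidal_vascularity_index(patch))  # 4/6 ≈ 0.667
```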
4.
Front Med (Lausanne) ; 8: 740987, 2021.
Article in English | MEDLINE | ID: mdl-34901058

ABSTRACT

Purpose: To assess the accuracy and efficacy of a semi-automated deep learning algorithm (DLA)-assisted approach to detect vision-threatening diabetic retinopathy (DR). Methods: We developed a two-step semi-automated DLA-assisted approach to grade fundus photographs for vision-threatening referable DR. Study images were obtained from the Lingtou Cohort Study and were captured at participant enrollment in 2009-2010 ("baseline images") and at annual follow-up between 2011 and 2017. First, a validated DLA automatically graded baseline images for referable DR and classified them as positive, negative, or ungradable. Next, each positive image, all other available images from patients who had a positive image, and a 5% random sample of all negative images were selected and regraded by trained human graders. A reference standard diagnosis was assigned once all graders achieved consistent grading outcomes or with a senior ophthalmologist's final diagnosis. The semi-automated DLA-assisted approach combined initial DLA screening and subsequent human grading of images identified as high-risk. This approach was further validated within the follow-up image datasets, and its time and economic costs were evaluated against fully human grading. Results: For evaluation of baseline images, a total of 33,115 images were included and automatically graded by the DLA. 2,604 images (480 positive results, 624 available other images from participants with a positive result, and 1,500 random negative samples) were selected and regraded by graders. The DLA achieved an area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy of 0.953, 0.970, 0.879, and 88.6%, respectively. In further validation within the follow-up image datasets, a total of 88,363 images were graded using this semi-automated approach, and human grading was performed on 8,975 selected images. The DLA achieved an AUC, sensitivity, and specificity of 0.914, 0.852, and 0.853, respectively. Compared against fully human grading, the semi-automated DLA-assisted approach achieved an estimated 75.6% time and 90.1% economic cost saving. Conclusions: The DLA described in this study was able to achieve high accuracy, sensitivity, and specificity in grading fundus images for referable DR. Validated against long-term follow-up datasets, a semi-automated DLA-assisted approach was able to accurately identify suspect cases and minimize misdiagnosis whilst balancing safety, time, and economic cost.
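The two-step triage rule described above — the DLA grades everything, humans regrade only the high-risk subset — can be sketched as follows; record fields such as `patient_id` and `dla_grade` are hypothetical names, not the study's data schema:

```python
import random

def select_for_human_regrade(records, neg_sample_frac=0.05, seed=0):
    """Humans regrade: (a) every DLA-positive image, (b) all other images
    from any participant who had a positive image, and (c) a random
    fraction of the remaining DLA-negative images."""
    positives = [r for r in records if r["dla_grade"] == "positive"]
    pos_patients = {r["patient_id"] for r in positives}
    from_pos_patients = [r for r in records
                         if r["patient_id"] in pos_patients
                         and r["dla_grade"] != "positive"]
    other_negatives = [r for r in records
                       if r["dla_grade"] == "negative"
                       and r["patient_id"] not in pos_patients]
    k = round(len(other_negatives) * neg_sample_frac)
    sampled = random.Random(seed).sample(other_negatives, k)
    return positives + from_pos_patients + sampled
```

With the baseline numbers in the abstract (480 positives, 624 further images from those participants, and 1,500 sampled negatives), this rule accounts for the 2,604 images sent for human regrading.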

5.
PLoS One ; 14(9): e0222025, 2019.
Article in English | MEDLINE | ID: mdl-31536537

ABSTRACT

PURPOSE: To provide a self-adaptive deep learning (DL) method to automatically detect eye laterality based on fundus images. METHODS: A total of 18,394 fundus images with real-world eye laterality labels were used for model development and internal validation. A separate dataset of 2,000 fundus images with manually labeled eye laterality was used for external validation. A DL model was developed based on a fine-tuned Inception-V3 network with a self-adaptive strategy. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and confusion matrix were used to assess model performance. The class activation map (CAM) was used for model visualization. RESULTS: In the external validation (N = 2,000, 50% labeled as left eye), the AUC of the DL model for overall eye laterality detection was 0.995 (95% CI, 0.993-0.997) with an accuracy of 99.13%. Specifically for left eye detection, the sensitivity was 99.00% (95% CI, 98.11%-99.49%) and the specificity was 99.10% (95% CI, 98.23%-99.56%). Nineteen images were classified differently from the human labels: 12 due to incorrect human labels and 7 due to poor image quality. The CAM showed that the region of interest for eye laterality detection was mainly the optic disc and surrounding areas. CONCLUSION: We proposed a self-adaptive DL method with high performance in detecting eye laterality based on fundus images. Our results were based on real-world labels and thus have practical significance in clinical settings.


Subjects
Eye/diagnostic imaging , Functional Laterality , Photography/methods , Aged , Algorithms , Area Under Curve , Deep Learning , Ophthalmological Diagnostic Techniques , Fundus Oculi , Humans , Middle Aged , Ocular Physiological Phenomena , Sensitivity and Specificity
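The headline metrics in the abstract above reduce to confusion-matrix arithmetic with "left eye" as the positive class. A sketch follows; the counts are illustrative back-of-envelope numbers, not the study's exact confusion matrix:

```python
def laterality_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy with 'left eye' as the positive class."""
    sens = tp / (tp + fn)                 # left eyes correctly called left
    spec = tn / (tn + fp)                 # right eyes correctly called right
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sens, spec, acc

# 2,000-image validation set, half labelled left (illustrative error split)
sens, spec, acc = laterality_metrics(tp=990, fn=10, tn=991, fp=9)
print(f"sensitivity {sens:.2%}, specificity {spec:.2%}, accuracy {acc:.2%}")
```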