ABSTRACT
Assessment of interstitial fibrosis by renal pathologists lacks good agreement. We aimed to investigate the hidden properties of this assessment and to infer its possible clinical impact. Fifty kidney biopsies were assessed by 9 renal pathologists, and agreement was evaluated with intraclass correlation coefficients (ICCs) and kappa statistics. The probabilities that pathologists' assessments would deviate far from the true values were derived from quadratic regression and multilayer perceptron nonlinear regression. Likely causes of variation in interstitial fibrosis assessment were investigated, and possible misclassification rates were inferred for reported large cohorts. Inter-rater reliability ranged from poor to good (ICCs 0.48 to 0.90), and agreement among pathologists was worst when the extent of interstitial fibrosis was moderate. Overall, 33.5% of pathologists' assessments were expected to deviate far from the true values. Variation in interstitial fibrosis assessment correlated with variation in interstitial inflammation assessment (r² = 32.1%). Taking IgA nephropathy as an example, the Oxford T scores for interstitial fibrosis were expected to be misclassified in 21.9% of patients. This study demonstrated the complexity of the inter-rater reliability of interstitial fibrosis assessment; the proposed approaches revealed previously unknown properties of pathologists' practice and inferred a possible clinical impact on patients.
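As a rough illustration of the inter-rater statistic reported above, the following sketch computes an intraclass correlation coefficient from a biopsies-by-raters table of fibrosis scores. The abstract does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here, and the function name `icc2_1` is hypothetical.

```python
from typing import List


def icc2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score given to subject (biopsy) i by rater j.
    Assumes a complete table (every rater scored every subject) and
    at least some variation in the scores.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With 50 biopsies and 9 pathologists, `ratings` would be a 50 x 9 table of percentage scores. Systematic offsets between raters lower this form of the ICC because it measures absolute agreement rather than consistency.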
Subjects
Glomerulonephritis, IGA; Kidney; Humans; Reproducibility of Results; Kidney/pathology; Glomerulonephritis, IGA/pathology; Fibrosis; Observer Variation
ABSTRACT
BACKGROUND: The extent of interstitial fibrosis in the kidney not only correlates with renal function at the time of biopsy but also predicts future renal outcome. However, its assessment by pathologists lacks good agreement. The aim of this study was to construct a machine learning-based model that enables automatic and reliable assessment of interstitial fibrosis in human kidney biopsies. METHODS: Validated cortex, glomerulus and tubule segmentation algorithms were incorporated into a single model to assess the extent of interstitial fibrosis. The model's performance was compared with that of expert renal pathologists and correlated with patients' renal function data. RESULTS: Compared with human raters, the model had the best agreement with the reference [intraclass correlation coefficient (ICC) 0.90] in 50 test cases. The model also had a low mean bias and the narrowest 95% limits of agreement. The model was robust against colour variation in images obtained at different times, with different scanners, or from outside institutions, with excellent ICCs of 0.92-0.97. The model showed significantly better test-retest reliability (ICC 0.98) than humans (ICC 0.76-0.94), and the amount of interstitial fibrosis inferred by the model correlated strongly with serum creatinine (r = 0.65-0.67) and estimated glomerular filtration rate (r = -0.74 to -0.76) in 405 patients. CONCLUSIONS: This study demonstrated that a trained machine learning-based model can faithfully simulate the whole process of interstitial fibrosis assessment, which traditionally can only be carried out by renal pathologists. Our data suggested that such a model may provide more reliable results, thus enabling precision medicine.
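The mean bias and 95% limits of agreement mentioned in the results can be illustrated with a short sketch. It assumes the conventional Bland-Altman definition (mean difference plus or minus 1.96 standard deviations of the differences), which the abstract does not spell out; the function name `limits_of_agreement` and the example values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(
    measured: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired scores.

    bias is the mean of (measured - reference); the limits are
    bias -/+ 1.96 times the sample standard deviation of the differences.
    """
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative values only: fibrosis percentages from one rater vs. a reference.
rater_scores = [12.0, 18.0, 33.0, 41.0]
reference_scores = [10.0, 20.0, 30.0, 40.0]
bias, lower, upper = limits_of_agreement(rater_scores, reference_scores)
```

A rater (human or model) with a bias near zero and a narrow interval between `lower` and `upper` tracks the reference more closely across cases.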