ABSTRACT
INTRODUCTION: Grading of hydronephrosis severity on postnatal renal ultrasound guides management decisions in antenatal hydronephrosis (ANH). Multiple systems exist to help standardize hydronephrosis grading, yet poor inter-observer reliability persists. Machine learning methods may provide tools to improve the efficiency and accuracy of hydronephrosis grading.
OBJECTIVE: To develop an automated convolutional neural network (CNN) model that classifies hydronephrosis on renal ultrasound imaging according to the Society of Fetal Urology (SFU) system, as a potential clinical adjunct.
STUDY DESIGN: A cross-sectional, single-institution cohort of postnatal renal ultrasounds with radiologist SFU grading was obtained from pediatric patients with and without hydronephrosis of stable severity. Imaging labels were used to automatically select sagittal and transverse grey-scale renal images from all available studies for each patient. A VGG16 CNN pre-trained on ImageNet analyzed these preprocessed images. The model was built and evaluated with three-fold stratified cross-validation and classified renal ultrasounds on a per-patient basis into five classes based on the SFU system (normal, SFU I, SFU II, SFU III, or SFU IV). These predictions were compared with radiologist grading. Confusion matrices were used to evaluate model performance. Gradient class activation mapping was used to show the imaging features driving model predictions.
RESULTS: We identified 710 patients with 4659 postnatal renal ultrasound series. Per radiologist grading, 183 were normal, 157 were SFU I, 132 were SFU II, 100 were SFU III, and 138 were SFU IV. The machine learning model predicted hydronephrosis grade with 82.0% (95% CI: 75-83%) overall accuracy and classified 97.6% (95% CI: 95-98%) of patients correctly or within one grade of the radiologist grade. The model correctly classified 92.3% (95% CI: 86-95%) of normal, 73.2% (95% CI: 69-76%) of SFU I, 73.5% (95% CI: 67-75%) of SFU II, 79.0% (95% CI: 73-82%) of SFU III, and 88.4% (95% CI: 85-92%) of SFU IV patients. Gradient class activation mapping demonstrated that the ultrasound appearance of the renal collecting system drove the model's predictions.
DISCUSSION: The CNN-based model classified hydronephrosis on renal ultrasounds automatically and accurately, based on the imaging features expected under the SFU system. Compared with prior studies, the model operated with greater automation and greater accuracy. Limitations include the retrospective design, the relatively small cohort, and averaging across multiple imaging studies per patient.
CONCLUSIONS: An automated CNN-based system classified hydronephrosis on renal ultrasounds according to the SFU system with promising accuracy, based on appropriate imaging features. These findings suggest a possible adjunctive role for machine learning systems in the grading of ANH.
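The per-patient evaluation described above (exact agreement and agreement within one SFU grade, both read off a confusion matrix) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-image class probabilities are averaged within each patient before the highest-scoring grade is taken; the abstract mentions averaging across studies but does not specify the exact rule. All function names and the toy inputs are hypothetical and are not study data.

```python
from collections import defaultdict

# Five classes in SFU order; index distance equals grade distance.
GRADES = ["normal", "SFU I", "SFU II", "SFU III", "SFU IV"]


def patient_level_predictions(image_probs):
    """Average per-image class probabilities per patient, then take the argmax.

    image_probs: iterable of (patient_id, [p_normal, p_I, p_II, p_III, p_IV]).
    Assumption: a simple mean over images; the study's rule may differ.
    """
    sums = {}
    counts = defaultdict(int)
    for pid, probs in image_probs:
        if pid not in sums:
            sums[pid] = [0.0] * len(GRADES)
        for k, p in enumerate(probs):
            sums[pid][k] += p
        counts[pid] += 1
    preds = {}
    for pid, total in sums.items():
        mean = [t / counts[pid] for t in total]
        preds[pid] = max(range(len(GRADES)), key=mean.__getitem__)
    return preds


def confusion_matrix(truth, preds):
    """Rows are radiologist grades, columns are model-predicted grades."""
    n = len(GRADES)
    matrix = [[0] * n for _ in range(n)]
    for pid, true_grade in truth.items():
        matrix[true_grade][preds[pid]] += 1
    return matrix


def exact_and_within_one(matrix):
    """Return (exact accuracy, accuracy within one grade)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    exact = sum(matrix[i][i] for i in range(n))
    near = sum(matrix[i][j] for i in range(n) for j in range(n) if abs(i - j) <= 1)
    return exact / total, near / total


# Hypothetical toy inputs for illustration only.
toy_image_probs = [
    ("p1", [0.7, 0.2, 0.1, 0.0, 0.0]),
    ("p1", [0.4, 0.5, 0.1, 0.0, 0.0]),
    ("p2", [0.1, 0.2, 0.6, 0.1, 0.0]),
    ("p3", [0.0, 0.0, 0.1, 0.3, 0.6]),
    ("p3", [0.0, 0.0, 0.2, 0.5, 0.3]),
]
toy_truth = {"p1": 0, "p2": 1, "p3": 4}  # radiologist grade as an index into GRADES

toy_preds = patient_level_predictions(toy_image_probs)
toy_matrix = confusion_matrix(toy_truth, toy_preds)
toy_exact, toy_near = exact_and_within_one(toy_matrix)
print(toy_preds, toy_exact, toy_near)
```

In this toy example, patient p2 is graded SFU I by the radiologist and SFU II by the model. That counts as an error for exact accuracy but as agreement for the within-one-grade figure, which is the distinction between the two headline numbers reported in the results.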