ABSTRACT
PURPOSE: The aim of this study was to validate the potential of substituting a deep-learning observer for a human observer in a paired comparison. METHODS: Phantom images were obtained using computed tomography. The standard imaging condition was 120 kVp and 200 mA, and the tube current was additionally varied to 160 mA, 120 mA, 80 mA, 40 mA, and 20 mA, resulting in six imaging conditions. Fourteen radiologic technologists with more than 10 years of experience conducted pairwise comparisons using Ura's method. After training, VGG16 and VGG19 models were combined to form deep learning models, which were then evaluated for accuracy, recall, precision, specificity, and F1 score. The validation results were used as the standard, and the average degree of preference and the results of the significance tests between images obtained when the deep learning results were incorporated were compared against this standard. RESULTS: The average accuracy of the deep learning model was 82%; relative to the standard, the difference in the average degree of preference was at most 0.13, at least 0, and 0.05 on average. Significant differences in the test results were observed when human observers were replaced with AI counterparts for the image pairs of 160 mA vs. 120 mA and 200 mA vs. 160 mA. CONCLUSION: In paired comparisons with a limited phantom (7-point noise evaluation), the results suggest the potential use of deep learning as one of the observers.
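
The abstract states only that VGG16 and VGG19 were combined and evaluated with accuracy, recall, precision, specificity, and F1; it does not describe the architecture, the task framing, or how the two networks were fused. The Python sketch below is therefore illustrative rather than the authors' implementation: the frozen ImageNet backbones, the binary output per image, the probability-averaging ensemble, and the helper names build_branch, ensemble_predict, and classification_metrics are assumptions introduced here. Only the five metric formulas follow their standard confusion-matrix definitions.

    import numpy as np
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16, VGG19

    def build_branch(base_cls, input_shape=(224, 224, 3)):
        # Frozen ImageNet backbone with a small binary head (illustrative only).
        base = base_cls(weights="imagenet", include_top=False, input_shape=input_shape)
        base.trainable = False
        x = layers.GlobalAveragePooling2D()(base.output)
        x = layers.Dense(128, activation="relu")(x)
        out = layers.Dense(1, activation="sigmoid")(x)
        return models.Model(base.input, out)

    vgg16_branch = build_branch(VGG16)
    vgg19_branch = build_branch(VGG19)

    def ensemble_predict(images, threshold=0.5):
        # Average the two branches' probabilities; this combination rule is an assumption.
        p16 = vgg16_branch.predict(images, verbose=0).ravel()
        p19 = vgg19_branch.predict(images, verbose=0).ravel()
        return ((p16 + p19) / 2.0 >= threshold).astype(int)

    def classification_metrics(y_true, y_pred):
        # Accuracy, recall, precision, specificity, and F1 from a 2x2 confusion matrix.
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = int(np.sum((y_true == 1) & (y_pred == 1)))
        tn = int(np.sum((y_true == 0) & (y_pred == 0)))
        fp = int(np.sum((y_true == 0) & (y_pred == 1)))
        fn = int(np.sum((y_true == 1) & (y_pred == 0)))
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        return {"accuracy": accuracy, "recall": recall, "precision": precision,
                "specificity": specificity, "f1": f1}

How the binary network output is mapped onto a pairwise preference judgment in Ura's method is not specified in the abstract, so that step is deliberately left out of the sketch.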