ABSTRACT
Osteoporosis is a bone disease characterized by decreased bone density and mass, leading to fragility fractures. Assessing osteoporosis from radiographs with deep learning algorithms has proven to be a low-cost alternative to the gold standard, dual-energy X-ray absorptiometry (DXA). However, because of considerable noise and low contrast, automated diagnosis of osteoporosis in X-ray images remains a significant challenge for traditional diagnostic methods. In this paper, we propose an end-to-end transformer-style network, termed FCoTNet, to overcome the insufficient fusion of texture information and local features in the traditional CoTNet. To extract complementary geometric representations at each scale of the transformer module, we integrate parallel multi-scale feature extraction architectures into each unit layer of FCoTNet, using convolution to aggregate features from different receptive fields. Moreover, because small-scale texture features are more critical to the diagnosis of osteoporosis in radiographs, larger fusion weights are assigned to the feature maps with small receptive fields. Multi-scale global modeling is then performed by a self-attention mechanism. The proposed model was first evaluated on a private lumbar spine X-ray dataset with 5-fold cross-validation, obtaining an average accuracy of 78.29 ± 0.93 %, an average sensitivity of 69.72 ± 2.35 %, and an average specificity of 88.92 ± 0.67 % for the three-class classification of normal, osteopenia, and osteoporosis. We then conducted a controlled trial with five orthopedic clinicians to evaluate the clinical value of the model. The clinicians' average accuracy improved from 61.50 ± 10.79 % unaided to 80.00 ± 5.92 % aided (an improvement of 18.50 percentage points), sensitivity improved from 64.38 ± 8.07 % unaided to 83.31 ± 5.43 % aided (18.93 percentage points), and specificity improved from 80.11 ± 4.72 % unaided to 89.94 ± 3.82 % aided (9.83 percentage points). Meanwhile, prediction consistency among clinicians improved significantly with the assistance of FCoTNet. Furthermore, the proposed model showed good robustness on an external test dataset. These results indicate that the proposed deep learning model achieves state-of-the-art performance for osteoporosis prediction, which could substantially improve osteoporosis screening and help reduce osteoporotic fractures.
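
The accuracy, sensitivity, and specificity reported above refer to a three-class problem (normal, osteopenia, osteoporosis). The abstract does not state how the per-class values were averaged, so the following is only a minimal sketch that assumes one-vs-rest sensitivity and specificity, macro-averaged over the three classes. The function name `one_vs_rest_metrics` and the confusion matrix `example_cm` are hypothetical and are not data from the study.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged one-vs-rest sensitivity/specificity.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Assumes every class has at least one true sample
    and at least one sample belonging to the other classes.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n_classes))

    sensitivities = []
    specificities = []
    for k in range(n_classes):
        tp = cm[k][k]                                        # class k correctly predicted
        fn = sum(cm[k]) - tp                                 # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n_classes)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                            # everything else
        sensitivities.append(tp / (tp + fn))
        specificities.append(tn / (tn + fp))

    return {
        "accuracy": correct / total,
        "sensitivity": sum(sensitivities) / n_classes,   # macro average (assumption)
        "specificity": sum(specificities) / n_classes,   # macro average (assumption)
    }


# Hypothetical confusion matrix, NOT taken from the paper.
# Rows: true class, columns: predicted class,
# in the order normal, osteopenia, osteoporosis.
example_cm = [
    [40, 8, 2],
    [10, 30, 10],
    [3, 7, 40],
]

metrics = one_vs_rest_metrics(example_cm)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f} %")
```

In a 5-fold cross-validation setting, such a function would be applied to each fold's confusion matrix, and the mean and standard deviation across folds would give values in the "mean ± std" form used above.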