Multistage feature fusion knowledge distillation.
Li, Gang; Wang, Kun; Lv, Pengfei; He, Pan; Zhou, Zheng; Xu, Chuanyun.
Affiliation
  • Li G; School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 401135, China.
  • Wang K; School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 401135, China.
  • Lv P; School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 401135, China.
  • He P; College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China.
  • Zhou Z; School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 401135, China.
  • Xu C; College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China. xcy@cqnu.edu.cn.
Sci Rep ; 14(1): 13373, 2024 Jun 11.
Article in En | MEDLINE | ID: mdl-38862547
ABSTRACT
The recognition performance of lightweight models is generally lower than that of large models. Knowledge distillation, in which a teacher model guides a student model, can further improve the recognition accuracy of lightweight models. In this paper, we approach knowledge distillation from the perspective of intermediate feature-level distillation. We combine a symmetric cross-stage feature fusion framework, an attention mechanism that enhances the fused features, and a contrastive loss applied to teacher and student features at the same stage into a multistage feature fusion knowledge distillation method. This approach addresses the significant differences between the intermediate feature distributions of teacher and student models, which otherwise make it difficult for the student to learn implicit knowledge, and thereby improves the recognition accuracy of the student model. Our method outperforms existing knowledge distillation methods: on the CIFAR100 dataset it raises the recognition accuracy of ResNet20 from 69.06% to 71.34%, and on the Tiny-ImageNet dataset it raises the recognition accuracy of ResNet18 from 66.54% to 68.03%, demonstrating the effectiveness and generalizability of our approach. There remains room to optimize the overall distillation structure and the feature extraction methods, which we leave to future research.
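The abstract describes three components: projecting and fusing student features across stages, enhancing the fused features with attention, and aligning teacher and student features at the same stage with a contrastive loss. The following is a minimal PyTorch sketch of that idea only; the class and parameter names (FusionDistiller, StageProjector, temperature, the 1x1-convolution fusion operator) are illustrative assumptions and not the authors' implementation.

```python
# Minimal sketch of multistage feature-fusion distillation (assumed design,
# not the paper's code): project student stages to teacher widths, fuse each
# stage with the previous one, gate with channel attention, and apply an
# InfoNCE-style contrastive loss against the matching teacher stage.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StageProjector(nn.Module):
    """Projects a student feature map to the teacher's channel width."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.proj(x)


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gate used to enhance fused features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))            # global average pool -> gate
        return x * w.unsqueeze(-1).unsqueeze(-1)


def stage_contrastive_loss(f_s, f_t, temperature=0.1):
    """InfoNCE-style loss: the student feature of a sample should match the
    teacher feature of the same sample against other samples in the batch."""
    z_s = F.normalize(f_s.mean(dim=(2, 3)), dim=1)  # (B, C) embeddings
    z_t = F.normalize(f_t.mean(dim=(2, 3)), dim=1)
    logits = z_s @ z_t.t() / temperature            # (B, B) similarities
    targets = torch.arange(z_s.size(0), device=z_s.device)
    return F.cross_entropy(logits, targets)


class FusionDistiller(nn.Module):
    """Aligns student stages to teacher stages after projection, cross-stage
    fusion with the previous stage, and attention enhancement."""
    def __init__(self, student_dims, teacher_dims):
        super().__init__()
        self.projectors = nn.ModuleList(
            StageProjector(s, t) for s, t in zip(student_dims, teacher_dims)
        )
        self.attentions = nn.ModuleList(
            ChannelAttention(t) for t in teacher_dims
        )
        # 1x1 convs to carry the previous fused stage into the current one
        # (assumed fusion operator; the paper's exact operator may differ).
        self.fuse = nn.ModuleList(
            nn.Conv2d(teacher_dims[i - 1], teacher_dims[i], 1, bias=False)
            if i > 0 else nn.Identity()
            for i in range(len(teacher_dims))
        )

    def forward(self, student_feats, teacher_feats):
        loss, prev = 0.0, None
        for i, (proj, attn, f_s, f_t) in enumerate(
            zip(self.projectors, self.attentions, student_feats, teacher_feats)
        ):
            f = proj(f_s)
            if prev is not None:                    # cross-stage fusion
                prev = F.adaptive_avg_pool2d(prev, f.shape[-2:])
                f = f + self.fuse[i](prev)
            f = attn(f)                             # enhance the fused feature
            loss = loss + stage_contrastive_loss(f, f_t.detach())
            prev = f
        return loss / len(self.projectors)
```

In training, this distillation loss would be added to the student's usual cross-entropy (and possibly a logit-distillation) loss; the relative weighting is a hyperparameter not specified in the abstract.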
Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: Sci Rep / Sci. rep. (Nat. Publ. Group) / Scientific reports (Nature Publishing Group) Year: 2024 Document type: Article Affiliation country: China Country of publication: United Kingdom