Tolerant Self-Distillation for image classification.
Liu, Mushui; Yu, Yunlong; Ji, Zhong; Han, Jungong; Zhang, Zhongfei.
Affiliation
  • Liu M; College of Information Science and Electronic Engineering, Zhejiang University, China.
  • Yu Y; College of Information Science and Electronic Engineering, Zhejiang University, China. Electronic address: yuyunlong@zju.edu.cn.
  • Ji Z; School of Electrical and Information Engineering, Tianjin University, China.
  • Han J; Department of Computer Science, the University of Sheffield, UK.
  • Zhang Z; Computer Science Department, Watson School, State University of New York Binghamton University, USA.
Neural Netw ; 174: 106215, 2024 Jun.
Article in En | MEDLINE | ID: mdl-38471261
ABSTRACT
Deep neural networks tend to suffer from overfitting when the training data are insufficient. In this paper, we introduce two metrics based on the intra-class distributions of correctly predicted and incorrectly predicted samples to provide a new perspective on the overfitting issue. Based on these metrics, we propose Tolerant Self-Distillation (TSD), a knowledge distillation approach that alleviates overfitting without pretraining a teacher model in advance. TSD introduces an online-updated memory that selectively stores the class predictions of samples from past iterations, making it possible to distill knowledge across iterations. Specifically, the class predictions stored in the memory bank serve as soft labels that supervise same-class samples in the current iteration in a reverse way, i.e., correctly predicted samples are supervised with the stored incorrect predictions, while incorrectly predicted samples are supervised with the stored correct predictions. Consequently, the premature convergence caused by over-confident samples is mitigated, which helps the model converge to a better local optimum. Extensive experimental results on several image classification benchmarks, including small-scale, large-scale, and fine-grained datasets, demonstrate the superiority of the proposed TSD.
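The reverse-supervision mechanism described in the abstract lends itself to a compact sketch. The PyTorch snippet below is a minimal, hypothetical illustration only, not the authors' implementation: the memory update rule (an exponential moving average here), the names TSDMemory and tsd_loss, and the loss weight alpha are assumptions introduced for illustration.

```python
# Hypothetical sketch of the Tolerant Self-Distillation (TSD) idea. The EMA
# memory update, the loss weight `alpha`, and all names are assumptions,
# not the paper's implementation.
import torch
import torch.nn.functional as F


class TSDMemory:
    """Per-class memory of past soft predictions, split by correctness."""

    def __init__(self, num_classes: int, momentum: float = 0.9):
        self.momentum = momentum
        # One averaged probability vector per class, for correct and incorrect samples.
        self.correct = torch.full((num_classes, num_classes), 1.0 / num_classes)
        self.incorrect = torch.full((num_classes, num_classes), 1.0 / num_classes)

    @torch.no_grad()
    def update(self, probs: torch.Tensor, labels: torch.Tensor) -> None:
        """Fold the current batch's predictions into the per-class memory."""
        self.correct = self.correct.to(probs.device)
        self.incorrect = self.incorrect.to(probs.device)
        preds = probs.argmax(dim=1)
        for p, y, hit in zip(probs, labels, preds == labels):
            bank = self.correct if hit else self.incorrect
            bank[y] = self.momentum * bank[y] + (1.0 - self.momentum) * p

    def soft_targets(self, probs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """Reverse supervision: correctly predicted samples receive the stored
        incorrect predictions of their class; incorrectly predicted samples
        receive the stored correct predictions."""
        self.correct = self.correct.to(probs.device)
        self.incorrect = self.incorrect.to(probs.device)
        hit = (probs.argmax(dim=1) == labels).unsqueeze(1).float()
        return hit * self.incorrect[labels] + (1.0 - hit) * self.correct[labels]


def tsd_loss(logits: torch.Tensor, labels: torch.Tensor,
             memory: TSDMemory, alpha: float = 0.5) -> torch.Tensor:
    """Cross-entropy plus a KL term toward the reverse soft labels from memory."""
    probs = logits.softmax(dim=1)
    targets = memory.soft_targets(probs.detach(), labels)
    ce = F.cross_entropy(logits, labels)
    kd = F.kl_div(F.log_softmax(logits, dim=1), targets, reduction="batchmean")
    memory.update(probs.detach(), labels)
    return (1.0 - alpha) * ce + alpha * kd
```

In a training loop, one would construct a single TSDMemory and replace the plain cross-entropy with tsd_loss(model(x), y, memory); these call names are, again, illustrative assumptions rather than the paper's API.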
Subjects
Keywords

Full text: 1 Database: MEDLINE Main subject: Knowledge / Benchmarking Language: En Publication year: 2024 Document type: Article