Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs.
Tian, Yingjie; Xu, Shaokai; Li, Muyang.
Affiliation
  • Tian Y; School of Economics and Management, University of Chinese Academy of Sciences, Beijing, 100190, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, 100190, China. Electronic address: tyj@ucas.ac.cn.
  • Xu S; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 100049, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, 100190, China. Electronic address: xushaokai23@mails.ucas.ac.cn.
  • Li M; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, 100049, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences, Beijing, 100190, China. Electronic address: limuyang23@mails.ucas.ac.cn.
Neural Netw; 179: 106567, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39089155
ABSTRACT
While Graph Neural Networks (GNNs) have demonstrated their effectiveness in processing non-Euclidean structured data, the neighborhood fetching of GNNs is time-consuming and computationally intensive, making them difficult to deploy in low-latency industrial applications. To address this issue, a feasible solution is graph knowledge distillation (KD), which can learn high-performance student Multi-layer Perceptrons (MLPs) to replace GNNs by mimicking the superior output of teacher GNNs. However, state-of-the-art graph knowledge distillation methods are mainly based on distilling deep features from intermediate hidden layers, which causes the significance of logit-layer distillation to be greatly overlooked. To provide a novel viewpoint for studying logits-based KD methods, we introduce the idea of decoupling into graph knowledge distillation. Specifically, we first reformulate the classical graph knowledge distillation loss into two parts, i.e., the target class graph distillation (TCGD) loss and the non-target class graph distillation (NCGD) loss. Next, we decouple the negative correlation between the GNN's prediction confidence and the NCGD loss, and eliminate the fixed weight between TCGD and NCGD. We name this logits-based method Decoupled Graph Knowledge Distillation (DGKD). It can flexibly adjust the weights of TCGD and NCGD for different data samples, thereby improving the prediction accuracy of the student MLP. Extensive experiments conducted on public benchmark datasets demonstrate the effectiveness of our method. Additionally, DGKD can be incorporated into any existing graph knowledge distillation framework as a plug-and-play loss function, further improving distillation performance. The code is available at https://github.com/xsk160/DGKD.
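The exact DGKD formulation is not given in this record. As a rough illustration of the target-class / non-target-class split the abstract describes, the following PyTorch sketch implements a DKD-style decomposition of the logit distillation loss; the function name decoupled_kd_loss, the fixed alpha/beta weights, and the temperature T are assumptions standing in for DGKD's per-sample adaptive weighting.

    import torch
    import torch.nn.functional as F

    def decoupled_kd_loss(student_logits, teacher_logits, target, alpha=1.0, beta=2.0, T=1.0):
        # student_logits, teacher_logits: [N, C]; target: [N] ground-truth class indices.
        num_classes = student_logits.size(1)
        gt = F.one_hot(target, num_classes).float()   # 1 at the target class, 0 elsewhere

        s = F.softmax(student_logits / T, dim=1)
        t = F.softmax(teacher_logits / T, dim=1)

        # Target-class term (TCGD-like): KL between binary (target vs. non-target) distributions.
        s_bin = torch.stack([(s * gt).sum(1), (s * (1 - gt)).sum(1)], dim=1)
        t_bin = torch.stack([(t * gt).sum(1), (t * (1 - gt)).sum(1)], dim=1)
        tcgd = F.kl_div(s_bin.log(), t_bin, reduction="batchmean")

        # Non-target-class term (NCGD-like): KL over the non-target classes only,
        # obtained by masking out the target logit before the softmax.
        s_nt = F.softmax(student_logits / T - 1000.0 * gt, dim=1)
        t_nt = F.softmax(teacher_logits / T - 1000.0 * gt, dim=1)
        ncgd = F.kl_div(s_nt.log(), t_nt, reduction="batchmean")

        # DGKD adjusts the two weights per sample from the teacher's confidence;
        # fixed alpha and beta are used here purely for illustration.
        return (alpha * tcgd + beta * ncgd) * (T ** 2)

In a GNN-to-MLP setting this would be called as, e.g., loss = decoupled_kd_loss(mlp(features), gnn_logits.detach(), labels), with the teacher GNN's logits precomputed so the student MLP never performs neighborhood fetching at training or inference time.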

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks, Computer Language: English Journal: Neural Netw Journal subject: NEUROLOGY Year: 2024 Document type: Article Country of publication: United States
