PET: Parameter-efficient Knowledge Distillation on Transformer.

Jeon, Hyojin; Park, Seungcheol; Kim, Jin-Gee; Kang, U

Affiliation

Jeon H; Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
Park S; Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
Kim JG; Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.
Kang U; Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea.

PLoS One ; 18(7): e0288060, 2023.

Article in English | MEDLINE | ID: mdl-37410716

ABSTRACT

Given a large Transformer model, how can we obtain a small and computationally efficient model that maintains the performance of the original model? Transformers have shown significant performance improvements on many NLP tasks in recent years. However, their large size, expensive computational cost, and long inference time make it challenging to deploy them on resource-constrained devices. Existing Transformer compression methods mainly focus on reducing the size of the encoder, ignoring the fact that the decoder accounts for the major portion of the long inference time. In this paper, we propose PET (Parameter-Efficient knowledge distillation on Transformer), an efficient Transformer compression method that reduces the size of both the encoder and the decoder. In PET, we identify and exploit pairs of parameter groups for efficient weight sharing, and employ a warm-up process using a simplified task to increase the gain through Knowledge Distillation. Extensive experiments on five real-world datasets show that PET outperforms existing methods in machine translation tasks. Specifically, on the IWSLT'14 EN→DE task, PET reduces the memory usage by 81.20% and accelerates the inference speed by 45.15% compared to the uncompressed model, with a minor decrease in BLEU score of 0.27.
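
For readers unfamiliar with the term, the following is a minimal sketch of a generic knowledge distillation objective, in which a small student model is trained to match the temperature-softened output distribution of a large teacher model. It is not PET's method: the abstract does not specify the loss, the weight-sharing scheme, or the warm-up task. The function names, the temperature, and the mixing weight `alpha` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss for one prediction (hypothetical helper).

    Combines a soft term, KL(teacher || student) at the given temperature,
    with a hard term, the cross-entropy against the true label.
    The hyperparameter values are assumptions, not taken from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft term: how far the student's softened distribution is from the teacher's.
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    # Hard term: ordinary cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[target_index])
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce


# Example: a student that disagrees with the teacher is penalized more
# than one that reproduces the teacher's logits exactly.
teacher = [0.5, 1.0, 3.0]
matched = distillation_loss(teacher, teacher, target_index=2)
mismatched = distillation_loss([3.0, 1.0, 0.5], teacher, target_index=2)
```

With `alpha=1.0` only the soft term remains, so a student whose logits equal the teacher's has zero loss.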

Subjects

Data Compression; Distillation; Electric Power Supplies; Knowledge

Full text: 1 | Database: MEDLINE | Main subject: Data Compression | Type of study: Prognostic_studies | Language: En | Publication year: 2023 | Document type: Article