Attention-Based Bi-Prediction Network for Versatile Video Coding (VVC) over 5G Network.

Choi, Young-Ju; Lee, Young-Woon; Kim, Jongho; Jeong, Se Yoon; Choi, Jin Soo; Kim, Byung-Gyu

Affiliation

Choi YJ; Department of IT Engineering, Sookmyung Women's University, Seoul 04310, Republic of Korea.
Lee YW; Department of Computer Engineering, Sunmoon University, Asan 31460, Republic of Korea.
Kim J; Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea.
Jeong SY; Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea.
Choi JS; Media Coding Research Section, Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea.
Kim BG; Department of IT Engineering, Sookmyung Women's University, Seoul 04310, Republic of Korea.

Sensors (Basel); 23(5), 2023 Feb 27.

Article in English | MEDLINE | ID: mdl-36904838

ABSTRACT

As demand grows for network-dependent services such as Internet of Things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR), the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versatile video coding (VVC), can contribute to providing high-quality services by achieving superior compression performance. In video coding, inter bi-prediction significantly improves coding efficiency by producing a precise fused prediction block. Although block-wise methods, such as bi-prediction with CU-level weight (BCW), are applied in VVC, it is still difficult for a linear fusion-based strategy to represent diverse pixel variations inside a block. In addition, a pixel-wise method called bi-directional optical flow (BDOF) has been proposed to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under assumptions, so this method is still unable to accurately compensate various kinds of bi-prediction blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to replace all existing bi-prediction methods. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the size of the proposed network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. Compared with the VTM anchor, the lightweight ABPN achieves a BD-rate reduction of up to 5.89% and 4.91% on the Y component under random access (RA) and low delay B (LDB), respectively.
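The limitation of block-wise linear fusion mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from the VTM software; it assumes the BCW weighting form used in VVC, where one weight w from {-2, 3, 4, 5, 10} is chosen per coding unit and the fused sample is ((8 - w) * P0 + w * P1 + 4) >> 3. The function name `fuse_block` and the direct use of final-precision samples are simplifications introduced here for illustration.

```python
# Sketch of block-level weighted bi-prediction in the style of BCW.
# Assumption: weights follow the VVC design, w in {-2, 3, 4, 5, 10},
# applied as ((8 - w) * p0 + w * p1 + 4) >> 3.
# Simplification: operates on final-precision samples, not on the
# higher-precision intermediate samples a real codec would use.
BCW_WEIGHTS = (-2, 3, 4, 5, 10)


def fuse_block(pred0, pred1, w=4, bit_depth=8):
    """Fuse two prediction blocks using one weight for the whole block."""
    if w not in BCW_WEIGHTS:
        raise ValueError(f"unsupported weight: {w}")
    max_val = (1 << bit_depth) - 1
    fused = []
    for row0, row1 in zip(pred0, pred1):
        fused_row = []
        for p0, p1 in zip(row0, row1):
            # Same linear combination for every pixel in the block.
            value = ((8 - w) * p0 + w * p1 + 4) >> 3
            # Clip to the valid sample range for the given bit depth.
            fused_row.append(min(max(value, 0), max_val))
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    p0 = [[100, 200]]
    p1 = [[120, 100]]
    print(fuse_block(p0, p1, w=4))   # equal-weight average
    print(fuse_block(p0, p1, w=10))  # extrapolating weight
```

Because the same weight is applied to every sample, the fusion cannot adapt to pixel-level variation inside the block, which is the gap a learned, pixel-wise fusion such as the proposed ABPN aims to address.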

Keywords

5G; attention mechanism; bi-prediction; convolutional neural network; versatile video coding

Full text: 1 Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Publication year: 2023 Document type: Article
