NRVC: Neural Representation for Video Compression with Implicit Multiscale Fusion Network.

Liu, Shangdong; Cao, Puming; Feng, Yujian; Ji, Yimu; Chen, Jiayuan; Xie, Xuedong; Wu, Longji

Affiliation

Liu S; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Cao P; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Feng Y; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Ji Y; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Chen J; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Xie X; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Wu L; School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Entropy (Basel); 25(8), 2023 Aug 04.

Article in English | MEDLINE | ID: mdl-37628197

ABSTRACT

Recently, end-to-end deep models for video compression have made steady advancements. However, these models rely on lengthy and complex pipelines that contain numerous redundant parameters. Video compression approaches based on implicit neural representation (INR) instead represent a video directly as a function approximated by a neural network, which yields a more lightweight model; however, their single feature extraction pipeline limits the network's ability to fit the mapping function for video frames. Hence, we propose a neural representation approach for video compression with an implicit multiscale fusion network (NRVC), which uses normalized residual networks to improve the effectiveness of INR in fitting the target function. We propose the multiscale representations for video compression (MSRVC) network, which effectively extracts features from the input video sequence to enhance the degree of overfitting in the mapping function. Additionally, we propose the feature extraction channel attention (FECA) block to capture interaction information between different feature extraction channels, further improving the effectiveness of feature extraction. The results show that, compared to the NeRV method at a similar bits per pixel (BPP), NRVC achieves a 2.16% increase in decoded peak signal-to-noise ratio (PSNR). Moreover, NRVC outperforms the conventional HEVC in terms of PSNR.
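The abstract reports its results in terms of PSNR and BPP without defining them. The sketch below uses the standard PSNR definition and, as an assumption about how INR-based codecs are commonly scored, computes BPP as the stored model size in bits divided by the total number of pixels in the video. The function names `psnr` and `bits_per_pixel` and all parameters are hypothetical and are not taken from the paper.

```python
import math


def psnr(reference, decoded, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    flat sequences of pixel values (standard definition)."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(num_params, bits_per_param, width, height, num_frames):
    """Assumed BPP for an INR codec: the network weights are the bitstream,
    so the rate is model size in bits over the total pixel count."""
    return num_params * bits_per_param / (width * height * num_frames)
```

Under these definitions, a model is compared with a baseline by matching their BPP and then comparing the PSNR of the decoded frames, which is the kind of comparison the abstract describes for NRVC and NeRV.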

Keywords

attention mechanism; implicit neural representation; video compression

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: Entropy (Basel) Year: 2023 Document type: Article Affiliation country: China
