1.
Sensors (Basel); 23(5), 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36904838

ABSTRACT

As the demands of network-dependent services such as Internet of Things (IoT) applications, autonomous driving, and augmented and virtual reality (AR/VR) increase, the fifth-generation (5G) network is expected to become a key communication technology. The latest video coding standard, versatile video coding (VVC), can contribute to providing high-quality services by achieving superior compression performance. In video coding, inter bi-prediction improves coding efficiency significantly by producing a precise fused prediction block. Although block-wise methods such as bi-prediction with CU-level weight (BCW) are adopted in VVC, it is still difficult for a linear fusion-based strategy to represent the diverse pixel variations inside a block. In addition, a pixel-wise method called bi-directional optical flow (BDOF) has been proposed to refine the bi-prediction block. However, the non-linear optical flow equation in BDOF mode is applied under restrictive assumptions, so this method is still unable to accurately compensate for various kinds of bi-prediction blocks. In this paper, we propose an attention-based bi-prediction network (ABPN) to substitute for all the existing bi-prediction methods. The proposed ABPN is designed to learn efficient representations of the fused features by utilizing an attention mechanism. Furthermore, a knowledge distillation (KD)-based approach is employed to compress the network while keeping its output comparable to that of the large model. The proposed ABPN is integrated into the VTM-11.0 NNVC-1.0 standard reference software. Compared with the VTM anchor, the BD-rate reduction of the lightweight ABPN is verified to reach up to 5.89% and 4.91% on the Y component under the random access (RA) and low delay B (LDB) configurations, respectively.
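As a rough illustration of the two ideas at play here (pixel-wise attention-weighted fusion, as opposed to VVC's block-level BCW weighting, plus KD-based model compression), the sketch below shows one way such a fusion module and distillation loss could be written. It assumes PyTorch; the module name, layer sizes, and loss weighting are illustrative assumptions, not the actual ABPN architecture.

    # Illustrative sketch of attention-based bi-prediction fusion with
    # knowledge distillation. Hypothetical layer sizes; NOT the real ABPN.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionFusion(nn.Module):
        """Fuses the two uni-directional predictions (P0, P1) with a
        learned per-pixel weight map instead of a single block-level
        weight as in BCW."""
        def __init__(self, channels=32):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(2, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            # One attention weight per pixel, squashed to (0, 1).
            self.attn = nn.Conv2d(channels, 1, 3, padding=1)

        def forward(self, p0, p1):
            # p0, p1: (N, 1, H, W) luma prediction blocks.
            w = torch.sigmoid(self.attn(self.features(torch.cat([p0, p1], dim=1))))
            return w * p0 + (1.0 - w) * p1  # pixel-wise fused prediction

    def kd_loss(student_out, teacher_out, target, alpha=0.5):
        """Train a small student to match both the ground truth and the
        large teacher's output, so the student can replace the teacher
        at a fraction of the parameter count."""
        return (1 - alpha) * F.l1_loss(student_out, target) + \
               alpha * F.l1_loss(student_out, teacher_out)

    # Example: fuse two 64x64 luma prediction blocks.
    fusion = AttentionFusion()
    p0, p1 = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
    fused = fusion(p0, p1)  # (1, 1, 64, 64)

For contrast, BCW in VVC fuses at the block level with a single weight w drawn from a small candidate set, roughly P = ((8 - w) * P0 + w * P1) / 8, which is exactly the kind of linear, block-uniform model the abstract argues cannot capture diverse pixel variations inside a block.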

2.
IEEE Trans Image Process; 31: 934-948, 2022.
Article in English | MEDLINE | ID: mdl-34965209

ABSTRACT

Video dimensions are continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution and frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited-bandwidth networks in a perceptually optimal way is a current challenge affecting billions of viewers. One recent practice adopted by video service providers is space-time resolution adaptation in conjunction with video compression. Consequently, it is important to understand how different levels of space-time subsampling and compression affect the perceptual quality of videos. Towards making progress in this direction, we constructed a large new resource, called the ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE STSVQ) database, containing 437 videos generated by applying various levels of combined space-time subsampling and video compression to 15 diverse video contents. We also conducted a large-scale human study on the new dataset, collecting about 15,000 subjective judgments of video quality. We provide a rate-distortion analysis of the collected subjective scores, enabling us to investigate the perceptual impact of space-time subsampling at different bit rates. We also evaluated and compared the performance of leading video quality models on the new database. The new ETRI-LIVE STSVQ database is freely available at https://live.ece.utexas.edu/research/ETRI-LIVE_STSVQ/index.html.
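For readers who want to benchmark an objective video quality model against subjective scores such as these, the standard procedure is to correlate model predictions with the mean opinion scores (MOS). Below is a minimal sketch assuming scipy; the arrays are invented toy placeholders, not values from ETRI-LIVE STSVQ.

    # Illustrative evaluation of a video quality model against subjective
    # scores (MOS). The data below are hypothetical placeholders.
    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    def evaluate_vqa_model(predicted, mos):
        """Report the two correlations most commonly used to rank VQA
        models: SROCC (rank monotonicity) and PLCC (linear accuracy)."""
        srocc, _ = spearmanr(predicted, mos)
        plcc, _ = pearsonr(predicted, mos)
        return {"SROCC": srocc, "PLCC": plcc}

    # Toy example: model scores vs. mean opinion scores for 5 videos.
    predicted = np.array([0.81, 0.42, 0.67, 0.90, 0.30])
    mos       = np.array([72.0, 41.5, 60.2, 85.1, 35.7])
    print(evaluate_vqa_model(predicted, mos))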
