ABSTRACT
The new video coding standard, Versatile Video Coding (VVC), achieves a 40-50% coding gain over its predecessor, HEVC, at the same visual quality, at the cost of a sharp increase in computational complexity. Combined with the increase in video resolution, this complexity exceeds the capacity of single-core architectures, which has led researchers to implement video standards on multicore architectures and to exploit their parallelism for real-time applications. With the strong growth in both video coding and multicore architectures, there is a great need for a design methodology that facilitates the exploration of heterogeneous multicore architectures and automatically generates optimized code for them in order to reduce time to market. In this context, this paper uses a methodology based on dataflow modeling, supported by the PREESM software. It shows how PREESM has been used to model a complete standard VVC video decoder with the Parameterized and Interfaced Synchronous Dataflow (PiSDF) model. The proposed model takes advantage of the parallelism strategies of the OpenVVC decoder, in particular tile-based parallelism. Experimental results show that the PiSDF VVC decoder is slightly faster than the OpenVVC decoder handwritten in C/C++, with a speedup of up to 11% on a 24-core processor. The proposed decoder thereby outperforms the state-of-the-art dataflow decoders based on the RVC-CAL model.
ABSTRACT
The release of the Versatile Video Coding standard (H.266/VVC) has been accompanied by various new contributions that improve coding efficiency beyond High Efficiency Video Coding (HEVC), particularly in the transform process. The adaptive multiple transform (AMT) is one of the new tools introduced in the transform module. It involves five transform types from the discrete cosine transform (DCT) and discrete sine transform (DST) families, with larger block sizes. The DCT-II has a fast computing algorithm, whereas the DST-VII relies on a complex matrix multiplication, which adds computational complexity. Approximating the DST-VII is one way to optimize the transform. At the hardware level, this method can provide gains in power consumption, logic resource usage and speed. In this paper, a unified two-dimensional transform architecture is proposed that enables exact and approximate DST-VII computation for sizes 8 × 8, 8 × 16, 8 × 32, 16 × 8, 16 × 16, 16 × 32, 32 × 8, 32 × 16 and 32 × 32. The exact transform can be computed using either multipliers or the multiple constant multiplication (MCM) algorithm, while the approximate transform is based on additions and bit-shifting operations. All the designs are implemented on an Arria 10 FPGA device. The synthesis results show that the design implementing the approximate transform matrices is the most efficient, with only 4% area consumption. It reduces logic utilization by more than 65% compared to the multiplier-based exact transform design, and yields a hardware cost saving of about 53% compared to the MCM-based computation. Furthermore, the approximate 2D transform architecture can operate at 78 MHz, allowing real-time coding of 2K and 4K videos at 100 and 25 frames/s, respectively.
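To make the separable 2D DST-VII over rectangular blocks concrete, the following is a minimal floating-point sketch in Python. It is not the fixed-point hardware architecture described above, and it does not reproduce the paper's approximate matrices. It uses the standard orthonormal DST-VII basis, and the function names (`dst7_matrix`, `dst7_2d`, `idst7_2d`) are hypothetical, chosen only for illustration.

```python
import math

def dst7_matrix(n):
    """Orthonormal floating-point DST-VII basis; row k is the k-th basis function."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * k + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dst7_2d(block):
    """Separable 2D DST-VII of an H x W block: Y = T_H * X * T_W^T."""
    h, w = len(block), len(block[0])
    return matmul(matmul(dst7_matrix(h), block), transpose(dst7_matrix(w)))

def idst7_2d(coeffs):
    """Inverse of dst7_2d, using orthonormality: X = T_H^T * Y * T_W."""
    h, w = len(coeffs), len(coeffs[0])
    return matmul(matmul(transpose(dst7_matrix(h)), coeffs), dst7_matrix(w))

if __name__ == "__main__":
    # Illustrative 8 x 16 block (8 rows, 16 columns) with arbitrary sample values.
    block = [[float((3 * r + 5 * c) % 7) for c in range(16)] for r in range(8)]
    rec = idst7_2d(dst7_2d(block))
    err = max(abs(rec[r][c] - block[r][c]) for r in range(8) for c in range(16))
    print("max round-trip error:", err)
```

Because the row and column transforms are independent matrix products, the same two-stage structure covers every size from 8 × 8 to 32 × 32. An approximate variant would, under this sketch, replace the entries of `dst7_matrix` with values that can be realized by additions and bit shifts.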