ABSTRACT
This paper presents a dataset comprising 700 video sequences encoded in the two most widely used video codecs of today, H.264 and H.265 (HEVC). Six reference sequences were encoded under different quality profiles, covering several bitrates and resolutions, and were affected by various packet loss rates. Subsequently, the image quality of the encoded video sequences was assessed by both subjective and objective evaluation. The enclosed spreadsheet therefore contains the results of both assessment approaches in the form of MOS (Mean Opinion Score) delivered by the absolute category rating (ACR) procedure, SSIM (Structural Similarity Index Measure), and VMAF (Video Multimethod Assessment Fusion). All assessments are available for each test sequence. This allows a comprehensive evaluation of coding efficiency under different test scenarios without the need for real observers or the controlled laboratory environment recommended by the ITU (International Telecommunication Union). As there is currently no standardized mapping function between the results of subjective and objective methods, this dataset can also be used to design and verify experimental machine learning algorithms that contribute to solving this open research problem.
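Since the spreadsheet pairs MOS with SSIM and VMAF for every sequence, a natural first analysis is the linear correlation between the subjective and objective columns. A minimal sketch in Python/NumPy, using hypothetical score vectors (not values from the dataset):

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient between two score vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))

# Hypothetical per-sequence scores, for illustration only:
mos  = [4.5, 3.8, 3.1, 2.4, 1.6]       # ACR scores on the 5-point MOS scale
ssim = [0.98, 0.95, 0.90, 0.84, 0.71]  # SSIM for the same five sequences

r = pearson(mos, ssim)
```

A learned mapping function between the two domains would typically be judged by how much it improves such a correlation after fitting.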
ABSTRACT
High-Efficiency Video Coding (HEVC/H.265) is one of the most widely used video coding standards. HEVC introduces a quad-tree coding unit (CU) partition structure to improve video compression efficiency. The optimal CU partition is determined through a brute-force rate-distortion optimization search, which may result in high encoding complexity and hardware implementation challenges. To address this problem, this paper proposes a method that combines convolutional neural networks (CNN) with joint texture recognition to reduce encoding complexity. First, a classification decision method based on the global and local texture features of the CU is proposed, efficiently dividing CUs into smooth and complex texture regions. Second, for CUs in smooth texture regions, the partition is determined by early termination. For CUs in complex texture regions, a proposed CNN predicts the partition, thus avoiding the traditional recursive approach. Finally, combined with texture classification, the proposed CNN achieves a good balance between coding complexity and coding performance. The experimental results demonstrate that the proposed algorithm reduces computational complexity by 61.23%, while only increasing BD-BR by 1.86% and decreasing BD-PSNR by just 0.09 dB.
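The smooth/complex split described above can be illustrated with a toy variance-based classifier; the thresholds and the variance criterion here are assumptions for illustration, not the texture features actually used in the paper:

```python
import numpy as np

def classify_cu(block, global_thresh=100.0, local_thresh=50.0):
    """Label a CU 'smooth' (early-terminate, keep it unsplit) when both its
    global variance and the variance of every quadrant are small; otherwise
    'complex' (hand it off to a CNN split predictor)."""
    h, w = block.shape
    quads = [block[:h // 2, :w // 2], block[:h // 2, w // 2:],
             block[h // 2:, :w // 2], block[h // 2:, w // 2:]]
    if np.var(block) < global_thresh and max(np.var(q) for q in quads) < local_thresh:
        return "smooth"
    return "complex"

flat = np.full((16, 16), 128.0)                 # uniform luma block
edge = np.zeros((16, 16)); edge[:, 8:] = 255.0  # strong vertical edge
```

Note that the edge block is caught by the *global* term: each of its quadrants is internally uniform, which is why a purely local criterion would misclassify it.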
ABSTRACT
This article describes an empirical exploration of the effect of information loss in compressed representations of dynamic point clouds on the subjective quality of the reconstructed point clouds. The study involved compressing a set of test dynamic point clouds using the MPEG V-PCC (Video-based Point Cloud Compression) codec at 5 different levels of compression and applying simulated packet losses at three packet loss rates (0.5%, 1% and 2%) to the V-PCC sub-bitstreams prior to decoding and reconstructing the dynamic point clouds. The quality of the recovered dynamic point clouds was then assessed by human observers in experiments conducted at two research laboratories in Croatia and Portugal, to collect MOS (Mean Opinion Score) values. These scores were subjected to a set of statistical analyses to measure the degree of correlation of the data from the two laboratories, as well as the degree of correlation between the MOS values and a selection of objective quality measures, while taking into account compression level and packet loss rates. The objective quality measures considered, all of the full-reference type, included point cloud-specific measures as well as others adapted from image and video quality measures. Among the image-based quality measures, FSIM (Feature Similarity index), MSE (Mean Squared Error), and SSIM (Structural Similarity index) yielded the highest correlation with subjective scores in both laboratories, while PCQM (Point Cloud Quality Metric) showed the highest correlation among all point cloud-specific objective measures. The study showed that even a 0.5% packet loss rate reduces the decoded point clouds' subjective quality by 1 to 1.5 MOS scale units, pointing out the need to adequately protect the bitstreams against losses.
The results also showed that degradations in the V-PCC occupancy and geometry sub-bitstreams have a significantly higher negative impact on decoded point cloud subjective quality than degradations of the attribute sub-bitstream.
Subjects
Data Compression; Humans; Data Compression/methods; Croatia; Portugal
ABSTRACT
Video delivered over IP networks in real-time applications that use the RTP protocol over unreliable UDP, such as videotelephony or live streaming, is often prone to degradation from multiple sources. The most significant is the combined effect of video compression and transmission over the communication channel. This paper analyzes the adverse impact of packet loss on video quality encoded with various combinations of compression parameters and resolutions. For the purposes of the research, a dataset containing 11,200 full HD and ultra HD video sequences encoded in the H.264 and H.265 formats at five bit rates was compiled, with a simulated packet loss rate (PLR) ranging from 0 to 1%. Objective assessment was conducted using the peak signal-to-noise ratio (PSNR) and Structural Similarity Index (SSIM) metrics, whereas the well-known absolute category rating (ACR) was used for subjective evaluation. Analysis of the results confirmed the presumption that video quality decreases as the packet loss rate rises, regardless of compression parameters. The experiments further showed that the quality of sequences affected by PLR declines with increasing bit rate. Additionally, the paper includes recommendations of compression parameters for use under various network conditions.
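The PSNR metric used for the objective assessment is simple enough to state in a few lines; a self-contained NumPy sketch (the frame data here is synthetic, for illustration):

```python
import numpy as np

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    dist = np.asarray(distorted, dtype=np.float64)
    mse = np.mean((ref - dist) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
noisy = np.clip(frame + rng.normal(0, 5, size=frame.shape), 0, 255)
value = psnr(frame, noisy)
```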
ABSTRACT
This article presents a method for transparent, high-capacity watermarking of video under H.265/HEVC (High-Efficiency Video Coding) compression while maintaining a high-quality encoded image. The aim of this paper is to present a method for watermark embedding using neural networks when the video is subjected to the lossy compression of the HEVC codec, using the chrominance channel of the YUV420p color model for watermarking. The paper presents a method for training a deep neural network to embed a watermark in the presence of a compression channel. The method is characterized by high fidelity of the watermarked video compared to the original: the PSNR (peak signal-to-noise ratio) values obtained are over 44 dB. The watermark capacity is 96 bits for an image with a resolution of 128 × 128. The method enables the complete recovery of a watermark from a single video frame compressed by the HEVC codec within the range of compression values defined by the CRF (constant rate factor) up to 22.
Subjects
Data Compression; Image Interpretation, Computer-Assisted; Data Compression/methods; Image Interpretation, Computer-Assisted/methods; Neural Networks, Computer; Physical Phenomena; Signal-to-Noise Ratio
ABSTRACT
This paper presents a method of high-capacity and transparent watermarking based on the use of deep neural networks with the adjustable subsquares properties algorithm to encode the watermark data in high-quality video using the H.265/HEVC (High-Efficiency Video Coding) codec. The aim of the article is to present a method of embedding a watermark in HEVC-compressed video by making changes that are not noticeable to the naked eye. The method focuses on preserving the fidelity of the original image in the watermarked image and on the transparency of the embedded watermark, while ensuring its survival after compression by the HEVC codec. The article includes a presentation of the practical results of watermark embedding with a built-in mechanism for varying its capacity and resistance, thanks to the adjustable subsquares properties algorithm. The obtained PSNR (peak signal-to-noise ratio) results are at the level of 40 dB or better. The complete recovery of a watermark from a single frame compressed in the CRF (constant rate factor) range of up to 16 is possible, resulting in a BER (bit error rate) equal to 0 for the received watermark.
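The reported BER of 0 simply means every watermark bit survived compression. As a reference, the bit error rate of a recovered watermark is computed as follows (the 8-bit payload below is illustrative, far shorter than the method's real capacity):

```python
def bit_error_rate(sent_bits, recovered_bits):
    """Fraction of watermark bits that differ after embedding and recovery."""
    assert len(sent_bits) == len(recovered_bits)
    errors = sum(a != b for a, b in zip(sent_bits, recovered_bits))
    return errors / len(sent_bits)

watermark = [1, 0, 1, 1, 0, 0, 1, 0]
corrupted = [1, 0, 1, 1, 0, 0, 1, 1]  # last bit flipped
```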
Subjects
Neural Networks, Computer; Video Recording; Algorithms; Data Compression
ABSTRACT
In this audio/video authenticity research project, 44 original MOV files were produced on an Apple iPhone 12 Pro Max mobile device, running the iOS 14.2.1 operating system, in all available video formats and at four different nominal recording lengths. Each of the original files was then trimmed, using the Apple Photos app, in three different ways: deleting a portion of the beginning, a portion of the end, and portions of both the beginning and end. These 176 original and trimmed files were transferred to a laboratory computer and the footer and other metadata were analyzed with a hex editor. This analysis revealed that the trimmed recordings could be differentiated from the originals; that the iPhone model and the iOS operating system version could be identified; that important recording dates and times could be determined; and that the type of trimming, in some cases, could be determined.
ABSTRACT
Finding a proper balance between video quality and the required bandwidth is an important issue, especially in networks of limited capacity. The problem of comparing the efficiency of video codecs and choosing the most suitable one in a specific situation has become very important. This paper proposes a method of comparing video codecs while also taking into account objective quality assessment metrics. The author shows the process of preparing video footage, assessing its quality, determining the rate-distortion curves, and calculating the bitrate saving for pairs of examined codecs. Thanks to the use of the spline interpolation method, the obtained results are better than those previously presented in the literature and less sensitive to the choice of quality metric.
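The bitrate saving between a codec pair is conventionally computed Bjøntegaard-style: fit log-rate as a function of quality for each codec, then average the gap over the overlapping quality range. The sketch below uses a cubic polynomial fit rather than the spline interpolation the paper advocates, and toy rate-distortion points:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate difference (%) of a test codec vs. a reference,
    from cubic fits of log-rate as a function of PSNR."""
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping quality range
    hi = min(max(psnr_ref), max(psnr_test))
    shift = (lo + hi) / 2.0                   # center abscissa for conditioning
    p_ref = np.polyfit(np.asarray(psnr_ref) - shift, np.log(rates_ref), 3)
    p_test = np.polyfit(np.asarray(psnr_test) - shift, np.log(rates_test), 3)
    a, b = lo - shift, hi - shift
    int_ref = np.polyval(np.polyint(p_ref), b) - np.polyval(np.polyint(p_ref), a)
    int_test = np.polyval(np.polyint(p_test), b) - np.polyval(np.polyint(p_test), a)
    return float((np.exp((int_test - int_ref) / (b - a)) - 1.0) * 100.0)

# Toy R-D points: the "test" codec needs exactly half the bitrate
rates = np.array([1000.0, 2000.0, 4000.0, 8000.0])  # kbps
quality = np.array([30.0, 34.0, 37.0, 39.0])        # dB
saving = bd_rate(rates, quality, rates / 2.0, quality)
```

With the test rates exactly halved at every quality point, the computed saving comes out at -50%, as expected.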
Subjects
Video Recording
ABSTRACT
This article investigates the performance of various sophisticated channel coding and transmission schemes for achieving reliable transmission of a highly compressed video stream. Novel error protection schemes, including a Non-Convergent Coding (NCC) scheme, Non-Convergent Coding assisted with Differential Space Time Spreading (DSTS) and Sphere Packing (SP) modulation (NCDSTS-SP), and Convergent Coding assisted with DSTS and SP modulation (CDSTS-SP), are analyzed using the Bit Error Ratio (BER) and Peak Signal-to-Noise Ratio (PSNR) performance metrics. Furthermore, error reduction is achieved using a sophisticated transceiver comprising the SP modulation technique assisted by Differential Space Time Spreading. The performance of iterative Soft Bit Source Decoding (SBSD) in combination with channel codes is analyzed for various error protection setups under a consistent overall bit-rate budget. Additionally, the iterative behavior of the SBSD-assisted RSC decoder is analyzed with the aid of an Extrinsic Information Transfer (EXIT) chart in order to characterize the achievable turbo cliff of the iterative decoding process. The subjective and objective video quality performance of the proposed error protection schemes is analyzed while employing the H.264 Advanced Video Coding and H.265 High Efficiency Video Coding standards, using diverse video sequences of different resolution, motion and dynamism. It was observed that in the presence of a noisy channel, low-resolution videos outperform their high-resolution counterparts. Furthermore, video sequences with low motion content and dynamism outperform those with high motion content and dynamism.
More specifically, it is observed that, with the H.265 video coding standard, the Non-Convergent Coding scheme assisted with DSTS and SP modulation and an enhanced transmission mechanism achieves an Eb/N0 gain of 20 dB with reference to the Non-Convergent Coding and transmission mechanism at an objective PSNR value of 42 dB; both schemes employ an identical code rate. Furthermore, the Convergent Coding mechanism assisted with DSTS and SP modulation achieved superior performance relative to its equivalent-rate Non-Convergent counterpart, with a performance gain of 16 dB at the objective PSNR grade of 42 dB. Moreover, the maximum PSNR achievable with the H.265 video coding standard is 45 dB, a gain of 3 dB with reference to the identical-code-rate H.264 coding scheme.
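As a point of reference for the reported Eb/N0 gains, the closed-form BER of uncoded BPSK over an AWGN channel (a textbook baseline, far weaker than the coded DSTS/SP schemes above) is:

```python
import math

def bpsk_awgn_ber(ebn0_db):
    """Bit error ratio of uncoded BPSK over AWGN: 0.5 * erfc(sqrt(Eb/N0))."""
    ebn0_linear = 10.0 ** (ebn0_db / 10.0)  # dB -> linear
    return 0.5 * math.erfc(math.sqrt(ebn0_linear))
```

At Eb/N0 = 0 dB this gives a BER of about 7.9 × 10⁻², falling steeply as the channel improves.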
ABSTRACT
Video quality evaluation needs a combined approach that includes subjective and objective metrics, testing, and monitoring of the network. This paper deals with a novel approach to mapping quality of service (QoS) to quality of experience (QoE): QoE metrics are used to determine user satisfaction limits, and QoS tools are applied to provide the minimum QoE expected by users. Our aim was to connect objective estimations of video quality with subjective ones. A comprehensive tool for the estimation of the subjective evaluation is proposed. This new idea is based on the evaluation and marking of video sequences using a sentinel flag derived from spatial information (SI) and temporal information (TI) in individual video frames. The authors created a video database for quality evaluation, and derived SI and TI from each video sequence to classify the scenes. Video scenes from the database were evaluated by objective and subjective assessment. Based on the results, a new model for the prediction of subjective quality is defined and presented in this paper. This quality is predicted by an artificial neural network from the objective evaluation and the type of video sequence, defined by qualitative parameters such as resolution, compression standard, and bitstream. Furthermore, the authors created an optimum mapping function to define the threshold for the variable bitrate setting based on the flag in the video, which determines the type of scene in the proposed model. This function allows one to allocate the bitrate dynamically for a particular segment of the scene while maintaining the desired quality. Our proposed model can help video service providers increase the comfort of end users: the variable bitstream ensures consistent video quality and customer satisfaction, while network resources are used effectively.
The proposed model can also predict the appropriate bitrate based on the required quality of video sequences, defined using either objective or subjective assessment.
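The SI and TI indicators referred to above are defined in ITU-T Rec. P.910 as the maximum over time of the spatial standard deviation of the Sobel-filtered frame, and of the frame difference, respectively. A NumPy sketch on a toy luminance clip:

```python
import numpy as np

def sobel_mag(frame):
    """Gradient magnitude via 3x3 Sobel filters (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = frame.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            patch = frame[i:h - 2 + i, j:w - 2 + j]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.sqrt(gx ** 2 + gy ** 2)

def si_ti(frames):
    """SI/TI as in ITU-T P.910: max over time of the spatial std. dev. of
    the Sobel-filtered frame, and of the frame-to-frame difference."""
    si = max(float(np.std(sobel_mag(f))) for f in frames)
    ti = max(float(np.std(b - a)) for a, b in zip(frames, frames[1:]))
    return si, ti

# Toy clip: a static vertical edge (spatial detail, but no motion)
edge = np.zeros((32, 32))
edge[:, 16:] = 255.0
si, ti = si_ti([edge, edge, edge])
```

As expected for a static scene, the clip yields a nonzero SI but a TI of zero.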
ABSTRACT
This paper deals with the impact of content on perceived video quality evaluated using the subjective Absolute Category Rating (ACR) method. The assessment was conducted on eight types of video sequences with diverse content obtained from the SJTU dataset. The sequences were encoded at 5 different constant bitrates in two widely used video compression standards, H.264/AVC and H.265/HEVC, at Full HD and Ultra HD resolutions, yielding 160 annotated video sequences. The length of the Group of Pictures (GOP) was set to half the framerate value, as is typical for video intended for transmission over a noisy communication channel. The evaluation was performed in two laboratories: one situated at the University of Zilina, and the second at the VSB-Technical University of Ostrava. The results acquired in the two laboratories showed a high correlation. Although the sequences with low Spatial Information (SI) and Temporal Information (TI) values reached a better Mean Opinion Score (MOS) than the sequences with higher SI and TI values, these two parameters are not sufficient for scene description, and this domain should be the subject of further research. The evaluation results led us to the conclusion that it is unnecessary to use the H.265/HEVC codec for the compression of Full HD sequences, and that the compression efficiency of the H.265 codec at Ultra HD resolution matches the compression efficiency of both codecs at Full HD resolution. This paper also includes recommendations for the minimum bitrate thresholds at which the video sequences at both resolutions retain good and fair subjectively perceived quality.
ABSTRACT
Video streaming has become one of the main kinds of information carried by Unmanned Aerial Vehicles (UAVs). Unlike a single transmission, when a cluster of UAVs executes a real-time video shooting and uploading mission, insufficient wireless channel resources lead to bandwidth competition among the UAVs, which degrades the viewing experience for the audience. Therefore, how to allocate uplink bandwidth reasonably within the cluster has become a crucial problem. In this paper, an intelligent and distributed allocation mechanism is designed to improve users' video viewing satisfaction. Each UAV in a cluster can independently adjust and select its video encoding rate so as to achieve flexible uplink allocation. This choice relies neither on the existence of a central node, nor on a large amount of information interaction between UAVs. Firstly, in order to distinguish the video service from ordinary data, a utility function for the overall Quality of Experience (QoE) is proposed. Then, a potential game model is built around the problem. With a distributed self-learning algorithm of low complexity, all UAVs can iteratively update their own bandwidth strategies in a short time until equilibrium, thus optimizing the total quality of all videos. Numerical simulation results indicate that, after a few iterations, the algorithm converges to a set of correlated equilibria. This mechanism not only solves the uplink allocation problem of video streaming in a UAV cluster, but also helps wireless resource providers distinguish and ensure network service quality.
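The flavor of the distributed scheme can be conveyed with a toy exact-potential game: each node's utility is the log of its own encoding rate minus a penalty, shared by all players, for exceeding the uplink capacity, so best-response dynamics terminate at a pure equilibrium. This is an illustrative stand-in, not the paper's QoE utility or self-learning algorithm:

```python
import math

def best_response_allocation(n=4, options=(1, 2, 4, 8), capacity=12,
                             lam=10.0, max_rounds=100):
    """Each UAV repeatedly best-responds to the others' encoding rates.
    Utility_i = log(1 + r_i) - lam * max(0, total - capacity). Because the
    penalty term is common to all players, Phi = sum log(1 + r_j) - penalty
    is an exact potential, so the iteration must terminate."""
    rates = [min(options)] * n
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            others = sum(rates) - rates[i]

            def utility(r):
                return math.log(1 + r) - lam * max(0, others + r - capacity)

            best = max(options, key=utility)
            if best != rates[i]:
                rates[i] = best
                changed = True
        if not changed:
            break  # pure Nash equilibrium reached
    return rates

rates = best_response_allocation()
```

With a sufficiently large penalty weight, no player gains by pushing the total above capacity, so the equilibrium respects the uplink budget.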
ABSTRACT
Digital screening and diagnosis from cytology slides can be aided by capturing multiple focal planes. However, using conventional methods, the large file sizes of high-resolution whole-slide images increase linearly with the number of focal planes acquired, leading to significant data storage and bandwidth requirements for the efficient storage and transfer of cytology virtual slides. We investigated whether a sequence of focal planes contained sufficient redundancy to efficiently compress virtual slides across focal planes by applying a commonly available video compression standard, high-efficiency video coding (HEVC). By developing an adaptive algorithm that applied compression to achieve a target image quality, we found that the compression ratio of HEVC exceeded that obtained using JPEG and JPEG2000 compression while maintaining a comparable level of image quality. These results suggest an alternative method for the efficient storage and transfer of whole-slide images that contain multiple focal planes, expanding the utility of this rapidly evolving imaging technology into cytology.
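An adaptive quality-targeting loop like the one described reduces to a monotone search over the encoder's quality parameter. A sketch with a hypothetical quality model standing in for the encode-and-measure step (the function names and the linear model are assumptions, not the paper's algorithm):

```python
def max_crf_meeting_target(target_quality, quality_of, lo=0, hi=51):
    """Binary-search the largest CRF (strongest compression) whose measured
    quality still meets the target. Assumes quality_of is non-increasing in
    CRF and that the target is achievable at lo."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if quality_of(mid) >= target_quality:
            best = mid       # target met; try stronger compression
            lo = mid + 1
        else:
            hi = mid - 1
    return best

# Hypothetical stand-in for "encode at this CRF, then measure quality":
mock_quality = lambda crf: 60.0 - 0.5 * crf
crf = max_crf_meeting_target(40.0, mock_quality)
```

In a real pipeline, `quality_of` would encode one focal plane at the given setting and return a measured score such as SSIM or PSNR.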
ABSTRACT
Perceptual video coding (PVC) can provide a lower bitrate at the same visual quality compared with traditional H.265/High Efficiency Video Coding (HEVC). In this work, a novel H.265/HEVC-compliant PVC framework is proposed based on a video saliency model. Firstly, an effective and efficient spatiotemporal saliency model is used to generate a video saliency map. Secondly, a perceptual coding scheme is developed based on the saliency map: a saliency-based quantization control algorithm is proposed to reduce the bitrate. Finally, the simulation results demonstrate that the proposed perceptual coding scheme is superior in both objective and subjective tests, achieving up to a 9.46% bitrate reduction with negligible subjective and objective quality loss. An additional advantage of the proposed method is its high quality, which is well suited to high-definition video applications.
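Saliency-based quantization control of the kind described assigns finer quantization (lower QP) to salient regions and coarser quantization elsewhere. The linear mapping and offset below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def saliency_qp_map(saliency, base_qp=32, max_offset=6):
    """Per-region QP from a saliency map in [0, 1]: lower QP (finer
    quantization) where saliency is high, higher QP where it is low.
    The +/- max_offset linear mapping is illustrative only."""
    s = np.clip(np.asarray(saliency, float), 0.0, 1.0)
    qp = base_qp + np.rint((0.5 - s) * 2 * max_offset).astype(int)
    return np.clip(qp, 0, 51)  # keep within the HEVC QP range

sal = np.array([[1.0, 0.5],
                [0.0, 0.25]])  # toy 2x2 saliency map
qp = saliency_qp_map(sal)
```

Bits saved by the coarser QP in non-salient regions fund the quality of the regions viewers actually attend to, which is where the bitrate reduction comes from.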