Search | Brasil - Virtual Health Library

1.

HDR or SDR? A Subjective and Objective Study of Scaled and Compressed Videos.

Ebenezer, Joshua P; Shang, Zaixi; Chen, Yixu; Wu, Yongjun; Wei, Hai; Sethuraman, Sriram; Bovik, Alan C.

IEEE Trans Image Process ; 33: 3606-3619, 2024.

Article in English | MEDLINE | ID: mdl-38814774

ABSTRACT

We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to scaling and compression levels and viewed on three different display devices. While conventional expectations are that HDR quality is better than SDR quality, we have found subject preference of HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions, and among many other findings, observed that HDR videos were often rated as lower quality than SDR videos at lower bitrates, particularly when viewed on LCD and QLED displays. Since it is of interest to be able to measure the quality of videos under these scenarios, e.g. to inform decisions regarding scaling, compression, and SDR vs HDR, we tested several well-known full-reference and no-reference video quality models on the new database. Towards advancing progress on this problem, we also developed a novel no-reference model called HDRPatchMAX, that uses a contrast-based analysis of classical and bit-depth features to predict quality more accurately than existing metrics.

2.

A Study of Subjective and Objective Quality Assessment of HDR Videos.

Shang, Zaixi; Ebenezer, Joshua P; Venkataramanan, Abhinau K; Wu, Yongjun; Wei, Hai; Sethuraman, Sriram; Bovik, Alan C.

IEEE Trans Image Process ; 33: 42-57, 2024.

Article in English | MEDLINE | ID: mdl-37988212

ABSTRACT

As compared to standard dynamic range (SDR) videos, high dynamic range (HDR) content is able to represent and display much wider and more accurate ranges of brightness and color, leading to more engaging and enjoyable visual experiences. HDR also implies increases in data volume, further challenging existing limits on bandwidth consumption and on the quality of delivered content. Perceptual quality models are used to monitor and control the compression of streamed SDR content. A similar strategy should be useful for HDR content, yet there has been limited work on building HDR video quality assessment (VQA) algorithms. One reason for this is a scarcity of high-quality HDR VQA databases representative of contemporary HDR standards. Towards filling this gap, we created the first publicly available HDR VQA database dedicated to HDR10 videos, called the Laboratory for Image and Video Engineering (LIVE) HDR Database. It comprises 310 videos from 31 distinct source sequences processed by ten different compression and resolution combinations, simulating bitrate ladders used by the streaming industry. We used this data to conduct a subjective quality study, gathering more than 20,000 human quality judgments under two different illumination conditions. To demonstrate the usefulness of this new psychometric data resource, we also designed a new framework for creating HDR quality sensitive features, using a nonlinear transform to emphasize distortions occurring in spatial portions of videos that are enhanced by HDR, e.g., having darker blacks and brighter whites. We apply this new method, which we call HDRMAX, to modify the widely-deployed Video Multimethod Assessment Fusion (VMAF) model. We show that VMAF+HDRMAX provides significantly elevated performance on both HDR and SDR videos, exceeding prior state-of-the-art model performance. The database is now accessible at: https://live.ece.utexas.edu/research/LIVEHDR/LIVEHDR_index.html. The model will be made available at a later date at: https://live.ece.utexas.edu//research/Quality/index_algorithms.htm.

3.

Dual-Stream Complex-Valued Convolutional Network for Authentic Dehazed Image Quality Assessment.

Guan, Tuxin; Li, Chaofeng; Zheng, Yuhui; Wu, Xiaojun; Bovik, Alan C.

IEEE Trans Image Process ; 33: 466-478, 2024.

Article in English | MEDLINE | ID: mdl-38150345

ABSTRACT

Effectively evaluating the perceptual quality of dehazed images remains an under-explored research issue. In this paper, we propose a no-reference complex-valued convolutional neural network (CV-CNN) model to conduct automatic dehazed image quality evaluation. Specifically, a novel CV-CNN is employed that exploits the advantages of complex-valued representations, achieving better generalization capability on perceptual feature learning than real-valued ones. To learn more discriminative features to analyze the perceptual quality of dehazed images, we design a dual-stream CV-CNN architecture. The dual-stream model comprises a distortion-sensitive stream that operates on the dehazed RGB image, and a haze-aware stream on a novel dark channel difference image. The distortion-sensitive stream accounts for perceptual distortion artifacts, while the haze-aware stream addresses the possible presence of residual haze. Experimental results on three publicly available dehazed image quality assessment (DQA) databases demonstrate the effectiveness and generalization of our proposed CV-CNN DQA model as compared to state-of-the-art no-reference image quality assessment algorithms.

4.

One Transform To Compute Them All: Efficient Fusion-Based Full-Reference Video Quality Assessment.

Venkataramanan, Abhinau K; Stejerean, Cosmin; Katsavounidis, Ioannis; Bovik, Alan C.

IEEE Trans Image Process ; PP2023 Dec 27.

Article in English | MEDLINE | ID: mdl-38150347

ABSTRACT

The Visual Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a state-of-the-art approach to video quality prediction, that now pervades the streaming and social media industry. However, since VMAF requires the evaluation of a heterogeneous set of quality models, it is computationally expensive. Given other advances in hardware-accelerated encoding, quality assessment is emerging as a significant bottleneck in video compression pipelines. Towards alleviating this burden, we propose a novel Fusion of Unified Quality Evaluators (FUNQUE) framework, by enabling computation sharing and by using a transform that is sensitive to visual perception to boost accuracy. Further, we expand the FUNQUE framework to define a collection of improved low-complexity fused-feature models that advance the state-of-the-art of video quality performance with respect to both accuracy, by 4.2% to 5.3%, and computational efficiency, by factors of 3.8 to 11 times!.

5.

CONVIQT: Contrastive Video Quality Estimator.

Madhusudana, Pavan C; Birkbeck, Neil; Wang, Yilin; Adsumilli, Balu; Bovik, Alan C.

IEEE Trans Image Process ; 32: 5138-5152, 2023.

Article in English | MEDLINE | ID: mdl-37676804

ABSTRACT

Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination is employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural Network (CNN) that extracts spatial features, as well as a recurrent unit that captures temporal information. The model is trained using a contrastive loss and we therefore refer to this training framework and resulting model as CONtrastive VIdeo Quality EstimaTor (CONVIQT). During testing, the weights of the trained model are frozen, and a linear regressor maps the learned features to quality scores in a no-reference (NR) setting. We conduct comprehensive evaluations of the proposed model against leading algorithms on multiple VQA databases containing wide ranges of spatial and temporal distortions. We analyze the correlations between model predictions and ground-truth quality ratings, and show that CONVIQT achieves competitive performance when compared to state-of-the-art NR-VQA models, even though it is not trained on those databases. Our ablation experiments demonstrate that the learned representations are highly robust and generalize well across synthetic and realistic distortions. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.

6.

Machine Learning of Physiologic Waveforms and Electronic Health Record Data: A Large Perioperative Data Set of High-Fidelity Physiologic Waveforms.

Kim, Sungsoo; Kwon, Sohee; Rudas, Akos; Pal, Ravi; Markey, Mia K; Bovik, Alan C; Cannesson, Maxime.

Crit Care Clin ; 39(4): 675-687, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37704333

ABSTRACT

Perioperative morbidity and mortality are significantly associated with both static and dynamic perioperative factors. The studies investigating static perioperative factors have been reported; however, there are a limited number of previous studies and data sets analyzing dynamic perioperative factors, including physiologic waveforms, despite its clinical importance. To fill the gap, the authors introduce a novel large size perioperative data set: Machine Learning Of physiologic waveforms and electronic health Record Data (MLORD) data set. They also provide a concise tutorial on machine learning to illustrate predictive models trained on complex and diverse structures in the MLORD data set.

Subject(s)

Electronic Health Records , Machine Learning , Humans , Clinical Relevance

7.

Helping Visually Impaired People Take Better Quality Pictures.

Mandal, Maniratnam; Ghadiyaram, Deepti; Gurari, Danna; Bovik, Alan C.

IEEE Trans Image Process ; 32: 3873-3884, 2023.

Article in English | MEDLINE | ID: mdl-37432828

ABSTRACT

Perception-based image analysis technologies can be used to help visually impaired people take better quality pictures by providing automated guidance, thereby empowering them to interact more confidently on social media. The photographs taken by visually impaired users often suffer from one or both of two kinds of quality issues: technical quality (distortions), and semantic quality, such as framing and aesthetic composition. Here we develop tools to help them minimize occurrences of common technical distortions, such as blur, poor exposure, and noise. We do not address the complementary problems of semantic quality, leaving that aspect for future work. The problem of assessing, and providing actionable feedback on the technical quality of pictures captured by visually impaired users is hard enough, owing to the severe, commingled distortions that often occur. To advance progress on the problem of analyzing and measuring the technical quality of visually impaired user-generated content (VI-UGC), we built a very large and unique subjective image quality and distortion dataset. This new perceptual resource, which we call the LIVE-Meta VI-UGC Database, contains 40K real-world distorted VI-UGC images and 40K patches, on which we recorded 2.7M human perceptual quality judgments and 2.7M distortion labels. Using this psychometric resource we also created an automatic limited vision picture quality and distortion predictor that learns local-to-global spatial quality relationships, achieving state-of-the-art prediction performance on VI-UGC pictures, significantly outperforming existing picture quality models on this unique class of distorted picture data. We also created a prototype feedback system that helps to guide users to mitigate quality issues and take better quality pictures, by creating a multi-task learning framework. The dataset and models can be accessed at: https://github.com/mandal-cv/visimpaired.

Subject(s)

Image Processing, Computer-Assisted , Semantics , Visually Impaired Persons , Humans , Image Processing, Computer-Assisted/methods , Color Perception , Visual Acuity

8.

Study of Subjective and Objective Quality Assessment of Mobile Cloud Gaming Videos.

Saha, Avinab; Chen, Yu-Chih; Davis, Chase; Qiu, Bo; Wang, Xiaoming; Gowda, Rahul; Katsavounidis, Ioannis; Bovik, Alan C.

IEEE Trans Image Process ; 32: 3295-3310, 2023.

Article in English | MEDLINE | ID: mdl-37276105

ABSTRACT

We present the outcomes of a recent large-scale subjective study of Mobile Cloud Gaming Video Quality Assessment (MCG-VQA) on a diverse set of gaming videos. Rapid advancements in cloud services, faster video encoding technologies, and increased access to high-speed, low-latency wireless internet have all contributed to the exponential growth of the Mobile Cloud Gaming industry. Consequently, the development of methods to assess the quality of real-time video feeds to end-users of cloud gaming platforms has become increasingly important. However, due to the lack of a large-scale public Mobile Cloud Gaming Video dataset containing a diverse set of distorted videos with corresponding subjective scores, there has been limited work on the development of MCG-VQA models. Towards accelerating progress towards these goals, we created a new dataset, named the LIVE-Meta Mobile Cloud Gaming (LIVE-Meta-MCG) video quality database, composed of 600 landscape and portrait gaming videos, on which we collected 14,400 subjective quality ratings from an in-lab subjective study. Additionally, to demonstrate the usefulness of the new resource, we benchmarked multiple state-of-the-art VQA algorithms on the database. The new database will be made publicly available on our website: https://live.ece.utexas.edu/research/LIVE-Meta-Mobile-Cloud-Gaming/index.html.

9.

FOVQA: Blind Foveated Video Quality Assessment.

Jin, Yize; Patney, Anjul; Webb, Richard; Bovik, Alan C.

IEEE Trans Image Process ; 31: 4571-4584, 2022.

Article in English | MEDLINE | ID: mdl-35767478

ABSTRACT

Previous blind or No Reference (NR) Image / video quality assessment (IQA/VQA) models largely rely on features drawn from natural scene statistics (NSS), but under the assumption that the image statistics are stationary in the spatial domain. Several of these models are quite successful on standard pictures. However, in Virtual Reality (VR) applications, foveated video compression is regaining attention, and the concept of space-variant quality assessment is of interest, given the availability of increasingly high spatial and temporal resolution contents and practical ways of measuring gaze direction. Distortions from foveated video compression increase with increased eccentricity, implying that the natural scene statistics are space-variant. Towards advancing the development of foveated compression / streaming algorithms, we have devised a no-reference (NR) foveated video quality assessment model, called FOVQA, which is based on new models of space-variant natural scene statistics (NSS) and natural video statistics (NVS). Specifically, we deploy a space-variant generalized Gaussian distribution (SV-GGD) model and a space-variant asynchronous generalized Gaussian distribution (SV-AGGD) model of mean subtracted contrast normalized (MSCN) coefficients and products of neighboring MSCN coefficients, respectively. We devise a foveated video quality predictor that extracts radial basis features, and other features that capture perceptually annoying rapid quality fall-offs. We find that FOVQA achieves state-of-the-art (SOTA) performance on the new 2D LIVE-FBT-FCVR database, as compared with other leading Foveated IQA / VQA models. we have made our implementation of FOVQA available at: https://live.ece.utexas.edu/research/Quality/FOVQA.zip.

Subject(s)

Algorithms , Data Compression , Attention , Normal Distribution , Video Recording/methods

10.

Image Quality Assessment Using Contrastive Learning.

Madhusudana, Pavan C; Birkbeck, Neil; Wang, Yilin; Adsumilli, Balu; Bovik, Alan C.

IEEE Trans Image Process ; 31: 4149-4161, 2022.

Article in English | MEDLINE | ID: mdl-35700254

ABSTRACT

We consider the problem of obtaining image quality representations in a self-supervised manner. We use prediction of distortion type and degree as an auxiliary task to learn features from an unlabeled image dataset containing a mixture of synthetic and realistic distortions. We then train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem. We refer to the proposed training framework and resulting deep IQA model as the CONTRastive Image QUality Evaluator (CONTRIQUE). During evaluation, the CNN weights are frozen and a linear regressor maps the learned representations to quality scores in a No-Reference (NR) setting. We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models, even without any additional fine-tuning of the CNN backbone. The learned representations are highly robust and generalize well across images afflicted by either synthetic or authentic distortions. Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets. The implementations used in this paper are available at https://github.com/pavancm/CONTRIQUE.

11.

Video Quality Model of Compression, Resolution and Frame Rate Adaptation Based on Space-Time Regularities.

Lee, Dae Yeol; Kim, Jongho; Ko, Hyunsuk; Bovik, Alan C.

IEEE Trans Image Process ; 31: 3644-3656, 2022.

Article in English | MEDLINE | ID: mdl-35576411

ABSTRACT

Being able to accurately predict the visual quality of videos subjected to various combinations of dimension reduction protocols is of high interest to the streaming video industry, given rapid increases in frame resolutions and frame rates. In this direction, we have developed a video quality predictor that is sensitive to spatial, temporal, or space-time subsampling combined with compression. Our predictor is based on new models of space-time natural video statistics (NVS). Specifically, we model the statistics of divisively normalized difference between neighboring frames that are relatively displaced. In an extensive empirical study, we found that those paths of space-time displaced frame differences that provide maximal regularity against our NVS model generally align best with motion trajectories. Motivated by this, we built a new video quality prediction engine that extracts NVS features that represent how space-time directional regularities are disturbed by space-time distortions. Based on parametric models of these regularities, we compute features that are used to train a regressor that can accurately predict perceptual quality. As a stringent test of the new model, we apply it to the difficult problem of predicting the quality of videos subjected not only to compression, but also to downsampling in space and/or time. We show that the new quality model achieves state-of-the-art (SOTA) prediction performance on the new ETRI-LIVE Space-Time Subsampled Video Quality (STSVQ) and also on the AVT-VQDB-UHD-1 database.

12.

Study of the Subjective and Objective Quality of High Motion Live Streaming Videos.

Shang, Zaixi; Ebenezer, Joshua Peter; Wu, Yongjun; Wei, Hai; Sethuraman, Sriram; Bovik, Alan C.

IEEE Trans Image Process ; 31: 1027-1041, 2022.

Article in English | MEDLINE | ID: mdl-34951848

ABSTRACT

Video livestreaming is gaining prevalence among video streaming service s, especially for the delivery of live, high motion content such as sport ing events. The quality of the se livestreaming videos can be adversely affected by any of a wide variety of events, including capture artifacts, and distortions incurred during coding and transmission. High motion content can cause or exacerbate many kinds of distortion, such as motion blur and stutter. Because of this, the development of objective Video Quality Assessment (VQA) algorithms that can predict the perceptual quality of high motion, live streamed videos is greatly desired. Important resources for developing these algorithms are appropriate databases that exemplify the kinds of live streaming video distortions encountered in practice. Towards making progress in this direction, we built a video quality database specifically designed for live streaming VQA research. The new video database is called the Laboratory for Image and Video Engineering (LIVE) Livestream Database. The LIVE Livestream Database includes 315 videos of 45 source sequences from 33 original contents impaired by 6 types of distortions. We also performed a subjective quality study using the new database, whereby more than 12,000 human opinions were gathered from 40 subjects. We demonstrate the usefulness of the new resource by performing a holistic evaluation of the performance of current state-of-the-art (SOTA) VQA models. We envision that researchers will find the dataset to be useful for the development, testing, and comparison of future VQA models. The LIVE Livestream database is being made publicly available for these purposes at https://live.ece. utexas.edu/research/LIVE_APV_Study/apv_index.html.

Subject(s)

Algorithms , Artifacts , Databases, Factual , Humans , Motion , Video Recording

13.

A Subjective and Objective Study of Space-Time Subsampled Video Quality.

Lee, Dae Yeol; Paul, Somdyuti; Bampis, Christos G; Ko, Hyunsuk; Kim, Jongho; Jeong, Se Yoon; Homan, Blake; Bovik, Alan C.

IEEE Trans Image Process ; 31: 934-948, 2022.

Article in English | MEDLINE | ID: mdl-34965209

ABSTRACT

Video dimensions are continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution and frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited bandwidth networks in a perceptually optimal way is a current challenge affecting billions of viewers. One recent practice adopted by video service providers is space-time resolution adaptation in conjunction with video compression. Consequently, it is important to understand how different levels of space-time subsampling and compression affect the perceptual quality of videos. Towards making progress in this direction, we constructed a large new resource, called the ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE STSVQ) database, containing 437 videos generated by applying various levels of combined space-time subsampling and video compression on 15 diverse video contents. We also conducted a large-scale human study on the new dataset, collecting about 15,000 subjective judgments of video quality. We provide a rate-distortion analysis of the collected subjective scores, enabling us to investigate the perceptual impact of space-time subsampling at different bit rates. We also evaluated and compare the performance of leading video quality models on the new database. The new ETRI-LIVE STSVQ database is being made freely available at (https://live.ece.utexas.edu/research/ETRI-LIVE_STSVQ/index.html).

14.

Self-Supervised Learning of Perceptually Optimized Block Motion Estimates for Video Compression.

Paul, Somdyuti; Norkin, Andrey; Bovik, Alan C.

IEEE Trans Image Process ; PP2022 Dec 27.

Article in English | MEDLINE | ID: mdl-37015500

ABSTRACT

Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which tends to worsen as the block size is reduced. Moreover, the block matching criteria used in typical codecs do not account for the resulting levels of perceptual quality of the motion compensated pictures that are created upon decoding. Towards achieving the elusive goal of perceptually optimized motion estimation, we propose a search-free block motion estimation framework using a multi-stage convolutional neural network, which is able to conduct motion estimation on multiple block sizes simultaneously, using a triplet of frames as input. This composite block translation network (CBT-Net) is trained in a self-supervised manner on a large database that we created from publicly available uncompressed video content. We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames. Our experimental results highlight the computational efficiency of our proposed model relative to conventional block matching based motion estimation algorithms, for comparable prediction errors. Further, when used to perform inter prediction in AV1, the MV predictions of the perceptually optimized model result in average Bjontegaard-delta rate (BD-rate) improvements of -1.73% and -1.31% with respect to the MS-SSIM and Video Multi-Method Assessment Fusion (VMAF) quality metrics, respectively, as compared to the block matching based motion estimation system employed in the SVT-AV1 encoder.

15.

ChipQA: No-Reference Video Quality Prediction via Space-Time Chips.

Ebenezer, Joshua Peter; Shang, Zaixi; Wu, Yongjun; Wei, Hai; Sethuraman, Sriram; Bovik, Alan C.

IEEE Trans Image Process ; 30: 8059-8074, 2021.

Article in English | MEDLINE | ID: mdl-34534087

ABSTRACT

We propose a new model for no-reference video quality assessment (VQA). Our approach uses a new idea of highly-localized space-time (ST) slices called Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along directions that implicitly capture motion. We use perceptually-motivated bandpass and normalization models to first process the video data, and then select oriented ST Chips based on how closely they fit parametric models of natural video statistics. We show that the parameters that describe these statistics can be used to reliably predict the quality of videos, without the need for a reference video. The proposed method implicitly models ST video naturalness, and deviations from naturalness. We train and test our model on several large VQA databases, and show that our model achieves state-of-the-art performance at reduced cost, without requiring motion computation.

16.

ST-GREED: Space-Time Generalized Entropic Differences for Frame Rate Dependent Video Quality Prediction.

Madhusudana, Pavan C; Birkbeck, Neil; Wang, Yilin; Adsumilli, Balu; Bovik, Alan C.

IEEE Trans Image Process ; 30: 7446-7457, 2021.

Article in English | MEDLINE | ID: mdl-34449359

ABSTRACT

We consider the problem of conducting frame rate dependent video quality assessment (VQA) on videos of diverse frame rates, including high frame rate (HFR) videos. More generally, we study how perceptual quality is affected by frame rate, and how frame rate and compression combine to affect perceived quality. We devise an objective VQA model called Space-Time GeneRalized Entropic Difference (GREED) which analyzes the statistics of spatial and temporal band-pass video coefficients. A generalized Gaussian distribution (GGD) is used to model band-pass responses, while entropy variations between reference and distorted videos under the GGD model are used to capture video quality variations arising from frame rate changes. The entropic differences are calculated across multiple temporal and spatial subbands, and merged using a learned regressor. We show through extensive experiments that GREED achieves state-of-the-art performance on the LIVE-YT-HFR Database when compared with existing VQA models. The features used in GREED are highly generalizable and obtain competitive performance even on standard, non-HFR VQA databases. The implementation of GREED has been made available online: https://github.com/pavancm/GREED.

17.

Predicting the Quality of Compressed Videos With Pre-Existing Distortions.

Yu, Xiangxu; Birkbeck, Neil; Wang, Yilin; Bampis, Christos G; Adsumilli, Balu; Bovik, Alan C.

IEEE Trans Image Process ; 30: 7511-7526, 2021.

Article in English | MEDLINE | ID: mdl-34460374

ABSTRACT

Because of the increasing ease of video capture, many millions of consumers create and upload large volumes of User-Generated-Content (UGC) videos to social and streaming media sites over the Internet. UGC videos are commonly captured by naive users having limited skills and imperfect techniques, and tend to be afflicted by mixtures of highly diverse in-capture distortions. These UGC videos are then often uploaded for sharing onto cloud servers, where they are further compressed for storage and transmission. Our paper tackles the highly practical problem of predicting the quality of compressed videos (perhaps during the process of compression, to help guide it), with only (possibly severely) distorted UGC videos as references. To address this problem, we have developed a novel Video Quality Assessment (VQA) framework that we call 1stepVQA (to distinguish it from two-step methods that we discuss). 1stepVQA overcomes limitations of Full-Reference, Reduced-Reference and No-Reference VQA models by exploiting the statistical regularities of both natural videos and distorted videos. We also describe a new dedicated video database, which was created by applying a realistic VMAF-Guided perceptual rate distortion optimization (RDO) criterion to create realistically compressed versions of UGC source videos, which typically have pre-existing distortions. We show that 1stepVQA is able to more accurately predict the quality of compressed videos, given imperfect reference videos, and outperforms other VQA models in this scenario.

18.

On the space-time statistics of motion pictures.

Lee, Dae Yeol; Ko, Hyunsuk; Kim, Jongho; Bovik, Alan C.

J Opt Soc Am A Opt Image Sci Vis ; 38(7): 908-923, 2021 Jul 01.

Article in English | MEDLINE | ID: mdl-34263746

ABSTRACT

It is well known that natural images possess statistical regularities that can be captured by bandpass decomposition and divisive normalization processes that approximate early neural processing in the human visual system. We expand on these studies and present new findings on the properties of space-time natural statistics that are inherent in motion pictures. Our model relies on the concept of temporal bandpass (e.g., lag) filtering in lateral geniculate nucleus (LGN) and area V1, which is similar to smoothed frame differencing of video frames. Specifically, we model the statistics of the differences between adjacent or neighboring video frames that have been slightly spatially displaced relative to one another. We find that when these space-time differences are further subjected to locally pooled divisive normalization, statistical regularities (or lack thereof) arise that depend on the local motion trajectory. We find that bandpass and divisively normalized frame differences that are displaced along the motion direction exhibit stronger statistical regularities than for other displacements. Conversely, the direction-dependent regularities of displaced frame differences can be used to estimate the image motion (optical flow) by finding the space-time displacement paths that best preserve statistical regularity.

Subject(s)

Primary Visual Cortex , Visual Perception , Humans , Motion Perception , Neurons

19.

Subjective and Objective Quality Assessment of 2D and 3D Foveated Video Compression in Virtual Reality.

Jin, Yize; Chen, Meixu; Goodall, Todd; Patney, Anjul; Bovik, Alan C.

IEEE Trans Image Process ; 30: 5905-5919, 2021.

Article in English | MEDLINE | ID: mdl-34125674

ABSTRACT

In Virtual Reality (VR), the requirements of much higher resolution and smooth viewing experiences under rapid and often real-time changes in viewing direction, leads to significant challenges in compression and communication. To reduce the stresses of very high bandwidth consumption, the concept of foveated video compression is being accorded renewed interest. By exploiting the space-variant property of retinal visual acuity, foveation has the potential to substantially reduce video resolution in the visual periphery, with hardly noticeable perceptual quality degradations. Accordingly, foveated image / video quality predictors are also becoming increasingly important, as a practical way to monitor and control future foveated compression algorithms. Towards advancing the development of foveated image / video quality assessment (FIQA / FVQA) algorithms, we have constructed 2D and (stereoscopic) 3D VR databases of foveated / compressed videos, and conducted a human study of perceptual quality on each database. Each database includes 10 reference videos and 180 foveated videos, which were processed by 3 levels of foveation on the reference videos. Foveation was applied by increasing compression with increased eccentricity. In the 2D study, each video was of resolution 7680×3840 and was viewed and quality-rated by 36 subjects, while in the 3D study, each video was of resolution 5376×5376 and rated by 34 subjects. Both studies were conducted on top of a foveated video player having low motion-to-photon latency (~50ms). We evaluated different objective image and video quality assessment algorithms, including both FIQA / FVQA algorithms and non-foveated algorithms, on our so called LIVE-Facebook Technologies Foveation-Compressed Virtual Reality (LIVE-FBT-FCVR) databases. We also present a statistical evaluation of the relative performances of these algorithms. The LIVE-FBT-FCVR databases have been made publicly available and can be accessed at https://live.ece.utexas.edu/research/LIVEFBTFCVR/index.html.

20.

Towards Perceptually Optimized Adaptive Video Streaming-A Realistic Quality of Experience Database.

Bampis, Christos G; Li, Zhi; Katsavounidis, Ioannis; Huang, Te-Yuan; Ekanadham, Chaitanya; Bovik, Alan C.

IEEE Trans Image Process ; 30: 5182-5197, 2021.

Article in English | MEDLINE | ID: mdl-33877974

ABSTRACT

Measuring Quality of Experience (QoE) and integrating these measurements into video streaming algorithms is a multi-faceted problem that fundamentally requires the design of comprehensive subjective QoE databases and objective QoE prediction models. To achieve this goal, we have recently designed the LIVE-NFLX-II database, a highly-realistic database which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content. Our database builds on recent advancements in content-adaptive encoding and incorporates actual network traces to capture realistic network variations on the client device. The new database focuses on low bandwidth conditions which are more challenging for bitrate adaptation algorithms, which often must navigate tradeoffs between rebuffering and video quality. Using our database, we study the effects of multiple streaming dimensions on user experience and evaluate video quality and quality of experience models and analyze their strengths and weaknesses. We believe that the tools introduced here will help inspire further progress on the development of perceptually-optimized client adaptation and video streaming strategies. The database is publicly available at http://live.ece.utexas.edu/research/LIVE_NFLX_II/live_nflx_plus.html.

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL