Results 1 - 20 of 96
1.
J Opt Soc Am A Opt Image Sci Vis ; 38(7): 908-923, 2021 Jul 01.
Article in English | MEDLINE | ID: mdl-34263746

ABSTRACT

It is well known that natural images possess statistical regularities that can be captured by bandpass decomposition and divisive normalization processes that approximate early neural processing in the human visual system. We expand on these studies and present new findings on the properties of space-time natural statistics that are inherent in motion pictures. Our model relies on the concept of temporal bandpass (e.g., lag) filtering in lateral geniculate nucleus (LGN) and area V1, which is similar to smoothed frame differencing of video frames. Specifically, we model the statistics of the differences between adjacent or neighboring video frames that have been slightly spatially displaced relative to one another. We find that when these space-time differences are further subjected to locally pooled divisive normalization, statistical regularities (or lack thereof) arise that depend on the local motion trajectory. We find that bandpass and divisively normalized frame differences that are displaced along the motion direction exhibit stronger statistical regularities than for other displacements. Conversely, the direction-dependent regularities of displaced frame differences can be used to estimate the image motion (optical flow) by finding the space-time displacement paths that best preserve statistical regularity.
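To make the space-time modeling steps above concrete, here is a minimal numpy sketch (an illustration, not the authors' code) of one spatially displaced frame difference followed by locally pooled divisive normalization; the displacement, window scale, and stabilizing constant are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def displaced_frame_difference(frame_t, frame_next, dy=0, dx=1):
    """Difference between a frame and the next frame shifted by (dy, dx) pixels."""
    shifted = np.roll(frame_next, shift=(dy, dx), axis=(0, 1))
    return frame_t.astype(np.float64) - shifted.astype(np.float64)

def divisive_normalize(diff, sigma=7.0 / 6.0, eps=1.0):
    """Divide out a locally pooled (Gaussian-weighted) standard deviation."""
    mu = gaussian_filter(diff, sigma)
    var = gaussian_filter(diff ** 2, sigma) - mu ** 2
    return (diff - mu) / (np.sqrt(np.clip(var, 0.0, None)) + eps)

# Comparing the statistical regularity (e.g., Gaussianity) of the normalized
# coefficients across candidate (dy, dx) displacements is one way to select the
# displacement that best follows the local motion trajectory.
```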


Subject(s)
Primary Visual Cortex, Visual Perception, Humans, Motion Perception, Neurons
2.
J Vis ; 17(1): 32, 2017 01 01.
Article in English | MEDLINE | ID: mdl-28129417

ABSTRACT

Current top-performing blind perceptual image quality prediction models are generally trained on legacy databases of human quality opinion scores on synthetically distorted images. Therefore, they learn image features that effectively predict human visual quality judgments of inauthentic and usually isolated (single) distortions. However, real-world images usually contain complex composite mixtures of multiple distortions. We study the perceptually relevant natural scene statistics of such authentically distorted images in different color spaces and transform domains. We propose a "bag of feature maps" approach that avoids assumptions about the type of distortion(s) contained in an image and instead focuses on capturing consistencies, or departures therefrom, of the statistics of real-world images. Using a large database of authentically distorted images, human opinions of them, and bags of features computed on them, we train a regressor to conduct image quality prediction. We demonstrate the competence of the features toward improving automatic perceptual quality prediction by testing a learned algorithm using them on a benchmark legacy database as well as on a newly introduced distortion-realistic resource called the LIVE In the Wild Image Quality Challenge Database. We extensively evaluate the perceptual quality prediction model and algorithm and show that it achieves quality prediction power better than that of other leading models.
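As a rough illustration of the "bag of feature maps" pipeline (not the published feature set), the sketch below computes mean-subtracted, contrast-normalized (MSCN) coefficients on a single grayscale map, summarizes them with a few sample statistics, and hands such features to a support-vector regressor; the window scale, statistics, and regressor settings are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVR

def mscn(image, sigma=7.0 / 6.0, c=1.0):
    """Mean-subtracted, contrast-normalized coefficients of a grayscale image."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)
    sd = np.sqrt(np.abs(gaussian_filter(image ** 2, sigma) - mu ** 2))
    return (image - mu) / (sd + c)

def toy_features(image):
    """A tiny stand-in 'bag of features': moments of the MSCN coefficient map."""
    coeffs = mscn(image)
    return np.array([coeffs.var(), np.abs(coeffs).mean(),
                     (coeffs ** 3).mean(), (coeffs ** 4).mean()])

# Given feature rows X (one per training image) and mean opinion scores y:
# regressor = SVR(kernel="rbf").fit(X, y); predicted = regressor.predict(X_test)
```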


Subject(s)
Algorithms, Image Interpretation, Computer-Assisted/methods, Pattern Recognition, Automated/methods, Perceptual Distortion/physiology, Photography/methods, Databases, Factual, Humans, Models, Theoretical, Reproducibility of Results
3.
J Vis ; 17(5): 22, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28564686

ABSTRACT

Estimating an accurate and naturalistic dense depth map from a single monocular photographic image is a difficult problem. Nevertheless, human observers have little difficulty understanding the depth structure implied by photographs. Two-dimensional (2D) images of the real-world environment contain significant statistical information regarding the three-dimensional (3D) structure of the world that the vision system likely exploits to compute perceived depth, monocularly as well as binocularly. Toward understanding how this might be accomplished, we propose a Bayesian model of monocular depth computation that recovers detailed 3D scene structures by extracting reliable, robust, depth-sensitive statistical features from single natural images. These features are derived using well-accepted univariate natural scene statistics (NSS) models and recent bivariate/correlation NSS models that describe the relationships between 2D photographic images and their associated depth maps. This is accomplished by building a dictionary of canonical local depth patterns from which NSS features are extracted as prior information. The dictionary is used to create a multivariate Gaussian mixture (MGM) likelihood model that associates local image features with depth patterns. A simple Bayesian predictor is then used to form spatial depth estimates. The depth results produced by the model, despite its simplicity, correlate well with ground-truth depths measured by a current-generation terrestrial light detection and ranging (LIDAR) scanner. Such a strong form of statistical depth information could be used by the visual system when creating overall estimated depth maps incorporating stereopsis, accommodation, and other conditions. Indeed, even in isolation, the Bayesian predictor delivers depth estimates that are competitive with state-of-the-art "computer vision" methods that utilize highly engineered image features and sophisticated machine learning algorithms.
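The prediction step can be pictured with a small sketch of a posterior-weighted depth estimate, assuming a dictionary of canonical local depth patterns, each with a Gaussian likelihood over local image features; the feature extraction and dictionary learning stages are omitted, and all inputs here are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_depth_estimate(feature_vec, pattern_means, pattern_covs, pattern_depths, priors):
    """Posterior-weighted combination of K canonical local depth patterns.

    feature_vec: (d,) local image features; pattern_means/pattern_covs: Gaussian
    likelihood parameters per pattern; pattern_depths: list of (h, w) canonical
    depth patches; priors: (K,) prior probabilities of the patterns.
    """
    likelihoods = np.array([multivariate_normal.pdf(feature_vec, mean=m, cov=c)
                            for m, c in zip(pattern_means, pattern_covs)])
    posterior = likelihoods * np.asarray(priors)
    posterior /= posterior.sum()
    # Weighted average of the canonical depth patches gives the local depth estimate.
    return np.tensordot(posterior, np.stack(pattern_depths), axes=1)
```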


Subject(s)
Bayes Theorem, Depth Perception/physiology, Imaging, Three-Dimensional, Models, Theoretical, Algorithms, Humans, Likelihood Functions
4.
J Vis ; 16(5): 19, 2016.
Article in English | MEDLINE | ID: mdl-27019052

ABSTRACT

The now well-known motion-silencing illusion has shown that salient changes among a group of objects' luminances, colors, shapes, or sizes may appear to cease when the objects move rapidly (Suchow & Alvarez, 2011). It has been proposed that silencing derives from dot spacing that causes crowding, coherent changes in object color or size, and flicker frequencies combined with dot spacing (Choi, Bovik, & Cormack, 2014; Peirce, 2013; Turi & Burr, 2013). Motion silencing is a peripheral effect that does not occur near the point of fixation. To better understand the effect of eccentricity on motion silencing, we measured the amount of motion silencing as a function of eccentricity in human observers using traditional psychophysics. Fifteen observers reported whether dots in any of four concentric rings changed in luminance over a series of rotational velocities. The human experiments showed that the threshold velocity for motion silencing decreases almost linearly as a function of log eccentricity. Further, we modeled the response of a population of simulated V1 neurons to our stimuli. We found strong matches between the threshold velocities for motion silencing observed in the human experiments and those seen in the energy model of Adelson and Bergen (1985). We suggest the plausible explanation that as eccentricity increases, the combined motion-flicker signal falls outside the narrow spatiotemporal frequency response regions of the modeled receptive fields, thereby reducing flicker visibility.
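A minimal sketch of the kind of quadrature-pair motion-energy computation used in the Adelson and Bergen (1985) model is shown below; the space-time Gabor parameters and the single-filter formulation are illustrative assumptions rather than the paper's V1 population model.

```python
import numpy as np

def st_gabor(x, t, fx, ft, phase, sigma=2.0):
    """Space-time Gabor: Gaussian envelope times an oriented sinusoidal carrier."""
    envelope = np.exp(-(x ** 2 + t ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * (fx * x + ft * t) + phase)

def motion_energy(stimulus, fx=0.1, ft=0.1):
    """Quadrature-pair energy of one oriented filter on a (space x time) stimulus."""
    xs = np.arange(stimulus.shape[0])[:, None] - stimulus.shape[0] // 2
    ts = np.arange(stimulus.shape[1])[None, :] - stimulus.shape[1] // 2
    even = (stimulus * st_gabor(xs, ts, fx, ft, 0.0)).sum()
    odd = (stimulus * st_gabor(xs, ts, fx, ft, np.pi / 2.0)).sum()
    return even ** 2 + odd ** 2
```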


Subject(s)
Motion Perception/physiology, Adult, Color Perception/physiology, Crowding, Female, Humans, Male, Neurons/physiology, Psychophysics, Vision, Ocular, Young Adult
5.
BMC Med Imaging ; 15: 12, 2015 Mar 27.
Article in English | MEDLINE | ID: mdl-25885763

ABSTRACT

BACKGROUND: Patients with facial cancers can experience disfigurement as they may undergo considerable appearance changes from their illness and its treatment. Individuals with difficulties adjusting to facial cancer are concerned about how others perceive and evaluate their appearance. Therefore, it is important to understand how humans perceive disfigured faces. We describe a new strategy that allows simulation of surgically plausible facial disfigurement on a novel face in order to elucidate human perception of facial disfigurement. METHOD: Longitudinal 3D facial images of patients (N = 17) with facial disfigurement due to cancer treatment were replicated using a facial mannequin model, by applying Thin-Plate Spline (TPS) warping and linear interpolation on the facial mannequin model in polar coordinates. Principal Component Analysis (PCA) was used to capture longitudinal structural and textural variations found within each patient with facial disfigurement arising from the treatment. We treated such variations as disfigurement. Each disfigurement was smoothly stitched on a healthy face by seeking a Poisson solution to guided interpolation using the gradient of the learned disfigurement as the guidance field vector. The modeling technique was quantitatively evaluated. In addition, panel ratings of experienced medical professionals on the plausibility of the simulations were used to evaluate the proposed disfigurement model. RESULTS: The algorithm reproduced the given face effectively using a facial mannequin model, with less than 4.4 mm maximum error for the validation fiducial points that were not used for the processing. Panel ratings of experienced medical professionals on the plausibility of the simulations showed that the disfigurement model (especially for peripheral disfigurement) yielded predictions comparable to the real disfigurements. CONCLUSIONS: The modeling technique of this study is able to capture facial disfigurements, and its simulations represent plausible outcomes of reconstructive surgery for facial cancers. Thus, our technique can be used to study human perception of facial disfigurement.
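The "Poisson solution to guided interpolation" step can be sketched as a simple iterative solver that matches the Laplacian of the blended region to that of the learned disfigurement while keeping the surrounding healthy face fixed; this is a generic Poisson-editing illustration under assumed inputs, not the study's implementation.

```python
import numpy as np

def poisson_blend(target, source, mask, iters=2000):
    """Jacobi iterations: inside `mask`, match the Laplacian of `source` (the
    divergence of the guidance field) while values outside `mask` stay tied
    to `target`, so the boundary conditions come from the healthy face."""
    result = target.astype(np.float64).copy()
    src = source.astype(np.float64)
    lap_src = (np.roll(src, 1, 0) + np.roll(src, -1, 0) +
               np.roll(src, 1, 1) + np.roll(src, -1, 1) - 4.0 * src)
    for _ in range(iters):
        neighbors = (np.roll(result, 1, 0) + np.roll(result, -1, 0) +
                     np.roll(result, 1, 1) + np.roll(result, -1, 1))
        result[mask] = (neighbors[mask] - lap_src[mask]) / 4.0
    return result
```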


Subject(s)
Facial Injuries/etiology, Facial Injuries/pathology, Facial Neoplasms/pathology, Facial Neoplasms/surgery, Imaging, Three-Dimensional/methods, Reconstructive Surgical Procedures/adverse effects, Adolescent, Adult, Aged, Aged, 80 and over, Computer Simulation, Face/pathology, Female, Humans, Image Interpretation, Computer-Assisted/methods, Male, Middle Aged, Models, Biological, Preoperative Care/methods, Reproducibility of Results, Sensitivity and Specificity, Treatment Outcome, Young Adult
6.
J Digit Imaging ; 27(2): 248-54, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24190140

ABSTRACT

The purpose of this study was to evaluate stereoscopic perception of low-dose breast tomosynthesis projection images. In this Institutional Review Board exempt study, craniocaudal breast tomosynthesis cases (N = 47), consisting of 23 biopsy-proven malignant mass cases and 24 normal cases, were retrospectively reviewed. A stereoscopic pair comprised of two projection images that were ±4° apart from the zero angle projection was displayed on a Planar PL2010M stereoscopic display (Planar Systems, Inc., Beaverton, OR, USA). An experienced breast imager verified the truth for each case stereoscopically. A two-phase blinded observer study was conducted. In the first phase, two experienced breast imagers rated their ability to perceive 3D information using a scale of 1-3 and described the most suspicious lesion using the BI-RADS® descriptors. In the second phase, four experienced breast imagers were asked to make a binary decision on whether they saw a mass for which they would initiate a diagnostic workup or not and also report the location of the mass and provide a confidence score in the range of 0-100. The sensitivity and the specificity of the lesion detection task were evaluated. The results from our study suggest that radiologists who can perceive stereo can reliably interpret breast tomosynthesis projection images using stereoscopic viewing.


Subject(s)
Breast Diseases/diagnostic imaging, Radiographic Image Enhancement/methods, Biopsy, Female, Humans, Imaging, Three-Dimensional, Mammography/methods, Radiation Dosage, Retrospective Studies, Sensitivity and Specificity, Surveys and Questionnaires
7.
IEEE Trans Image Process ; 33: 466-478, 2024.
Article in English | MEDLINE | ID: mdl-38150345

ABSTRACT

Effectively evaluating the perceptual quality of dehazed images remains an under-explored research issue. In this paper, we propose a no-reference complex-valued convolutional neural network (CV-CNN) model to conduct automatic dehazed image quality evaluation. Specifically, a novel CV-CNN is employed that exploits the advantages of complex-valued representations, achieving better generalization capability on perceptual feature learning than real-valued ones. To learn more discriminative features to analyze the perceptual quality of dehazed images, we design a dual-stream CV-CNN architecture. The dual-stream model comprises a distortion-sensitive stream that operates on the dehazed RGB image, and a haze-aware stream on a novel dark channel difference image. The distortion-sensitive stream accounts for perceptual distortion artifacts, while the haze-aware stream addresses the possible presence of residual haze. Experimental results on three publicly available dehazed image quality assessment (DQA) databases demonstrate the effectiveness and generalization of our proposed CV-CNN DQA model as compared to state-of-the-art no-reference image quality assessment algorithms.
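For context, the classical dark channel computation that underlies dark-channel-based haze analysis is sketched below; the abstract does not detail how the paper's dark channel difference image is formed, so only the base quantity is shown, and the patch size is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(rgb, patch=15):
    """Per-pixel minimum over the color channels, then over a local spatial window."""
    return minimum_filter(rgb.astype(np.float64).min(axis=2), size=patch)

# A haze-aware "difference image" could then compare dark channels of two related
# images; the paper's exact construction is not specified in the abstract.
```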

8.
IEEE Trans Image Process ; 33: 3606-3619, 2024.
Article in English | MEDLINE | ID: mdl-38814774

ABSTRACT

We conducted a large-scale study of human perceptual quality judgments of High Dynamic Range (HDR) and Standard Dynamic Range (SDR) videos subjected to scaling and compression levels and viewed on three different display devices. While conventional expectations are that HDR quality is better than SDR quality, we have found subject preference of HDR versus SDR depends heavily on the display device, as well as on resolution scaling and bitrate. To study this question, we collected more than 23,000 quality ratings from 67 volunteers who watched 356 videos on OLED, QLED, and LCD televisions, and among many other findings, observed that HDR videos were often rated as lower quality than SDR videos at lower bitrates, particularly when viewed on LCD and QLED displays. Since it is of interest to be able to measure the quality of videos under these scenarios, e.g. to inform decisions regarding scaling, compression, and SDR vs HDR, we tested several well-known full-reference and no-reference video quality models on the new database. Towards advancing progress on this problem, we also developed a novel no-reference model called HDRPatchMAX, that uses a contrast-based analysis of classical and bit-depth features to predict quality more accurately than existing metrics.

9.
IEEE Trans Image Process ; 33: 42-57, 2024.
Article in English | MEDLINE | ID: mdl-37988212

ABSTRACT

As compared to standard dynamic range (SDR) videos, high dynamic range (HDR) content is able to represent and display much wider and more accurate ranges of brightness and color, leading to more engaging and enjoyable visual experiences. HDR also implies increases in data volume, further challenging existing limits on bandwidth consumption and on the quality of delivered content. Perceptual quality models are used to monitor and control the compression of streamed SDR content. A similar strategy should be useful for HDR content, yet there has been limited work on building HDR video quality assessment (VQA) algorithms. One reason for this is a scarcity of high-quality HDR VQA databases representative of contemporary HDR standards. Towards filling this gap, we created the first publicly available HDR VQA database dedicated to HDR10 videos, called the Laboratory for Image and Video Engineering (LIVE) HDR Database. It comprises 310 videos from 31 distinct source sequences processed by ten different compression and resolution combinations, simulating bitrate ladders used by the streaming industry. We used this data to conduct a subjective quality study, gathering more than 20,000 human quality judgments under two different illumination conditions. To demonstrate the usefulness of this new psychometric data resource, we also designed a new framework for creating HDR quality sensitive features, using a nonlinear transform to emphasize distortions occurring in spatial portions of videos that are enhanced by HDR, e.g., having darker blacks and brighter whites. We apply this new method, which we call HDRMAX, to modify the widely-deployed Video Multimethod Assessment Fusion (VMAF) model. We show that VMAF+HDRMAX provides significantly elevated performance on both HDR and SDR videos, exceeding prior state-of-the-art model performance. The database is now accessible at: https://live.ece.utexas.edu/research/LIVEHDR/LIVEHDR_index.html. The model will be made available at a later date at: https://live.ece.utexas.edu//research/Quality/index_algorithms.htm.
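One way to picture the HDRMAX-style idea of emphasizing distortions in the darkest and brightest regions is an expansive point nonlinearity applied to locally rescaled luma, as in the hedged sketch below; this is an assumption-based illustration, not the published HDRMAX definition.

```python
import numpy as np

def expansive_nonlinearity(patch, delta=4.0):
    """Rescale a luma patch to [-1, 1], then exponentially expand its extremes so
    that errors near deep blacks and bright whites dominate subsequent features."""
    patch = patch.astype(np.float64)
    lo, hi = patch.min(), patch.max()
    normalized = 2.0 * (patch - lo) / (hi - lo + 1e-6) - 1.0
    return np.sign(normalized) * np.exp(delta * np.abs(normalized))
```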

10.
IEEE Trans Image Process ; 32: 3873-3884, 2023.
Article in English | MEDLINE | ID: mdl-37432828

ABSTRACT

Perception-based image analysis technologies can be used to help visually impaired people take better quality pictures by providing automated guidance, thereby empowering them to interact more confidently on social media. The photographs taken by visually impaired users often suffer from one or both of two kinds of quality issues: technical quality (distortions), and semantic quality, such as framing and aesthetic composition. Here we develop tools to help them minimize occurrences of common technical distortions, such as blur, poor exposure, and noise. We do not address the complementary problems of semantic quality, leaving that aspect for future work. The problem of assessing, and providing actionable feedback on the technical quality of pictures captured by visually impaired users is hard enough, owing to the severe, commingled distortions that often occur. To advance progress on the problem of analyzing and measuring the technical quality of visually impaired user-generated content (VI-UGC), we built a very large and unique subjective image quality and distortion dataset. This new perceptual resource, which we call the LIVE-Meta VI-UGC Database, contains 40K real-world distorted VI-UGC images and 40K patches, on which we recorded 2.7M human perceptual quality judgments and 2.7M distortion labels. Using this psychometric resource we also created an automatic limited vision picture quality and distortion predictor that learns local-to-global spatial quality relationships, achieving state-of-the-art prediction performance on VI-UGC pictures, significantly outperforming existing picture quality models on this unique class of distorted picture data. We also created a prototype feedback system that helps to guide users to mitigate quality issues and take better quality pictures, by creating a multi-task learning framework. The dataset and models can be accessed at: https://github.com/mandal-cv/visimpaired.


Subject(s)
Image Processing, Computer-Assisted, Semantics, Visually Impaired Persons, Humans, Image Processing, Computer-Assisted/methods, Color Perception, Visual Acuity
11.
IEEE Trans Image Process ; 32: 5138-5152, 2023.
Article in English | MEDLINE | ID: mdl-37676804

ABSTRACT

Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination is employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural Network (CNN) that extracts spatial features, as well as a recurrent unit that captures temporal information. The model is trained using a contrastive loss and we therefore refer to this training framework and resulting model as CONtrastive VIdeo Quality EstimaTor (CONVIQT). During testing, the weights of the trained model are frozen, and a linear regressor maps the learned features to quality scores in a no-reference (NR) setting. We conduct comprehensive evaluations of the proposed model against leading algorithms on multiple VQA databases containing wide ranges of spatial and temporal distortions. We analyze the correlations between model predictions and ground-truth quality ratings, and show that CONVIQT achieves competitive performance when compared to state-of-the-art NR-VQA models, even though it is not trained on those databases. Our ablation experiments demonstrate that the learned representations are highly robust and generalize well across synthetic and realistic distortions. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
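The evaluation protocol described above, in which frozen self-supervised features are mapped to scores by a linear regressor, can be sketched as follows; the encoder is a stand-in and the regularization strength is an assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_quality_regressor(frozen_features, mos, alpha=1.0):
    """frozen_features: (N, d) outputs of the frozen video encoder for N clips;
    mos: (N,) mean opinion scores. Returns a fitted linear mapping to quality."""
    return Ridge(alpha=alpha).fit(frozen_features, mos)

# predicted_quality = fit_quality_regressor(train_feats, train_mos).predict(test_feats)
```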

12.
Article in English | MEDLINE | ID: mdl-38150347

ABSTRACT

The Video Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a state-of-the-art approach to video quality prediction that now pervades the streaming and social media industry. However, since VMAF requires the evaluation of a heterogeneous set of quality models, it is computationally expensive. Given other advances in hardware-accelerated encoding, quality assessment is emerging as a significant bottleneck in video compression pipelines. Towards alleviating this burden, we propose a novel Fusion of Unified Quality Evaluators (FUNQUE) framework, by enabling computation sharing and by using a transform that is sensitive to visual perception to boost accuracy. Further, we expand the FUNQUE framework to define a collection of improved low-complexity fused-feature models that advance the state of the art of video quality prediction with respect to both accuracy, by 4.2% to 5.3%, and computational efficiency, by factors of 3.8 to 11.
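The computation-sharing idea can be illustrated with a toy example in which a single shared decomposition (here one Haar level as a stand-in for the perceptually motivated transform) is computed once and several fused features are read off its coefficients; the transform choice and the features are assumptions, not the FUNQUE definition.

```python
import numpy as np

def haar_level(frame):
    """One 2x2 Haar level (approximation plus three detail bands); assumes even dimensions."""
    a, b = frame[0::2, 0::2], frame[1::2, 0::2]
    c, d = frame[0::2, 1::2], frame[1::2, 1::2]
    return ((a + b + c + d) / 4.0, (a - b + c - d) / 4.0,
            (a + b - c - d) / 4.0, (a - b - c + d) / 4.0)

def shared_fused_features(ref_frame, dist_frame):
    """Several 'evaluators' reuse the same shared coefficients instead of re-filtering."""
    ref_bands = haar_level(ref_frame.astype(np.float64))
    dist_bands = haar_level(dist_frame.astype(np.float64))
    mses = [np.mean((r - t) ** 2) for r, t in zip(ref_bands, dist_bands)]
    energy_ratios = [t.var() / (r.var() + 1e-6) for r, t in zip(ref_bands, dist_bands)]
    return np.array(mses + energy_ratios)
```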

13.
Crit Care Clin ; 39(4): 675-687, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37704333

ABSTRACT

Perioperative morbidity and mortality are significantly associated with both static and dynamic perioperative factors. Studies investigating static perioperative factors have been reported; however, there are few previous studies and data sets analyzing dynamic perioperative factors, including physiologic waveforms, despite their clinical importance. To fill this gap, the authors introduce a novel, large perioperative data set: the Machine Learning Of physiologic waveforms and electronic health Record Data (MLORD) data set. They also provide a concise tutorial on machine learning to illustrate predictive models trained on the complex and diverse structures in the MLORD data set.


Subject(s)
Electronic Health Records, Machine Learning, Humans, Clinical Relevance
14.
IEEE Trans Image Process ; 32: 3295-3310, 2023.
Article in English | MEDLINE | ID: mdl-37276105

ABSTRACT

We present the outcomes of a recent large-scale subjective study of Mobile Cloud Gaming Video Quality Assessment (MCG-VQA) on a diverse set of gaming videos. Rapid advancements in cloud services, faster video encoding technologies, and increased access to high-speed, low-latency wireless internet have all contributed to the exponential growth of the Mobile Cloud Gaming industry. Consequently, the development of methods to assess the quality of real-time video feeds delivered to end-users of cloud gaming platforms has become increasingly important. However, due to the lack of a large-scale public Mobile Cloud Gaming Video dataset containing a diverse set of distorted videos with corresponding subjective scores, there has been limited work on the development of MCG-VQA models. To accelerate progress toward these goals, we created a new dataset, named the LIVE-Meta Mobile Cloud Gaming (LIVE-Meta-MCG) video quality database, composed of 600 landscape and portrait gaming videos, on which we collected 14,400 subjective quality ratings from an in-lab subjective study. Additionally, to demonstrate the usefulness of the new resource, we benchmarked multiple state-of-the-art VQA algorithms on the database. The new database will be made publicly available on our website: https://live.ece.utexas.edu/research/LIVE-Meta-Mobile-Cloud-Gaming/index.html.

15.
IEEE Trans Image Process ; 31: 4571-4584, 2022.
Article in English | MEDLINE | ID: mdl-35767478

ABSTRACT

Previous blind or No Reference (NR) image/video quality assessment (IQA/VQA) models largely rely on features drawn from natural scene statistics (NSS), under the assumption that the image statistics are stationary in the spatial domain. Several of these models are quite successful on standard pictures. However, in Virtual Reality (VR) applications, foveated video compression is regaining attention, and the concept of space-variant quality assessment is of interest, given the availability of increasingly high spatial and temporal resolution contents and practical ways of measuring gaze direction. Distortions from foveated video compression increase with increased eccentricity, implying that the natural scene statistics are space-variant. Towards advancing the development of foveated compression/streaming algorithms, we have devised a no-reference (NR) foveated video quality assessment model, called FOVQA, which is based on new models of space-variant natural scene statistics (NSS) and natural video statistics (NVS). Specifically, we deploy a space-variant generalized Gaussian distribution (SV-GGD) model and a space-variant asynchronous generalized Gaussian distribution (SV-AGGD) model of mean subtracted contrast normalized (MSCN) coefficients and of products of neighboring MSCN coefficients, respectively. We devise a foveated video quality predictor that extracts radial basis features, as well as other features that capture perceptually annoying rapid quality fall-offs. We find that FOVQA achieves state-of-the-art (SOTA) performance on the new 2D LIVE-FBT-FCVR database, as compared with other leading foveated IQA/VQA models. We have made our implementation of FOVQA available at: https://live.ece.utexas.edu/research/Quality/FOVQA.zip.
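A small sketch of fitting a generalized Gaussian distribution (GGD) to MSCN coefficients within eccentricity bands, giving space-variant shape parameters in the spirit of the SV-GGD model, is shown below; the moment-matching estimator is the standard one, while the ring geometry and band count are illustrative assumptions.

```python
import numpy as np
from scipy.special import gamma

def ggd_shape(coeffs, alphas=np.arange(0.2, 10.0, 0.001)):
    """Moment-matching (ratio-of-moments) estimate of the GGD shape parameter."""
    rho = coeffs.var() / (np.mean(np.abs(coeffs)) ** 2 + 1e-12)
    ratios = gamma(1.0 / alphas) * gamma(3.0 / alphas) / gamma(2.0 / alphas) ** 2
    return alphas[np.argmin(np.abs(ratios - rho))]

def space_variant_ggd(mscn_map, fixation, n_rings=4):
    """Fit a GGD shape per concentric eccentricity ring around the fixation point."""
    h, w = mscn_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.hypot(yy - fixation[0], xx - fixation[1])
    edges = np.linspace(0.0, ecc.max() + 1e-6, n_rings + 1)
    return [ggd_shape(mscn_map[(ecc >= lo) & (ecc < hi)])
            for lo, hi in zip(edges[:-1], edges[1:])]
```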


Subject(s)
Algorithms, Data Compression, Attention, Normal Distribution, Video Recording/methods
16.
Article in English | MEDLINE | ID: mdl-37015500

ABSTRACT

Block based motion estimation is integral to inter prediction processes performed in hybrid video codecs. Prevalent block matching based methods that are used to compute block motion vectors (MVs) rely on computationally intensive search procedures. They also suffer from the aperture problem, which tends to worsen as the block size is reduced. Moreover, the block matching criteria used in typical codecs do not account for the resulting levels of perceptual quality of the motion compensated pictures that are created upon decoding. Towards achieving the elusive goal of perceptually optimized motion estimation, we propose a search-free block motion estimation framework using a multi-stage convolutional neural network, which is able to conduct motion estimation on multiple block sizes simultaneously, using a triplet of frames as input. This composite block translation network (CBT-Net) is trained in a self-supervised manner on a large database that we created from publicly available uncompressed video content. We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames. Our experimental results highlight the computational efficiency of our proposed model relative to conventional block matching based motion estimation algorithms, for comparable prediction errors. Further, when used to perform inter prediction in AV1, the MV predictions of the perceptually optimized model result in average Bjontegaard-delta rate (BD-rate) improvements of -1.73% and -1.31% with respect to the MS-SSIM and Video Multi-Method Assessment Fusion (VMAF) quality metrics, respectively, as compared to the block matching based motion estimation system employed in the SVT-AV1 encoder.
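To ground the inter-prediction context, here is a small numpy sketch of block-wise motion compensation: given per-block motion vectors (such as a network like CBT-Net might output), the predicted frame is assembled by copying displaced blocks from a reference frame. The block size, integer-pel vectors, divisible frame dimensions, and border clamping are illustrative assumptions.

```python
import numpy as np

def motion_compensate(reference, mvs, block=16):
    """reference: (H, W) frame with H, W divisible by `block`;
    mvs: (H//block, W//block, 2) integer (dy, dx) motion vectors per block."""
    h, w = reference.shape
    predicted = np.zeros_like(reference)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = mvs[by // block, bx // block]
            ys = np.clip(np.arange(by + dy, by + dy + block), 0, h - 1)
            xs = np.clip(np.arange(bx + dx, bx + dx + block), 0, w - 1)
            predicted[by:by + block, bx:bx + block] = reference[np.ix_(ys, xs)]
    return predicted

# The error between `predicted` and the actual frame could then be scored with a
# perceptual metric such as MS-SSIM when training a motion estimator.
```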

17.
IEEE Trans Image Process ; 31: 3644-3656, 2022.
Article in English | MEDLINE | ID: mdl-35576411

ABSTRACT

Being able to accurately predict the visual quality of videos subjected to various combinations of dimension reduction protocols is of high interest to the streaming video industry, given rapid increases in frame resolutions and frame rates. In this direction, we have developed a video quality predictor that is sensitive to spatial, temporal, or space-time subsampling combined with compression. Our predictor is based on new models of space-time natural video statistics (NVS). Specifically, we model the statistics of divisively normalized difference between neighboring frames that are relatively displaced. In an extensive empirical study, we found that those paths of space-time displaced frame differences that provide maximal regularity against our NVS model generally align best with motion trajectories. Motivated by this, we built a new video quality prediction engine that extracts NVS features that represent how space-time directional regularities are disturbed by space-time distortions. Based on parametric models of these regularities, we compute features that are used to train a regressor that can accurately predict perceptual quality. As a stringent test of the new model, we apply it to the difficult problem of predicting the quality of videos subjected not only to compression, but also to downsampling in space and/or time. We show that the new quality model achieves state-of-the-art (SOTA) prediction performance on the new ETRI-LIVE Space-Time Subsampled Video Quality (STSVQ) and also on the AVT-VQDB-UHD-1 database.
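Building on the statistical-regularity idea in the abstract, the short sketch below scores candidate space-time displacements by how close the divisively normalized frame-difference statistics are to Gaussian (via excess kurtosis) and keeps the best one; the candidate set and the scoring rule are assumptions, not the paper's parametric models.

```python
import numpy as np
from scipy.stats import kurtosis

def most_regular_displacement(normalized_diffs):
    """normalized_diffs: dict mapping (dy, dx) -> divisively normalized difference map.

    Returns the displacement whose coefficients look most Gaussian (excess kurtosis
    closest to zero), taken here as a proxy for maximal statistical regularity."""
    scores = {disp: abs(kurtosis(diff.ravel(), fisher=True))
              for disp, diff in normalized_diffs.items()}
    return min(scores, key=scores.get)
```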

18.
IEEE Trans Image Process ; 31: 4149-4161, 2022.
Article in English | MEDLINE | ID: mdl-35700254

ABSTRACT

We consider the problem of obtaining image quality representations in a self-supervised manner. We use prediction of distortion type and degree as an auxiliary task to learn features from an unlabeled image dataset containing a mixture of synthetic and realistic distortions. We then train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem. We refer to the proposed training framework and resulting deep IQA model as the CONTRastive Image QUality Evaluator (CONTRIQUE). During evaluation, the CNN weights are frozen and a linear regressor maps the learned representations to quality scores in a No-Reference (NR) setting. We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models, even without any additional fine-tuning of the CNN backbone. The learned representations are highly robust and generalize well across images afflicted by either synthetic or authentic distortions. Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets. The implementations used in this paper are available at https://github.com/pavancm/CONTRIQUE.
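The contrastive pairwise objective can be illustrated with a compact NT-Xent-style loss over L2-normalized embeddings of two views of the same images, as sketched below; the temperature and the exact positive/negative grouping are assumptions and may differ from CONTRIQUE's published objective.

```python
import numpy as np

def contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, d) L2-normalized embeddings of two views of the same N images."""
    z = np.concatenate([z1, z2], axis=0)                # (2N, d)
    sim = (z @ z.T) / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity
    n = z1.shape[0]
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * n), positives].mean()
```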

19.
IEEE Trans Image Process ; 31: 1027-1041, 2022.
Article in English | MEDLINE | ID: mdl-34951848

ABSTRACT

Video livestreaming is gaining prevalence among video streaming services, especially for the delivery of live, high motion content such as sporting events. The quality of these livestreaming videos can be adversely affected by any of a wide variety of events, including capture artifacts, and distortions incurred during coding and transmission. High motion content can cause or exacerbate many kinds of distortion, such as motion blur and stutter. Because of this, the development of objective Video Quality Assessment (VQA) algorithms that can predict the perceptual quality of high motion, live streamed videos is greatly desired. Important resources for developing these algorithms are appropriate databases that exemplify the kinds of live streaming video distortions encountered in practice. Towards making progress in this direction, we built a video quality database specifically designed for live streaming VQA research. The new video database is called the Laboratory for Image and Video Engineering (LIVE) Livestream Database. The LIVE Livestream Database includes 315 videos of 45 source sequences from 33 original contents impaired by 6 types of distortions. We also performed a subjective quality study using the new database, whereby more than 12,000 human opinions were gathered from 40 subjects. We demonstrate the usefulness of the new resource by performing a holistic evaluation of the performance of current state-of-the-art (SOTA) VQA models. We envision that researchers will find the dataset to be useful for the development, testing, and comparison of future VQA models. The LIVE Livestream database is being made publicly available for these purposes at https://live.ece.utexas.edu/research/LIVE_APV_Study/apv_index.html.


Subject(s)
Algorithms, Artifacts, Databases, Factual, Humans, Motion (Physics), Video Recording
20.
IEEE Trans Image Process ; 31: 934-948, 2022.
Article in English | MEDLINE | ID: mdl-34965209

ABSTRACT

Video dimensions are continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution and frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited bandwidth networks in a perceptually optimal way is a current challenge affecting billions of viewers. One recent practice adopted by video service providers is space-time resolution adaptation in conjunction with video compression. Consequently, it is important to understand how different levels of space-time subsampling and compression affect the perceptual quality of videos. Towards making progress in this direction, we constructed a large new resource, called the ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE STSVQ) database, containing 437 videos generated by applying various levels of combined space-time subsampling and video compression on 15 diverse video contents. We also conducted a large-scale human study on the new dataset, collecting about 15,000 subjective judgments of video quality. We provide a rate-distortion analysis of the collected subjective scores, enabling us to investigate the perceptual impact of space-time subsampling at different bit rates. We also evaluated and compared the performance of leading video quality models on the new database. The new ETRI-LIVE STSVQ database is being made freely available at (https://live.ece.utexas.edu/research/ETRI-LIVE_STSVQ/index.html).
