Results 1 - 20 of 31
1.
J Vis ; 22(4): 12, 2022 03 02.
Article in English | MEDLINE | ID: mdl-35323868

ABSTRACT

Central and peripheral vision during visual tasks have been extensively studied on two-dimensional screens, highlighting their perceptual and functional disparities. This study has two objectives: replicating on-screen gaze-contingent experiments that remove the central or peripheral field of view in virtual reality, and identifying visuo-motor biases specific to the exploration of 360° scenes with a wide field of view. Our results are useful for vision modelling, with applications in gaze position prediction (e.g., content compression and streaming). We ask how previous on-screen findings translate to conditions where observers can use their head to explore stimuli. We implemented a gaze-contingent paradigm to simulate loss of vision in virtual reality, in which participants could freely view omnidirectional natural scenes. This protocol allows the simulation of vision loss with an extended field of view (>80°) and the study of the head's contribution to visual attention. The time course of visuo-motor variables in our pure free-viewing task reveals long fixations and short saccades during the first seconds of exploration, contrary to the literature on visual tasks guided by instructions. We show that the effect of vision loss is reflected primarily in eye movements, in a manner consistent with the literature on two-dimensional screens. We hypothesize that head movements mainly serve to explore the scenes during free viewing, since the presence of masks did not significantly impact head scanning behaviours. We present new fixational and saccadic visuo-motor tendencies in a 360° context that we hope will help in the creation of gaze prediction models dedicated to virtual reality.


Subjects
Ocular Fixation, Virtual Reality, Eye Movements, Humans, Saccades, Visual Perception
2.
J Vis ; 19(14): 22, 2019 12 02.
Article in English | MEDLINE | ID: mdl-31868896

ABSTRACT

Visual field defects are a worldwide concern, and the proportion of the population experiencing vision loss is ever increasing. Macular degeneration and glaucoma are among the four leading causes of permanent vision loss. Identifying and characterizing visual field losses from gaze alone could prove crucial in the future for screening tests, rehabilitation therapies, and monitoring. In this experiment, 54 participants took part in a free-viewing task of visual scenes while experiencing artificial scotomas (central and peripheral) of varying radii in a gaze-contingent paradigm. We studied the importance of a set of gaze features as predictors to best differentiate between artificial scotoma conditions. Linear mixed models were used to measure differences between scotoma conditions. Correlation and factorial analyses revealed redundancies in our data. Finally, hidden Markov models and recurrent neural networks were implemented as classifiers in order to measure the predictive usefulness of gaze features. The results show distinct saccade direction biases depending on scotoma type. We demonstrate that saccade relative angle, amplitude, and peak velocity are the best features on the basis of which to distinguish between artificial scotomas in a free-viewing task. Finally, we discuss the usefulness of our protocol and analyses as a gaze-feature identifier tool that discriminates between artificial scotomas of different types and sizes.


Subjects
Scotoma/physiopathology, Visual Field Tests/methods, Visual Fields, Adult, Blindness, Female, Glaucoma/physiopathology, Humans, Macular Degeneration/physiopathology, Male, Markov Chains, Middle Aged, Neural Networks (Computer), Saccades, Vision Disorders, Young Adult
3.
J Opt Soc Am A Opt Image Sci Vis ; 31(5): 1112-7, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24979644

ABSTRACT

This paper addresses a numerical stability issue in the channelized Hotelling observer (CHO), a well-known approach in the medical image quality assessment domain. Many researchers have found that the detection performance of the CHO does not increase with the number of channels, contrary to expectation, and to our knowledge the reason had not been identified. We show that this is due to the ill-conditioning of the scatter matrix and propose a solution based on Tikhonov regularization. Although Tikhonov regularization has been used in many other domains, this paper shows another important application of it. This result allows researchers to continue investigating the CHO (and other channelized model observers) with reliable detection performance calculations.
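To make the remedy concrete, below is a minimal NumPy sketch of a channelized Hotelling observer whose channel scatter matrix is stabilized with Tikhonov regularization, as described above. The variable layout, the pooled-covariance estimate, and the regularization weight `lam` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cho_detectability(signal_imgs, background_imgs, channels, lam=1e-3):
    """signal_imgs, background_imgs: (n_images, n_pixels); channels: (n_pixels, n_channels)."""
    # Channel outputs for both classes
    v_s = signal_imgs @ channels          # (n_signal, n_channels)
    v_b = background_imgs @ channels      # (n_background, n_channels)

    # Mean channelized signal difference and pooled scatter (covariance) matrix
    dv = v_s.mean(axis=0) - v_b.mean(axis=0)
    S = 0.5 * (np.cov(v_s, rowvar=False) + np.cov(v_b, rowvar=False))

    # Tikhonov regularization: adding lam * I keeps the inversion well posed
    # when the number of channels grows and S becomes ill conditioned.
    S_reg = S + lam * np.eye(S.shape[0])

    w = np.linalg.solve(S_reg, dv)        # Hotelling template in channel space
    d2 = dv @ w                           # squared detectability index
    return np.sqrt(max(d2, 0.0))
```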


Subjects
Algorithms, Artifacts, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Theoretical Models, Computer-Assisted Numerical Analysis, Computer Simulation
4.
Proc IEEE Inst Electr Electron Eng ; 101(9): 2058-2067, 2013 Sep.
Article in English | MEDLINE | ID: mdl-24489403

ABSTRACT

Making technological advances in the field of human-machine interaction requires that the capabilities and limitations of the human perceptual system be taken into account. The focus of this report is an important mechanism of perception, visual selective attention, which is becoming increasingly important for multimedia applications. We introduce the concept of visual attention and describe its underlying mechanisms. In particular, we introduce the concepts of overt and covert visual attention, and of bottom-up and top-down processing. Challenges related to modeling visual attention, and to validating such models using ad hoc ground truth, are also discussed. Examples of the use of visual attention models in image and video processing are presented. We emphasize multimedia delivery, retargeting and quality assessment of images and video, medical imaging, and applications to stereoscopic 3D images.

5.
J Opt Soc Am A Opt Image Sci Vis ; 30(11): 2422-32, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-24322945

ABSTRACT

As a task-based approach for medical image quality assessment, model observers (MOs) have been proposed as surrogates for human observers. While most MOs treat only signal-known-exactly tasks, there are few studies on signal-known-statistically (SKS) MOs, which are clinically more relevant. In this paper, we present a new SKS MO named channelized joint detection and estimation observer (CJO), capable of detecting and estimating signals with unknown amplitude, orientation, and size. We evaluate its estimation and detection performance using both synthesized (correlated Gaussian) backgrounds and real clinical (magnetic resonance) backgrounds. The results suggest that the CJO has good performance in the SKS detection-estimation task.


Subjects
Diagnostic Imaging, Computer-Assisted Image Processing/methods, Theoretical Models, Computer-Assisted Signal Processing, Quality Control
6.
IEEE Trans Image Process ; 31: 5456-5468, 2022.
Article in English | MEDLINE | ID: mdl-35951566

ABSTRACT

Due to the complex and variable underwater lighting environment, underwater imaging is readily impaired by light scattering, warping, and noise. To improve visual quality, Underwater Image Enhancement (UIE) techniques have been widely studied. Recent efforts have also been made to evaluate and compare UIE performance with subjective and objective methods. However, subjective evaluation is time-consuming and impractical to apply to all images, while existing objective methods have limited capability for the newly developed UIE approaches based on deep learning. To fill this gap, we propose an Underwater Image Fidelity (UIF) metric for the objective evaluation of enhanced underwater images. By exploiting the statistical features of these images in the CIELab space, we present naturalness, sharpness, and structure indexes. Among them, the naturalness and sharpness indexes represent the visual improvement of enhanced images, while the structure index indicates the structural similarity between the underwater images before and after UIE. We combine all indexes with saliency-based spatial pooling to obtain the final UIF metric. To evaluate the proposed metric, we also establish a first-of-its-kind large-scale UIE database with subjective scores, namely the Underwater Image Enhancement Database (UIED). Experimental results confirm that the proposed UIF metric outperforms a variety of underwater and general-purpose image quality metrics. The database and source code are available at https://github.com/z21110008/UIF.

7.
IEEE Trans Image Process ; 31: 1161-1175, 2022.
Article in English | MEDLINE | ID: mdl-34990360

ABSTRACT

Images synthesized using depth-image-based rendering (DIBR) techniques may suffer from complex structural distortions. The goal of the primary visual cortex and other parts of the brain is to reduce redundancies in the input visual signal in order to discover the intrinsic image structure, and thus create a sparse image representation. The human visual system (HVS) processes images at several scales and levels of resolution when perceiving a visual scene. In an attempt to emulate these properties of the HVS, we have designed a no-reference model for the quality assessment of DIBR-synthesized views. To extract higher-order structure of high curvature, which corresponds to the distortions of shape to which the HVS is highly sensitive, we define a morphological oriented Difference of Closings (DoC) operator and use it at multiple scales and resolutions. The DoC operator nonlinearly removes redundancies and extracts fine-grained details, the texture of local image structure, and contrast, to which the HVS is highly sensitive. We introduce a new feature based on the sparsity of the DoC band. To extract perceptually important low-order structural information (edges), we use the non-oriented Difference of Gaussians (DoG) operator at different scales and resolutions. A measure of sparsity is calculated for the DoG bands to obtain scalar features. To model the relationship between the extracted features and subjective scores, a general regression neural network (GRNN) is used. Quality predictions by the proposed DoC-DoG-GRNN model show higher agreement with perceptual quality scores than the tested state-of-the-art metrics when evaluated on four benchmark datasets with synthesized views: the IRCCyN/IVC image and video datasets, the MCL-3D stereoscopic image dataset, and the IST image dataset.
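As a rough illustration of the two band-pass operators named above, the following sketch computes a non-oriented Difference of Closings and a Difference of Gaussians at two scales, plus a simple sparsity feature per band. The structuring-element sizes, Gaussian sigmas, and the L1/L2 sparsity measure are assumptions for illustration only; the paper uses oriented closings at multiple scales and resolutions.

```python
import numpy as np
from scipy import ndimage

def difference_of_closings(img, small=3, large=7):
    # Grey-level closing suppresses dark details smaller than the structuring
    # element; the difference between two scales isolates structures in between.
    c_small = ndimage.grey_closing(img, size=(small, small))
    c_large = ndimage.grey_closing(img, size=(large, large))
    return c_large - c_small

def difference_of_gaussians(img, sigma_small=1.0, sigma_large=2.0):
    # Band-pass response that captures edges (low-order structure).
    g_small = ndimage.gaussian_filter(img, sigma_small)
    g_large = ndimage.gaussian_filter(img, sigma_large)
    return g_small - g_large

def band_sparsity(band, eps=1e-8):
    # Normalized L1/L2 ratio used here as a simple stand-in for the sparsity feature.
    b = band.ravel().astype(float)
    return np.sum(np.abs(b)) / (np.sqrt(b.size) * (np.linalg.norm(b) + eps))
```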


Subjects
Algorithms, Primary Visual Cortex, Three-Dimensional Imaging, Neural Networks (Computer), Normal Distribution
8.
IEEE Trans Image Process ; 31: 7206-7221, 2022.
Article in English | MEDLINE | ID: mdl-36367913

ABSTRACT

With the development of multimedia technology, Augmented Reality (AR) has become a promising next-generation mobile platform. The primary value of AR is to promote the fusion of digital content and real-world environments; however, studies on how this fusion influences the Quality of Experience (QoE) of these two components are lacking. To achieve better QoE of AR, whose two layers influence each other, it is important to first evaluate its perceptual quality. In this paper, we consider AR technology as the superimposition of virtual scenes and real scenes, and introduce visual confusion as its basic theory. A more general problem is first proposed: evaluating the perceptual quality of superimposed images, i.e., confusing image quality assessment. A ConFusing Image Quality Assessment (CFIQA) database is established, which includes 600 reference images and 300 distorted images generated by mixing reference images in pairs. A subjective quality perception experiment is then conducted to attain a better understanding of how humans perceive confusing images. Based on the CFIQA database, several benchmark models and a specifically designed CFIQA model are proposed for solving this problem. Experimental results show that the proposed CFIQA model achieves state-of-the-art performance compared to the benchmark models. Moreover, an extended ARIQA study is conducted on top of the CFIQA study. We establish an ARIQA database to better simulate real AR application scenarios, which contains 20 AR reference images, 20 background (BG) reference images, and 560 distorted images generated from the AR and BG references, as well as the correspondingly collected subjective quality ratings. Three types of full-reference (FR) IQA benchmark variants are designed to study whether visual confusion should be considered when designing corresponding IQA algorithms. An ARIQA metric is finally proposed to better evaluate the perceptual quality of AR images. Experimental results demonstrate the good generalization ability of the CFIQA model and the state-of-the-art performance of the ARIQA model. The databases, benchmark models, and proposed metrics are available at: https://github.com/DuanHuiyu/ARIQA.


Subjects
Augmented Reality, Humans, Algorithms, Factual Databases
9.
IEEE Trans Neural Netw Learn Syst ; 33(3): 1051-1065, 2022 03.
Article in English | MEDLINE | ID: mdl-33296311

ABSTRACT

Deep neural networks are vulnerable to adversarial attacks. More importantly, some adversarial examples crafted against an ensemble of source models transfer to other target models and thus pose a security threat to black-box applications (where attackers have no access to the target models). Current transfer-based ensemble attacks, however, consider only a limited number of source models to craft an adversarial example and thus obtain poor transferability. Besides, recent query-based black-box attacks, which require numerous queries to the target model, not only arouse suspicion at the target model but also incur expensive query costs. In this article, we propose a novel transfer-based black-box attack, dubbed serial-minigroup-ensemble-attack (SMGEA). Concretely, SMGEA first divides a large number of pretrained white-box source models into several "minigroups." For each minigroup, we design three new ensemble strategies to improve the intragroup transferability. Moreover, we propose a new algorithm that recursively accumulates the "long-term" gradient memories of the previous minigroup into the subsequent minigroup. In this way, the learned adversarial information is preserved and the intergroup transferability is improved. Experiments indicate that SMGEA not only achieves state-of-the-art black-box attack ability over several datasets but also deceives two online black-box saliency prediction systems in the real world, i.e., DeepGaze-II (https://deepgaze.bethgelab.org/) and SALICON (http://salicon.net/demo/). Finally, we contribute a new code repository to promote research on adversarial attack and defense for ubiquitous pixel-to-pixel computer vision tasks. We share our code, together with a pretrained substitute model zoo, at https://github.com/CZHQuality/AAA-Pix2pix.
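The core idea of carrying a "long-term" gradient memory from one minigroup of source models to the next can be sketched as follows. This is a heavily simplified, hypothetical PyTorch outline: the loss, step sizes, normalization, and minigroup scheduling are assumptions and do not reproduce the three ensemble strategies of SMGEA.

```python
import torch

def smgea_sketch(x, y, minigroups, loss_fn, eps=8/255, alpha=2/255, steps=10, decay=1.0):
    """x: input batch; minigroups: list of lists of white-box source models."""
    x_adv = x.clone().detach()
    memory = torch.zeros_like(x)                     # long-term gradient memory
    for group in minigroups:                         # later groups inherit the memory
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = sum(loss_fn(m(x_adv), y) for m in group) / len(group)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Accumulate a normalized gradient into the memory (momentum-style).
            memory = decay * memory + grad / grad.abs().mean().clamp_min(1e-12)
            x_adv = x_adv.detach() + alpha * memory.sign()
            x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)   # project to L_inf ball
    return x_adv.detach()
```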


Subjects
Algorithms, Neural Networks (Computer), Learning, Long-Term Memory
10.
J Opt Soc Am A Opt Image Sci Vis ; 28(10): 2033-48, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21979508

ABSTRACT

In the context of color perception on modern wide-gamut displays with narrowband spectral primaries, we performed a theoretical analysis of various aspects of the physiological observers proposed by CIE TC 1-36 (CIEPO06). We allowed certain physiological factors to vary that are not considered in the CIEPO06 framework. For example, we found that a shift of the long-wave-sensitive (LWS) or medium-wave-sensitive (MWS) peak wavelength in the photopigment absorption spectra, a factor not modeled in CIEPO06, contributes more to observer variability than some of the factors considered in the model. Further, we compared the color-matching functions derived from the CIEPO06 model and the CIE 10° standard colorimetric observer to the average observer data from three distinct subgroups of Stiles-Burch observers, formed on the basis of observer age (22-23 years, 27-29 years, and 49-50 years). The errors in predicting the x(λ) and y(λ) color-matching functions of the intragroup average observers, in the long-wave and medium-wave ranges respectively, were generally larger for the CIEPO06 model than for the 10° standard colorimetric observer, and were manifest in both spectral and chromaticity space. In contrast, the short-wave-sensitive z10(λ) function of the 10° standard colorimetric observer performed poorly compared to the CIEPO06 model for all three subgroups. Finally, a constrained nonlinear optimization of the CIEPO06 model outputs showed that a peak wavelength shift of photopigment density alone could not reduce the model's prediction errors at longer wavelengths. As an alternative, two optimized weighting functions, one for each of the LWS and MWS cone photopigment densities, led to significant improvement in the prediction of intra-age-group average data for both the 22-23 year and 49-50 year age groups. We hypothesize that the CIEPO06 model's assumption that the peak optical density of visual pigments does not vary with age is false and is the source of these prediction errors at longer wavelengths. Correcting these errors in the model could lead to an improved age-dependent observer and could also help update the current CIE 10° standard colorimetric observer. Accordingly, it would reduce the discrepancies between color matches made with broadband spectral primaries and those made with narrowband spectral primaries.


Subjects
Aging/physiology, Color Perception/physiology, Colorimetry/methods, International Agencies, Biological Models, Observation, Age Factors, Colorimetry/standards, Humans, International Agencies/standards, Middle Aged, Photic Stimulation, Reference Standards, Spectrum Analysis, Young Adult
11.
J Opt Soc Am A Opt Image Sci Vis ; 28(2): 157-88, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-21293521

ABSTRACT

Quality estimators aspire to quantify the perceptual resemblance, but not the usefulness, of a distorted image when compared to a reference natural image. However, humans can successfully accomplish tasks (e.g., object identification) using visibly distorted images that are not necessarily of high quality. A suite of novel subjective experiments reveals that quality does not accurately predict utility (i.e., usefulness). Thus, even accurate quality estimators cannot accurately estimate utility. In the absence of utility estimators, leading quality estimators are assessed as both quality and utility estimators and dismantled to understand those image characteristics that distinguish utility from quality. A newly proposed utility estimator demonstrates that a measure of contour degradation is sufficient to accurately estimate utility and is argued to be compatible with shape-based theories of object perception.

12.
IEEE Trans Image Process ; 30: 517-531, 2021.
Article in English | MEDLINE | ID: mdl-33201815

ABSTRACT

Virtual viewpoint synthesis is an essential process for many immersive applications, including Free-viewpoint TV (FTV). A widely used technique for viewpoint synthesis is depth-image-based rendering (DIBR). However, this technique may introduce challenging non-uniform spatial-temporal structure-related distortions. Most existing state-of-the-art quality metrics fail to handle these distortions, especially the temporal structure inconsistencies observed during switches between viewpoints. To tackle this problem, an elastic metric and multi-scale trajectory based video quality metric (EM-VQM) is proposed in this paper. Dense motion trajectories are first used as a proxy for selecting temporally sensitive regions, where local geometric distortions might significantly diminish the perceived quality. Afterwards, the amount of temporal structure inconsistency and unsmooth viewpoint transitions is quantified by calculating 1) the amount of motion trajectory deformation with an elastic metric and 2) the spatial-temporal structural dissimilarity. According to comprehensive experimental results on two FTV video datasets, the proposed metric significantly outperforms the state-of-the-art metrics designed for free-viewpoint videos, achieving gains of 12.86% and 16.75% in median Pearson linear correlation coefficient on the two datasets compared to the best competing metric.

13.
IEEE Trans Vis Comput Graph ; 27(3): 2202-2219, 2021 03.
Article in English | MEDLINE | ID: mdl-33166254

ABSTRACT

Surface meshes associated with diffuse texture or color attributes are becoming popular multimedia contents. They provide a high degree of realism and allow six degrees of freedom (6DoF) interactions in immersive virtual reality environments. Just like other types of multimedia, 3D meshes are subject to a wide range of processing, e.g., simplification and compression, which result in a loss of quality of the final rendered scene. Thus, both subjective studies and objective metrics are needed to understand and predict this visual loss. In this work, we introduce a large dataset of 480 animated meshes with diffuse color information, and associated with perceived quality judgments. The stimuli were generated from 5 source models subjected to geometry and color distortions. Each stimulus was associated with 6 hypothetical rendering trajectories (HRTs): combinations of 3 viewpoints and 2 animations. A total of 11520 quality judgments (24 per stimulus) were acquired in a subjective experiment conducted in virtual reality. The results allowed us to explore the influence of source models, animations and viewpoints on both the quality scores and their confidence intervals. Based on these findings, we propose the first metric for quality assessment of 3D meshes with diffuse colors, which works entirely on the mesh domain. This metric incorporates perceptually-relevant curvature-based and color-based features. We evaluate its performance, as well as a number of Image Quality Metrics (IQMs), on two datasets: ours and a dataset of distorted textured meshes. Our metric demonstrates good results and a better stability than IQMs. Finally, we investigated how the knowledge of the viewpoint (i.e., the visible parts of the 3D model) may improve the results of objective metrics.

14.
IEEE Trans Image Process ; 30: 4622-4636, 2021.
Article in English | MEDLINE | ID: mdl-33900914

ABSTRACT

Ultra-high definition (UHD) 360° videos encoded in fine quality are typically too large to stream in their entirety over bandwidth (BW)-constrained networks. One popular approach is to interactively extract and send a spatial sub-region corresponding to a viewer's current field-of-view (FoV) in a head-mounted display (HMD) for more BW-efficient streaming. Due to the non-negligible round-trip-time (RTT) delay between server and client, accurate head movement prediction that foretells a viewer's future FoVs is essential. In this paper, we cast the head movement prediction task as a sparse directed graph learning problem: three sources of relevant information (collected viewers' head movement traces, a 360° image saliency map, and a biological human head model) are distilled into a view transition Markov model. Specifically, we formulate a constrained maximum a posteriori (MAP) problem with likelihood and prior terms defined using the three information sources. We solve the MAP problem alternately using a hybrid iterative reweighted least squares (IRLS) and Frank-Wolfe (FW) optimization strategy. In each FW iteration, a linear program (LP) is solved, whose runtime is reduced thanks to warm-start initialization. Having estimated a Markov model from data, we employ it to optimize a tile-based 360° video streaming system. Extensive experiments show that our head movement prediction scheme noticeably outperforms existing proposals, and our optimized tile-based streaming scheme outperforms competitors in rate-distortion performance.
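A schematic of what such a constrained MAP objective over the view-transition matrix could look like is given below; the exact terms, notation, and the sparsity constraint are assumptions for illustration, not the paper's formulation.

```latex
% P: view-transition matrix; C: empirical transition counts from head traces;
% s: saliency-derived view popularity; Omega: head-model (biomechanical) penalty;
% lambda, mu, K: assumed hyperparameters.
\begin{aligned}
\hat{P} \;=\; \arg\max_{P}\;&
\underbrace{\sum_{i,j} C_{ij}\,\log P_{ij}}_{\text{likelihood of observed transitions}}
\;+\;\mu \underbrace{\sum_{i,j} P_{ij}\,\log s_{j}}_{\text{saliency prior}}
\;-\;\lambda \underbrace{\sum_{i,j} \Omega_{ij}\,P_{ij}}_{\text{head-model prior}}\\
\text{s.t.}\;\;& P\mathbf{1}=\mathbf{1},\qquad P_{ij}\ge 0,\qquad \|P\|_{0}\le K
\quad\text{(sparse directed graph)}.
\end{aligned}
```

The non-smooth, constrained structure of a problem of this form is the kind of setting in which an alternating IRLS/Frank-Wolfe strategy, as mentioned in the abstract, is typically applied.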


Subjects
Head Movements/physiology, Computer-Assisted Image Processing/methods, Video Recording/methods, Algorithms, Deep Learning, Humans, Markov Chains, Statistical Models
15.
IEEE Trans Image Process ; 30: 1973-1988, 2021.
Article in English | MEDLINE | ID: mdl-33444138

ABSTRACT

Saliency detection is an effective front-end process for many security-related tasks, e.g., autonomous driving and tracking. Adversarial attacks serve as an efficient surrogate to evaluate the robustness of deep saliency models before they are deployed in the real world. However, most current adversarial attacks exploit gradients spanning the entire image space to craft adversarial examples, ignoring the fact that natural images are high-dimensional and spatially over-redundant, thus incurring a high attack cost and producing easily perceptible perturbations. To circumvent these issues, this paper builds an efficient bridge between accessible partially-white-box source models and unknown black-box target models. The proposed method includes two steps: 1) We design a new partially-white-box attack, which defines the cost function in the compact hidden space to punish a fraction of feature activations corresponding to the salient regions, instead of punishing every pixel spanning the entire dense output space. This partially-white-box attack reduces the redundancy of the adversarial perturbation. 2) We exploit the non-redundant perturbations from some source models as prior cues, and use an iterative zeroth-order optimizer to compute the directional derivatives along the non-redundant prior directions, in order to estimate the actual gradient of the black-box target model. The non-redundant priors boost the update of "critical" pixels located at the non-zero coordinates of the prior cues, while keeping the redundant pixels located at the zero coordinates unaffected. Our method achieves the best tradeoff between attack ability and perturbation redundancy. Finally, we conduct a comprehensive experiment to test the robustness of 18 state-of-the-art deep saliency models against 16 malicious attacks, under both white-box and black-box settings, which contributes the first robustness benchmark to the saliency community.
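Step 2) relies on zeroth-order estimation of directional derivatives along a few prior directions. A minimal sketch of such an estimator, assuming a scalar black-box loss `loss_fn` and a small set of prior directions, is shown below; the forward-difference scheme and probe size `sigma` are illustrative assumptions.

```python
import numpy as np

def zo_gradient_along_priors(x, loss_fn, prior_dirs, sigma=1e-2):
    """Estimate the black-box gradient of loss_fn at x along the prior directions only."""
    base = loss_fn(x)
    grad_est = np.zeros_like(x)
    for d in prior_dirs:
        d = d / (np.linalg.norm(d) + 1e-12)               # unit-norm prior direction
        deriv = (loss_fn(x + sigma * d) - base) / sigma   # forward-difference directional derivative
        grad_est += deriv * d                             # only pixels where d is non-zero are updated
    return grad_est
```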

16.
Article in English | MEDLINE | ID: mdl-31976897

ABSTRACT

Owing to the recorded light ray distributions, a light field contains much richer information than a conventional image, enables a range of compelling applications, and has become increasingly popular. To facilitate these applications, many light field processing techniques have been proposed recently. These operations also introduce visual quality loss, so a light field quality metric is needed to quantify it. To reduce processing complexity and resource consumption, light fields are generally sparsely sampled, compressed, and finally reconstructed and displayed to the users. We consider the distortions introduced in this typical light field processing chain and propose a full-reference light field quality metric. Specifically, we measure light field quality from three aspects: global spatial quality based on view structure matching, local spatial quality based on near-edge mean square error, and angular quality based on multi-view quality analysis. These three aspects capture the most common distortions introduced in light field processing, including global distortions such as blur and blocking, local geometric distortions such as ghosting and stretching, and angular distortions such as flickering and sampling. Experimental results show that the proposed method estimates light field quality accurately and outperforms the state-of-the-art quality metrics applicable to light fields.
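As an illustration of the "near-edge mean square error" component, the sketch below restricts the MSE to a band of pixels around the edges of the reference view; the Sobel edge detector, threshold, and dilation radius are assumptions rather than the paper's exact definition.

```python
import numpy as np
from scipy import ndimage

def near_edge_mse(ref, dist, edge_thresh=0.1, radius=2):
    """MSE computed only on pixels within `radius` of strong edges in the reference view."""
    ref = ref.astype(float)
    dist = dist.astype(float)
    grad = np.hypot(ndimage.sobel(ref, axis=0), ndimage.sobel(ref, axis=1))
    edges = grad > edge_thresh * grad.max()                       # strong-edge mask
    near_edge = ndimage.binary_dilation(edges, iterations=radius) # band around the edges
    return float(np.mean((ref[near_edge] - dist[near_edge]) ** 2))
```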

17.
IEEE Trans Image Process ; 28(11): 5524-5536, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31180890

ABSTRACT

Free-viewpoint video, as the development direction of next-generation video technologies, uses the depth-image-based rendering (DIBR) technique to synthesize video sequences at viewpoints where real captured videos are missing. As reference videos at multiple viewpoints are not available, a reliable blind real-time quality metric for the synthesized video is needed. Although no-reference quality metrics dedicated to synthesized views successfully evaluate synthesized images, they are less effective when evaluating synthesized video due to the additional temporal flicker distortion typical only of video. In this paper, a new fast no-reference quality metric for synthesized video with synthesis distortions is proposed. It is guided by the fact that DIBR-synthesized images are characterized by increased high-frequency content. The metric is designed under the assumption that the perceived quality of DIBR-synthesized video can be estimated by quantifying selected areas in the high-high wavelet subband. A threshold is used to select the most distortion-sensitive regions. The proposed No-Reference Morphological Wavelet with Threshold (NR_MWT) metric is computationally extremely efficient, comparable to PSNR, as the morphological wavelet transformation uses very short filters and only integer arithmetic. It is completely blind, without using machine learning techniques. Tested on a publicly available dataset of synthesized video sequences characterized by synthesis distortions, the metric achieves better performance and higher computational efficiency than the state-of-the-art metrics dedicated to DIBR-synthesized images and videos.
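A minimal sketch of the underlying idea (score a synthesized frame by the strongest coefficients of its high-high wavelet subband) is given below. A standard Haar DWT stands in for the paper's morphological wavelet, and the percentile threshold is an illustrative assumption.

```python
import numpy as np
import pywt

def hh_based_score(frame, percentile=95):
    # Single-level 2D DWT; the diagonal detail band approximates the high-high subband.
    _, (_, _, hh) = pywt.dwt2(frame.astype(float), 'haar')
    mag = np.abs(hh)
    thr = np.percentile(mag, percentile)        # keep only the most distortion-sensitive regions
    selected = mag[mag >= thr]
    return float(selected.mean()) if selected.size else 0.0

def video_score(frames):
    # Higher HH energy in the selected regions suggests more synthesis artifacts.
    return float(np.mean([hh_based_score(f) for f in frames]))
```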

18.
IEEE Trans Image Process ; 28(11): 5336-5351, 2019 Nov.
Article in English | MEDLINE | ID: mdl-31021766

ABSTRACT

Sonar imagery plays a significant role in oceanic applications, since there is little natural light underwater and sonar imaging does not depend on light. Sonar images are very likely to be affected by various distortions during transmission over the underwater acoustic channel for further analysis. At the receiving end, the reference image is unavailable due to the complex and changing underwater environment and our unfamiliarity with it. One of the important uses of sonar images is target recognition on the basis of contour information, and the degree of contour degradation in a sonar image is related to the distortions it contains. To this end, we developed a new no-reference contour degradation measurement for assessing the quality of sonar images. The sparsities of a series of transform coefficient matrices, which are descriptive of contour information, are first extracted as features from the frequency and spatial domains. The contour degradation degree for a sonar image is then measured by calculating the ratios of the extracted features before and after filtering the sonar image. Finally, a bootstrap aggregating (bagging)-based support vector regression module is learned to capture the relationship between the contour degradation degree and sonar image quality. Experimental results validate that the proposed metric is competitive with state-of-the-art reference-based quality metrics and outperforms the latest reference-free competitors.
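The pipeline can be sketched roughly as follows: sparsity of contour-descriptive transform coefficients is computed before and after filtering the image, the ratios serve as features, and a bagging-based SVR maps them to quality. The DCT, the Gaussian filters, and the Gini-style sparsity index below are illustrative stand-ins for the unspecified transforms and filters, and `train_imgs`/`mos_scores` are hypothetical training data.

```python
import numpy as np
from scipy import ndimage
from scipy.fft import dctn
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

def gini_sparsity(coeffs, eps=1e-12):
    # Gini index of the coefficient magnitudes: closer to 1 means sparser.
    c = np.sort(np.abs(coeffs).ravel())
    n = c.size
    k = np.arange(1, n + 1)
    return 1.0 - 2.0 * np.sum(c * (n - k + 0.5)) / (n * np.sum(c) + eps)

def contour_degradation_features(img, sigmas=(1.0, 2.0, 4.0)):
    # Ratio of transform-domain sparsity after vs. before low-pass filtering.
    base = gini_sparsity(dctn(img.astype(float)))
    feats = []
    for s in sigmas:
        filtered = ndimage.gaussian_filter(img.astype(float), s)
        feats.append(gini_sparsity(dctn(filtered)) / (base + 1e-12))
    return np.array(feats)

# Bagging-based support vector regression, trained on subjective quality scores.
model = BaggingRegressor(SVR(kernel='rbf'), n_estimators=20, random_state=0)
# model.fit(np.stack([contour_degradation_features(im) for im in train_imgs]), mos_scores)
```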

19.
Article in English | MEDLINE | ID: mdl-31613763

ABSTRACT

Data size is the bottleneck for developing deep saliency models, because collecting eye-movement data is very time-consuming and expensive. Most current studies on human attention and saliency modeling have used high-quality, stereotyped stimuli. In the real world, however, captured images undergo various types of transformations. Can we use these transformations to augment existing saliency datasets? Here, we first create a novel saliency dataset including fixations of 10 observers over 1900 images degraded by 19 types of transformations. Second, by analyzing eye movements, we find that observers look at different locations over transformed versus original images. Third, we use the new data over transformed images, called data augmentation transformation (DAT), to train deep saliency models. We find that label-preserving DATs with negligible impact on human gaze boost saliency prediction, whereas some other DATs that severely affect human gaze degrade performance. These label-preserving, valid augmentation transformations provide a solution for enlarging existing saliency datasets. Finally, we introduce a novel saliency model based on generative adversarial networks (dubbed GazeGAN). A modified U-Net is used as the generator of the GazeGAN, which combines a classic "skip connection" with a novel "center-surround connection" (CSC) module. The proposed CSC module mitigates trivial artifacts while emphasizing semantically salient regions, and increases model nonlinearity, thus demonstrating better robustness against transformations. Extensive experiments and comparisons indicate that GazeGAN achieves state-of-the-art performance over multiple datasets. We also provide a comprehensive comparison of 22 saliency models on various transformed scenes, which contributes a new robustness benchmark to the saliency community. Our code and dataset are publicly available.
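A speculative sketch of what a "center-surround connection" block of this kind might look like in PyTorch is shown below; the average-pooling surround, the 1x1 projection, and the placement inside the U-Net are assumptions, not the GazeGAN architecture.

```python
import torch.nn as nn

class CenterSurroundConnection(nn.Module):
    """Skip connection that forwards center-minus-surround features."""
    def __init__(self, channels, surround_kernel=7):
        super().__init__()
        pad = surround_kernel // 2
        self.surround = nn.AvgPool2d(surround_kernel, stride=1, padding=pad)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Center minus surround emphasizes locally salient structure and
        # suppresses smooth, low-frequency content before the skip connection.
        return self.proj(x - self.surround(x))
```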

20.
Vision Res ; 47(19): 2483-98, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17688904

ABSTRACT

To what extent can a computational model of bottom-up visual attention predict what an observer is looking at? What is the contribution of low-level visual features to attention deployment? To answer these questions, a new spatio-temporal computational model is proposed. This model incorporates several visual features; therefore, a fusion algorithm is required to combine the different saliency maps (achromatic, chromatic, and temporal). To quantitatively assess the model's performance, eye movements were recorded while naive observers viewed natural dynamic scenes. Four complementary metrics were used. In addition, predictions from the proposed model are compared to predictions from a state-of-the-art model [Itti's model (Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254-1259)] and from three non-biologically plausible models (uniform, flicker, and centered models). Regardless of the metric used, the proposed model shows significant improvement over the selected benchmarking models (except the centered model). Conclusions are drawn regarding both the influence of low-level visual features over time and the central bias in an eye tracking experiment.
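As a simple illustration of fusing per-feature saliency maps (achromatic, chromatic, temporal) into a master map, the sketch below uses min-max normalization, a peak-promoting weight, and equal averaging; these choices are assumptions and not the fusion algorithm proposed in the paper.

```python
import numpy as np

def normalize_map(m, eps=1e-12):
    m = np.asarray(m, float)
    m = (m - m.min()) / (m.max() - m.min() + eps)   # rescale to [0, 1]
    return m * (1.0 - m.mean()) ** 2                # crudely promote maps with few strong peaks

def fuse_saliency(achromatic, chromatic, temporal):
    maps = [normalize_map(m) for m in (achromatic, chromatic, temporal)]
    fused = sum(maps) / len(maps)                   # equal-weight combination
    return fused / (fused.max() + 1e-12)
```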


Subjects
Ocular Fixation/physiology, Psychological Models, Space Perception/physiology, Attention/physiology, Color Perception/physiology, Humans, Visual Pattern Recognition/physiology, Photic Stimulation/methods, Psychomotor Performance/physiology, Psychophysics, Saccades/physiology, Video Recording