Results 1 - 8 of 8
1.
Article in English | MEDLINE | ID: mdl-29993911

ABSTRACT

Detecting camouflaged moving foreground objects is known to be difficult because the foreground objects closely resemble the background. Conventional methods cannot distinguish the foreground from the background when the differences between them are small, and thus suffer from underdetection of the camouflaged foreground objects. In this paper, we present a fusion framework to address this problem in the wavelet domain. We first show that the small differences in the image domain can be highlighted in certain wavelet bands. Then the likelihood of each wavelet coefficient being foreground is estimated by formulating foreground and background models for each wavelet band. The proposed framework effectively aggregates the likelihoods from different wavelet bands based on the characteristics of the wavelet transform. Experimental results demonstrated that the proposed method significantly outperformed existing methods in detecting camouflaged foreground objects. Specifically, the average F-measure for the proposed algorithm was 0.87, compared to 0.71 to 0.8 for the other state-of-the-art methods.
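
For reference, the F-measure quoted in this abstract is conventionally the harmonic mean of precision and recall over foreground pixels. The sketch below is a minimal illustration of that metric, assuming binary masks given as flat sequences of 0/1; the function name `f_measure` and the toy masks are hypothetical and not taken from the paper.

```python
def f_measure(predicted, truth):
    """Harmonic mean of precision and recall for two binary masks (flat 0/1 sequences)."""
    # Count true positives, false positives and false negatives pixel by pixel.
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    if tp == 0:
        return 0.0  # no correctly detected foreground pixel
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (assumed data): two of four foreground pixels are missed.
truth     = [0, 1, 1, 1, 1, 0, 0, 0]
predicted = [0, 1, 1, 0, 0, 0, 1, 0]
score = f_measure(predicted, truth)
print(round(score, 4))
```

Underdetection of the kind described in the abstract shows up here as missed foreground pixels, which lower recall and therefore the F-measure.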

2.
IEEE Trans Image Process ; 27(7): 3332-3344, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29641410

ABSTRACT

The advent of depth sensing technologies means that the extraction of object contours in images (a common and important pre-processing step for later higher-level computer vision tasks such as object detection and human action recognition) has become easier. However, captured depth images contain acquisition noise, and the detected contours suffer from errors as a result. In this paper, we propose to jointly denoise and compress detected contours in an image for bandwidth-constrained transmission to a client, who can then carry out the aforementioned application-specific tasks using the decoded contours as input. First, we prove theoretically that in general a joint denoising/compression approach can outperform a separate two-stage approach that first denoises and then lossily encodes contours. Adopting a joint approach, we propose a burst error model that captures typical errors encountered in an observed string of directional edges. We then formulate a rate-constrained maximum a posteriori problem that trades off the posterior probability of an estimated string, given the observed string, against its code rate. We design a dynamic programming algorithm that solves the posed problem optimally, and propose a compact context representation called total suffix tree that can reduce the complexity of the algorithm dramatically. To the best of our knowledge, we are the first in the literature to study the problem of joint denoising/compression of image contours and to offer a computationally efficient optimization algorithm. Experimental results show that our joint denoising/compression scheme can reduce bitrate by up to 18% compared with a competing separate scheme at comparable visual quality.

3.
Front Neurosci ; 12: 21, 2018.
Article in English | MEDLINE | ID: mdl-29456486

ABSTRACT

Auditory spatial localization in humans is performed using a combination of interaural time differences, interaural level differences, and spectral cues provided by the geometry of the ear. To render spatialized sounds within a virtual reality (VR) headset, either individualized or generic Head Related Transfer Functions (HRTFs) are usually employed. The former require arduous calibrations, but enable accurate auditory source localization, which may lead to a heightened sense of presence within VR. The latter obviate the need for individualized calibrations, but result in less accurate auditory source localization. Previous research on auditory source localization in the real world suggests that our representation of acoustic space is highly plastic. In light of these findings, we investigated whether auditory source localization could be improved for users of generic HRTFs via cross-modal learning. The results show that pairing a dynamic auditory stimulus with a spatio-temporally aligned visual counterpart enabled users of generic HRTFs to improve subsequent auditory source localization. Exposure to the auditory stimulus alone or to asynchronous audiovisual stimuli did not improve auditory source localization. These findings have important implications for human perception as well as for the development of VR systems, as they indicate that generic HRTFs may be enough to enable good auditory source localization in VR.

4.
Sci Rep ; 7(1): 3817, 2017 06 19.
Article in English | MEDLINE | ID: mdl-28630450

ABSTRACT

Humans are good at selectively listening to specific target conversations, even in the presence of multiple concurrent speakers. In our research, we study how auditory-visual cues modulate this selective listening. We do so by using immersive Virtual Reality technologies with spatialized audio. Exposing 32 participants to an Information Masking Task with concurrent speakers, we find significantly more errors in the decision-making processes triggered by asynchronous audiovisual speech cues. More precisely, the results show that lips on the Target speaker matched to a secondary (Mask) speaker's audio severely increase the participants' comprehension error rates. In a control experiment (n = 20), we further explore the influences of the visual modality over auditory selective attention. The results show a dominance of visual-speech cues, which effectively turn the Mask into the Target and vice-versa. These results reveal a disruption of selective attention that is triggered by bottom-up multisensory integration. The findings are framed in the sensory perception and cognitive neuroscience theories. The VR setup is validated by replicating previous results in this literature in a supplementary experiment.


Subject(s)
Attention/physiology , Decision Making/physiology , Speech Perception/physiology , Virtual Reality , Adult , Female , Humans , Male , Middle Aged
5.
IEEE Trans Image Process ; 26(2): 574-589, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27849536

ABSTRACT

Efficient encoding of object contours in images can facilitate advanced image/video compression techniques, such as shape-adaptive transform coding or motion prediction of arbitrarily shaped pixel blocks. We study the problem of lossless and lossy compression of detected contours in images. Specifically, we first convert a detected object contour into a sequence of directional symbols drawn from a small alphabet. To encode the symbol sequence using arithmetic coding, we compute an optimal variable-length context tree (VCT) T via a maximum a posteriori (MAP) formulation to estimate the symbols' conditional probabilities. MAP can avoid overfitting given a small training set X of past symbol sequences by identifying a VCT T with high likelihood P(X|T) of observing X given T, using a geometric prior P(T) stating that image contours are more often straight than curvy. For the lossy case, we design fast dynamic programming (DP) algorithms that optimally trade off the coding rate of an approximate contour [Formula: see text] given a VCT T against two notions of distortion of [Formula: see text] with respect to the original contour x. To reduce the size of the DP tables, a total suffix tree is derived from a given VCT T for compact table entry indexing, reducing complexity. Experimental results show that for lossless contour coding, our proposed algorithm consistently outperforms state-of-the-art context-based schemes for both small and large training datasets. For lossy contour coding, our algorithms outperform comparable schemes in the literature in rate-distortion performance.
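
The first step described in this abstract, turning a contour into directional symbols, can be sketched as follows. This is a minimal illustration assuming a 4-connected contour given as (row, column) pixel coordinates and an absolute four-letter alphabet (N, E, S, W); the abstract does not specify the alphabet actually used, and the helper name `contour_to_symbols` is hypothetical.

```python
# Map a unit step (d_row, d_col) to a directional symbol; rows grow downward.
# The absolute N/E/S/W alphabet is an assumption made for this illustration.
STEP_TO_SYMBOL = {(-1, 0): "N", (0, 1): "E", (1, 0): "S", (0, -1): "W"}


def contour_to_symbols(points):
    """Convert consecutive 4-connected contour points into a symbol string."""
    symbols = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in STEP_TO_SYMBOL:
            raise ValueError(f"non-adjacent contour points: {(r0, c0)} -> {(r1, c1)}")
        symbols.append(STEP_TO_SYMBOL[step])
    return "".join(symbols)


# Toy contour (assumed data): mostly horizontal with one step upward.
contour = [(2, 0), (2, 1), (2, 2), (1, 2), (1, 3), (1, 4)]
symbols = contour_to_symbols(contour)
print(symbols)  # EENEE
```

A mostly straight contour produces long runs of the same symbol, which is the kind of regularity a context model with a straightness prior could exploit.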

6.
IEEE Trans Image Process ; 25(6): 2896-2909, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27093627

ABSTRACT

When images at low bit-depth are rendered on high bit-depth displays, the missing least significant bits need to be estimated. We study the image bit-depth enhancement problem: estimating an original image from its quantized version from a minimum mean squared error (MMSE) perspective. We first argue that a graph-signal smoothness prior (one defined on a graph embedding the image structure) is an appropriate prior for the bit-depth enhancement problem. We next show that directly solving for the MMSE solution is, in general, too computationally expensive to be practical. We then propose an efficient approximation strategy. In particular, we first estimate the AC component of the desired signal in a maximum a posteriori formulation, efficiently computed via convex programming. We then compute the DC component with an MMSE criterion in closed form given the computed AC component. Experiments show that our proposed two-step approach has improved performance over conventional bit-depth enhancement schemes in both objective and subjective comparisons.
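
To make the problem setup concrete, the sketch below quantizes 8-bit samples to 4 bits and then applies two simple baseline expansions: zero padding and mid-bin reconstruction. It illustrates the problem only and is not the two-step method proposed in the paper; the function names and sample values are hypothetical.

```python
def quantize(samples, high_bits=8, low_bits=4):
    """Drop the least significant bits of each sample."""
    shift = high_bits - low_bits
    return [s >> shift for s in samples]


def expand_zero_pad(codes, high_bits=8, low_bits=4):
    """Baseline 1: fill the missing least significant bits with zeros."""
    shift = high_bits - low_bits
    return [c << shift for c in codes]


def expand_mid_bin(codes, high_bits=8, low_bits=4):
    """Baseline 2: place each sample at the middle of its quantization bin."""
    shift = high_bits - low_bits
    return [(c << shift) + (1 << (shift - 1)) for c in codes]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Assumed 8-bit sample values, chosen only for illustration.
original = [16, 21, 27, 31, 200, 207, 213, 223]
codes = quantize(original)
zero_padded = expand_zero_pad(codes)
mid_bin = expand_mid_bin(codes)
print(codes, mse(original, zero_padded), mse(original, mid_bin))
```

Any estimate of the original must fall inside the quantization bin implied by its code; the baselines above differ only in where inside that bin they place the sample, without using any image prior.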

7.
IEEE Trans Image Process ; 23(11): 4696-708, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25181457

ABSTRACT

Depth image compression is important for compact representation of 3D visual data in texture-plus-depth format, where texture and depth maps from one or more viewpoints are encoded and transmitted. A decoder can then synthesize a freely chosen virtual view via depth-image-based rendering using nearby coded texture and depth maps as reference. Further, depth information can be used in other image processing applications beyond view synthesis, such as object identification and segmentation. In this paper, we leverage the observation that neighboring pixels of similar depth have similar motion to efficiently encode depth video. Specifically, we divide a depth block containing two zones of distinct values (e.g., foreground and background) into two arbitrarily shaped regions (sub-blocks) along the dividing boundary before performing separate motion prediction (MP). While such arbitrarily shaped sub-block MP can lead to very small prediction residuals (resulting in few bits required for residual coding), it incurs an overhead to transmit the dividing boundaries for sub-block identification at the decoder. To minimize this overhead, we first devise a scheme called arithmetic edge coding (AEC) to efficiently code boundaries that divide blocks into sub-blocks. Specifically, we propose to incorporate the boundary geometrical correlation in an adaptive arithmetic coder in the form of a statistical model. Then, we propose two optimization procedures to further improve the edge coding performance of AEC for a given depth image. The first procedure operates within a code block, and allows lossy compression of the detected block boundary to lower the cost of AEC, with an option to augment boundary depth pixel values to match the new boundary, provided the augmented pixels do not adversely affect synthesized view distortion. The second procedure operates across code blocks, and systematically identifies blocks along an object contour that should be coded using sub-block MP via a rate-distortion optimized trellis. Experimental results show an average overall bitrate reduction of up to 33% over classical H.264/AVC.
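
The partition of a depth block into two arbitrarily shaped sub-blocks can be pictured with a small sketch. It assumes a block with two clearly separated depth values and splits it with a threshold midway between the minimum and maximum; this threshold rule and the name `split_block` are illustrative assumptions, not the boundary detection used in the paper.

```python
def split_block(block):
    """Label each pixel 0 or 1 depending on which side of the mid threshold it falls."""
    flat = [v for row in block for v in row]
    threshold = (min(flat) + max(flat)) / 2  # assumed rule: midpoint of the value range
    return [[1 if v > threshold else 0 for v in row] for row in block]


# Toy 4x4 depth block (assumed data) with two zones separated by a diagonal edge.
depth_block = [
    [40, 40, 40, 200],
    [40, 40, 200, 200],
    [40, 200, 200, 200],
    [200, 200, 200, 200],
]
mask = split_block(depth_block)
for row in mask:
    print(row)
```

In the scheme described in the abstract, each zone would then receive its own motion prediction, and the boundary between the zones is the side information that has to be transmitted to the decoder.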


Subject(s)
Data Compression/methods , Computer-Assisted Image Interpretation/methods , Three-Dimensional Imaging/methods , Automated Pattern Recognition/methods , Photography/methods , Video Recording/methods , Algorithms , Image Enhancement/methods , Motion , Reproducibility of Results , Sensitivity and Specificity , Computer-Assisted Signal Processing , Subtraction Technique
8.
IEEE Trans Image Process ; 23(7): 3138-51, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24876124

ABSTRACT

Transmitting compactly represented geometry of a dynamic 3D scene from a sender can enable a multitude of imaging functionalities at a receiver, such as synthesis of virtual images at freely chosen viewpoints via depth-image-based rendering. While depth maps (projections of 3D geometry onto 2D image planes at chosen camera viewpoints) can nowadays be readily captured by inexpensive depth sensors, they are often corrupted by non-negligible acquisition noise. Given that depth maps need to be denoised and compressed at the encoder for efficient network transmission to the decoder, in this paper we consider the denoising and compression problems jointly, arguing that doing so will result in better overall performance than the alternative of solving the two problems separately in two stages. Specifically, we formulate a rate-constrained estimation problem where, given a set of observed noise-corrupted depth maps, the most probable (maximum a posteriori (MAP)) 3D surface is sought within a search space of surfaces with representation size no larger than a prespecified rate constraint. Our rate-constrained MAP solution reduces to the conventional unconstrained MAP 3D surface reconstruction solution if the rate constraint is loose. To solve the posed rate-constrained estimation problem, we propose an iterative algorithm in which, in each iteration, the structure (object boundaries) and the texture (surfaces within the object boundaries) of the depth maps are optimized alternately. Using the MVC codec for compression of multiview depth video and MPEG free viewpoint video sequences as input, experimental results show that rate-constrained estimated 3D surfaces computed by our algorithm can reduce the coding rate of depth maps by up to 32% compared with unconstrained estimated surfaces for the same quality of synthesized virtual views at the decoder.
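
The rate-constrained estimation problem described in this abstract can be written compactly. The symbols below are notation chosen here for illustration and do not come from the abstract: $\mathbf{y}$ denotes the observed noise-corrupted depth maps, $\mathbf{x}$ a candidate 3D surface, $R(\mathbf{x})$ its representation size, and $R_{\max}$ the prespecified rate constraint.

```latex
\hat{\mathbf{x}} \;=\; \arg\max_{\mathbf{x} \,:\, R(\mathbf{x}) \le R_{\max}} \; P(\mathbf{x} \mid \mathbf{y})
```

When $R_{\max}$ is large enough that the unconstrained maximizer of $P(\mathbf{x} \mid \mathbf{y})$ already satisfies $R(\mathbf{x}) \le R_{\max}$, the constraint is inactive and the solution coincides with the conventional MAP estimate, which matches the reduction stated in the abstract.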
