Results 1 - 15 of 15
1.
Sci Rep; 9(1): 12578, 2019 Aug 29.
Article in English | MEDLINE | ID: mdl-31467296

ABSTRACT

People are able to keep track of objects as they navigate through space, even when the objects are out of sight. This requires some kind of representation of the scene and of the observer's location, but the form this might take is debated. We tested the accuracy and reliability of observers' estimates of the visual direction of previously viewed targets. Participants viewed four objects from one location, with binocular vision and small head movements; then, without any further sight of the targets, they walked to another location and pointed towards them. All conditions were tested in an immersive virtual environment, and some were also carried out in a real scene. Participants made large, consistent pointing errors that are poorly explained by any stable 3D representation. Any explanation based on a 3D representation would have to posit a different layout of the remembered scene depending on the orientation of the obscuring wall at the moment the participant points. Our data show that the mechanisms for updating the visual direction of unseen targets are not based on a stable 3D model of the scene, even a distorted one.
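The prediction being tested reduces to simple geometry: a stable world-based representation implies the observer can recompute the direction to an unseen target from any new location. A minimal sketch, using hypothetical 2D floor-plan coordinates (all values illustrative only):

```python
import math

def pointing_direction(observer, target):
    """Azimuth in degrees from an observer position to a target, in a
    fixed world frame. A stable 3D scene representation predicts that,
    after walking, observers point along this updated direction."""
    dx = target[0] - observer[0]
    dy = target[1] - observer[1]
    return math.degrees(math.atan2(dy, dx))

# Hypothetical layout (metres), purely for illustration.
viewing_point = (0.0, 0.0)
pointing_point = (3.0, 0.0)   # location walked to; target now unseen
target = (2.0, 2.0)

direction_at_view = pointing_direction(viewing_point, target)    # 45.0 deg
direction_at_point = pointing_direction(pointing_point, target)  # changes with viewpoint
```

Systematic deviations from this updated direction are what distinguish the data from any stable-representation account.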

3.
J Vis; 17(9): 11, 2017 Aug 01.
Article in English | MEDLINE | ID: mdl-28813567

ABSTRACT

There is good evidence that simple animals, such as bees, use view-based strategies to return to a familiar location, whereas humans might use a 3-D reconstruction to achieve the same goal. Assuming some noise in the storage and retrieval process, these two types of strategy give rise to different patterns of predicted errors in homing. We describe an experiment that can help distinguish between these models. Participants wore a head-mounted display to carry out a homing task in immersive virtual reality. They viewed three long, thin, vertical poles and had to remember where they were in relation to the poles before being transported (virtually) to a new location in the scene from where they had to walk back to the original location. The experiment was conducted in both a rich-cue scene (a furnished room) and a sparse scene (no background and no floor or ceiling). As one would expect, in a rich-cue environment, the overall error was smaller, and in this case, the ability to separate the models was reduced. However, for the sparse-cue environment, the view-based model outperforms the reconstruction-based model. Specifically, the likelihood of the experimental data is similar to the likelihood of samples drawn from the view-based model (but assessed under both models), and this is not true for samples drawn from the reconstruction-based model.
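The comparison rests on evaluating the likelihood of observed homing endpoints under each model's predicted error distribution. A minimal sketch, assuming isotropic Gaussian noise and made-up endpoints and model predictions (not the paper's fitted models):

```python
import math

def log_likelihood(endpoints, predicted, sigma):
    """Log-likelihood of observed 2D homing endpoints under a model that
    predicts a mean endpoint `predicted` with isotropic Gaussian noise."""
    ll = 0.0
    for x, y in endpoints:
        d2 = (x - predicted[0]) ** 2 + (y - predicted[1]) ** 2
        ll += -d2 / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)
    return ll

# Hypothetical endpoints clustered near the origin, and two toy predictions.
endpoints = [(0.1, -0.1), (0.2, 0.0), (-0.1, 0.1)]
ll_view = log_likelihood(endpoints, (0.0, 0.0), 0.3)   # "view-based" prediction
ll_recon = log_likelihood(endpoints, (1.0, 0.0), 0.3)  # "reconstruction" prediction
```

Assessing the same data under both models, as here, is what lets the likelihoods be compared directly.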


Subjects
Environment, Models, Theoretical, Visual Perception/physiology, Adult, Humans, Likelihood Functions, Male, Young Adult
4.
IEEE Trans Pattern Anal Mach Intell; 37(6): 1274-85, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26357348

ABSTRACT

Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks where we show that the exploitation of visual content yields improvement in accuracies for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
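The two-stage retrieve-then-rerank structure can be sketched as follows; the linear score blend, weight, and field names are assumptions for illustration, not the paper's actual ranking function:

```python
def rerank(candidates, alpha=0.7):
    """Re-rank text-retrieved candidate pages by a linear blend of the
    text-based relevance score and a visual-similarity score extracted
    from the page's images (both assumed normalised to [0, 1])."""
    return sorted(candidates,
                  key=lambda doc: alpha * doc["text"] + (1 - alpha) * doc["visual"],
                  reverse=True)

# Hypothetical candidate set from the text-only first stage.
pages = [
    {"url": "a.html", "text": 0.9, "visual": 0.1},
    {"url": "b.html", "text": 0.8, "visual": 0.9},
]
reranked = rerank(pages)  # visual evidence promotes b.html
```

Because reranking touches only the candidate set, the first-stage engine's efficiency is preserved, matching the system design described above.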

5.
IEEE Trans Image Process; 23(12): 4968-81, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25265607

ABSTRACT

We introduce a machine learning approach to demosaicing, the reconstruction of color images from incomplete color filter array samples. There are two challenges to overcome by a demosaicing method: 1) it needs to model and respect the statistics of natural images in order to reconstruct natural looking images and 2) it should be able to perform well in the presence of noise. To facilitate an objective assessment of current methods, we introduce a public ground truth data set of natural images suitable for research in image demosaicing and denoising. We then use this large data set to develop a machine learning approach to demosaicing. Our proposed method addresses both demosaicing challenges by learning a statistical model of images and noise from hundreds of natural images. The resulting model performs simultaneous demosaicing and denoising. We show that the machine learning approach has a number of benefits: 1) the model is trained to directly optimize a user-specified performance measure such as peak signal-to-noise ratio (PSNR) or structural similarity; 2) we can handle novel color filter array layouts by retraining the model on such layouts; and 3) it outperforms the previous state-of-the-art, in some setups by 0.7-dB PSNR, faithfully reconstructing edges, textures, and smooth areas. Our results demonstrate that in demosaicing and related imaging applications, discriminatively trained machine learning models have the potential for peak performance at comparatively low engineering effort.
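PSNR, one of the performance measures the model can be trained to optimize directly, is straightforward to compute; a minimal sketch over flat pixel lists (8-bit peak assumed):

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and a
    demosaiced/denoised estimate, both given as flat lists of pixel values.
    Higher is better; identical images give infinity."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

A 0.7-dB PSNR gain, as reported above, corresponds to a roughly 15% reduction in mean squared error.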


Subjects
Image Processing, Computer-Assisted/methods, Statistics, Nonparametric, Regression Analysis, Signal-To-Noise Ratio
6.
IEEE Trans Vis Comput Graph; 20(6): 839-51, 2014 Jun.
Article in English | MEDLINE | ID: mdl-26357302

ABSTRACT

Kinectrack is a novel approach to six-DoF tracking that provides agile real-time pose estimation using only commodity hardware. The dot-pattern emitter and IR camera components of the standard Kinect device are separated so that the emitter can roam freely relative to a fixed camera. The six-DoF pose of the emitter component is recovered by matching the dense dot pattern observed by the camera to a pre-captured reference image. A novel matching technique obtains the dense dot-pattern correspondences efficiently in wide- and adaptive-baseline scenarios, requiring only a small subset of the full dot pattern to fall within the field of view of the fixed camera. An auto-calibration process is proposed to obtain the intrinsic parameters of the fixed camera and the internal dot-pattern reference image of the emitter. The system simultaneously recovers the six-DoF pose of the emitter device and the piecewise-planar 3D scene structure. Kinectrack provides a low-cost method for tracking an object without any on-board computation, with small size and only simple electronics. This paper extends the original ISMAR 2012 submission, adding a demonstration of robust pose tracking for AR and examples of matching in planar and non-planar scenes.

7.
IEEE Trans Pattern Anal Mach Intell; 35(12): 2821-40, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24136424

ABSTRACT

We describe two new approaches to human pose estimation. Both can quickly and accurately predict the 3D positions of body joints from a single depth image without using any temporal information. The key to both approaches is the use of a large, realistic, and highly varied synthetic set of training images. This allows us to learn models that are largely invariant to factors such as pose, body shape, field-of-view cropping, and clothing. Our first approach employs an intermediate body parts representation, designed so that an accurate per-pixel classification of the parts will localize the joints of the body. The second approach instead directly regresses the positions of body joints. By using simple depth pixel comparison features and parallelizable decision forests, both approaches can run at super-real-time speeds on consumer hardware. Our evaluation investigates many aspects of our methods, and compares the approaches to each other and to the state of the art. Results on silhouettes suggest broader applicability to other imaging modalities.
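The depth pixel comparison features can be sketched as follows, assuming the common formulation in which probe offsets are scaled by inverse depth; the details here are illustrative rather than the paper's exact feature definition:

```python
def depth_feature(depth, x, y, u, v, background=1e6):
    """Depth-comparison feature: the difference of two probe depths at
    offsets u and v, each scaled by 1/depth(x, y), which makes the
    response approximately invariant to the person's distance from the
    camera. Probes falling off the image read as far background."""
    d = depth[y][x]

    def probe(offset):
        px = x + int(round(offset[0] / d))
        py = y + int(round(offset[1] / d))
        if 0 <= py < len(depth) and 0 <= px < len(depth[0]):
            return depth[py][px]
        return background

    return probe(u) - probe(v)
```

Each split node in a decision forest thresholds one such feature, so evaluating a tree costs only a handful of depth lookups per pixel.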


Subjects
Algorithms, Imaging, Three-Dimensional, Humans, Image Interpretation, Computer-Assisted, Regression Analysis
8.
Biol Cybern; 107(4): 449-64, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23778937

ABSTRACT

It is often assumed that humans generate a 3D reconstruction of the environment, either in egocentric or world-based coordinates, but the steps involved are unknown. Here, we propose two reconstruction-based models, evaluated using data from two tasks in immersive virtual reality. We model the observer's prediction of landmark location based on standard photogrammetric methods and then combine location predictions to compute likelihood maps of navigation behaviour. In one model, each scene point is treated independently in the reconstruction; in the other, the pertinent variable is the spatial relationship between pairs of points. Participants viewed a simple environment from one location, were transported (virtually) to another part of the scene and were asked to navigate back. Error distributions varied substantially with changes in scene layout; we compared these directly with the likelihood maps to quantify the success of the models. We also measured error distributions when participants manipulated the location of a landmark to match the preceding interval, providing a direct test of the landmark-location stage of the navigation models. Models such as this, which start with scenes and end with a probabilistic prediction of behaviour, are likely to be increasingly useful for understanding 3D vision.


Subjects
Visual Perception, Humans, Likelihood Functions, Models, Theoretical
9.
IEEE Trans Pattern Anal Mach Intell; 35(1): 232-44, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22392707

ABSTRACT

3D morphable models are low-dimensional parameterizations of 3D object classes which provide a powerful means of associating 3D geometry to 2D images. However, morphable models are currently generated from 3D scans, so for general object classes such as animals they are economically and practically infeasible. We show that, given a small amount of user interaction (little more than that required to build a conventional morphable model), there is enough information in a collection of 2D pictures of certain object classes to generate a full 3D morphable model, even in the absence of surface texture. The key restriction is that the object class should not be strongly articulated, and that a very rough rigid model should be provided as an initial estimate of the "mean shape." The model representation is a linear combination of subdivision surfaces, which we fit to image silhouettes and any identifiable key points using a novel combined continuous-discrete optimization strategy. Results are demonstrated on several natural object classes, and show that models of rather high quality can be obtained from this limited information.


Subjects
Dolphins/anatomy & histology, Image Interpretation, Computer-Assisted/methods, Imaging, Three-Dimensional/methods, Models, Anatomic, Models, Biological, Pattern Recognition, Automated/methods, Subtraction Technique, Animals, Computer Simulation, Image Enhancement/methods
11.
J Neurosci Methods; 199(2): 328-35, 2011 Aug 15.
Article in English | MEDLINE | ID: mdl-21620891

ABSTRACT

Accurate calibration of a head-mounted display (HMD) is essential both for research on the visual system and for realistic interaction with virtual objects. Yet existing calibration methods are time-consuming, depend on error-prone human judgements, and are often limited to optical see-through HMDs. Building on our existing approach to HMD calibration (Gilson et al., 2008), we show here how it is possible to calibrate a non-see-through HMD. A camera is placed inside an HMD displaying an image of a regular grid, which is captured by the camera. The HMD is then removed and the camera, which remains fixed in position, is used to capture images of a tracked calibration object in multiple positions. The centroids of the markers on the calibration object are recovered and their locations re-expressed in relation to the HMD grid. This allows established camera calibration techniques to be used to recover estimates of the HMD display's intrinsic parameters (width, height, focal length) and extrinsic parameters (optic centre and orientation of the principal ray). We calibrated an HMD in this manner and report the magnitude of the errors between real image features and reprojected features. Our calibration method produces low reprojection errors without the need for error-prone human judgements.
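The reported reprojection error measures how well the recovered pinhole model re-predicts the observed image features. A minimal sketch under a simple pinhole model; the focal length and principal point values below are hypothetical:

```python
import math

def project(point3d, f, cx, cy):
    """Pinhole projection of a camera-frame 3D point, with focal length
    f (pixels) and principal point (cx, cy)."""
    X, Y, Z = point3d
    return (f * X / Z + cx, f * Y / Z + cy)

def reprojection_error(observed, points3d, f, cx, cy):
    """RMS distance (pixels) between observed image features and the
    reprojection of their 3D locations under the calibrated model."""
    sq = []
    for (u, v), p in zip(observed, points3d):
        pu, pv = project(p, f, cx, cy)
        sq.append((u - pu) ** 2 + (v - pv) ** 2)
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical calibration: one feature observed exactly where the model predicts.
err = reprojection_error([(720.0, 512.0)], [(0.1, 0.0, 1.0)], 800.0, 640.0, 512.0)
```

A low RMS value over many such features is the quantitative criterion the abstract refers to.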


Subjects
Computer Terminals/standards, Neurophysiology/instrumentation, Photogrammetry/instrumentation, User-Computer Interface, Video Recording/instrumentation, Animals, Calibration/standards, Humans, Neurophysiology/methods, Optics and Photonics/instrumentation, Optics and Photonics/methods, Photogrammetry/methods, Photogrammetry/standards, Video Recording/methods
12.
IEEE Trans Pattern Anal Mach Intell; 31(12): 2115-28, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19834135

ABSTRACT

Second-order priors on the smoothness of 3D surfaces are a better model of typical scenes than first-order priors. However, stereo reconstruction using global inference algorithms, such as graph cuts, has not been able to incorporate second-order priors because the triple cliques needed to express them yield intractable (nonsubmodular) optimization problems. This paper shows that inference with triple cliques can be effectively performed. Our optimization strategy is a development of recent extensions to alpha-expansion, based on the "QPBO" algorithm. The strategy is to repeatedly merge proposal depth maps using a novel extension of QPBO. Proposal depth maps can come from any source, for example, frontoparallel planes as in alpha-expansion, or indeed any existing stereo algorithm, with arbitrary parameter settings.
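The fusion-move structure can be sketched as follows. This toy version keeps only a per-pixel data term, whereas the actual method couples pixels through second-order (triple-clique) smoothness terms and solves each binary merge with QPBO:

```python
def fuse_proposals(observed, proposal_a, proposal_b):
    """Greedy per-pixel fusion of two proposal depth maps: each pixel
    keeps whichever proposal better fits the observed depth. With only
    a unary data term the per-pixel choice is exact; adding pairwise or
    triple-clique smoothness terms is what makes QPBO necessary."""
    return [a if (o - a) ** 2 <= (o - b) ** 2 else b
            for o, a, b in zip(observed, proposal_a, proposal_b)]

# Hypothetical 1D depth maps, purely for illustration.
fused = fuse_proposals([1.0, 2.0, 3.0],
                       [1.1, 2.5, 2.0],
                       [0.0, 2.1, 3.1])
```

Repeating such merges over many proposals, each from any source, is the iteration loop the abstract describes.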

13.
J Neurosci Methods; 173(1): 140-6, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18599125

ABSTRACT

We present here a method for calibrating an optical see-through head-mounted display (HMD) using techniques usually applied to camera calibration (photogrammetry). Using a camera placed inside the HMD to take pictures simultaneously of a tracked object and features in the HMD display, we could exploit established camera calibration techniques to recover both the intrinsic and extrinsic properties of the HMD (width, height, focal length, optic centre and principal ray of the display). Our method gives low re-projection errors and, unlike existing methods, involves no time-consuming and error-prone human measurements, nor any prior estimates about the HMD geometry.


Subjects
Head, Image Enhancement/instrumentation, Optics and Photonics/instrumentation, Space Perception/physiology, Vision, Ocular/physiology, Algorithms, Calibration, Computer Graphics, Equipment Failure Analysis, Head Protective Devices, Humans, Image Enhancement/standards, Photogrammetry/instrumentation, Photogrammetry/methods, Sensitivity and Specificity, User-Computer Interface, Video Recording/methods
14.
J Neurosci Methods; 154(1-2): 175-82, 2006 Jun 30.
Article in English | MEDLINE | ID: mdl-16448700

ABSTRACT

An increasing number of neuroscience experiments are using virtual reality to provide a more immersive and less artificial experimental environment. This is particularly useful to navigation and three-dimensional scene perception experiments. Such experiments require accurate real-time tracking of the observer's head in order to render the virtual scene. Here, we present data on the accuracy of a commonly used six degrees of freedom tracker (Intersense IS900) when it is moved in ways typical of virtual reality applications. We compared the reported location of the tracker with its location computed by an optical tracking method. When the tracker was stationary, the root mean square error in spatial accuracy was 0.64 mm. However, we found that errors increased over ten-fold (up to 17 mm) when the tracker moved at speeds common in virtual reality applications. We demonstrate that the errors we report here are predominantly due to inaccuracies of the IS900 system rather than the optical tracking against which it was compared.
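The stationary-accuracy figure is a root-mean-square error over repeated position samples; a minimal sketch with made-up positions (same units throughout, e.g. millimetres):

```python
import math

def rms_error(reported, reference):
    """Root-mean-square positional error between tracker-reported and
    optically measured 3D positions, in the units of the inputs."""
    sq = [(r[0] - g[0]) ** 2 + (r[1] - g[1]) ** 2 + (r[2] - g[2]) ** 2
          for r, g in zip(reported, reference)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical samples: tracker output vs. optical ground truth.
err = rms_error([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                [(0.0, 0.0, 0.6), (1.0, 0.0, 0.8)])
```

Computed separately for stationary and moving trials, this statistic yields the 0.64 mm vs. up-to-17 mm contrast reported above.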


Subjects
Computer Graphics, Psychomotor Performance/physiology, Psychophysics/methods, Acceleration, Acoustic Stimulation, Computer Systems, Humans, Software
15.
Curr Biol; 16(4): 428-32, 2006 Feb 21.
Article in English | MEDLINE | ID: mdl-16488879

ABSTRACT

As we move through the world, our eyes acquire a sequence of images. The information from this sequence is sufficient to determine the structure of a three-dimensional scene, up to a scale factor determined by the distance that the eyes have moved. Previous evidence shows that the human visual system accounts for the distance the observer has walked and the separation of the eyes when judging the scale, shape, and distance of objects. However, in an immersive virtual-reality environment, observers failed to notice when a scene expanded or contracted, despite having consistent information about scale from both distance walked and binocular vision. This failure led to large errors in judging the size of objects. The pattern of errors cannot be explained by assuming a visual reconstruction of the scene with an incorrect estimate of interocular separation or distance walked. Instead, it is consistent with a Bayesian model of cue integration in which the efficacy of motion and disparity cues is greater at near viewing distances. Our results imply that observers are more willing to adjust their estimate of interocular separation or distance walked than to accept that the scene has changed in size.
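The reliability-weighted cue combination at the heart of the Bayesian account can be sketched as follows; the sigma values in the usage example are illustrative, not fitted to the experimental data:

```python
def combine_cues(x_motion, sigma_motion, x_disparity, sigma_disparity):
    """Maximum-likelihood fusion of two Gaussian cues: each estimate is
    weighted by its reliability (inverse variance), so the cue with the
    smaller sigma dominates the combined size/distance estimate."""
    w_m = 1.0 / sigma_motion ** 2
    w_d = 1.0 / sigma_disparity ** 2
    return (w_m * x_motion + w_d * x_disparity) / (w_m + w_d)

# Hypothetical: a precise motion cue pulls the estimate towards itself.
combined = combine_cues(1.0, 0.1, 2.0, 0.3)
```

Because disparity and motion reliabilities grow at near viewing distances, the same rule predicts the distance-dependent cue efficacy described above.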


Subjects
Space Perception, Computer Simulation, Cues, Humans, Motion Perception, Optical Illusions, Psychophysics, User-Computer Interface, Vision, Binocular