ABSTRACT
People are able to keep track of objects as they navigate through space, even when the objects are out of sight. This requires some kind of representation of the scene and of the observer's location within it, but the form this representation might take is debated. We tested the accuracy and reliability of observers' estimates of the visual direction of previously viewed targets. Participants viewed four objects from one location, with binocular vision and small head movements; then, without any further sight of the targets, they walked to another location and pointed towards them. All conditions were tested in an immersive virtual environment, and some were also carried out in a real scene. Participants made large, consistent pointing errors that are poorly explained by any stable 3D representation. Any explanation based on a 3D representation would have to posit a different layout of the remembered scene depending on the orientation of the obscuring wall at the moment the participant points. Our data show that the mechanisms for updating the visual direction of unseen targets are not based on a stable 3D model of the scene, even a distorted one.
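As an illustrative aside that is not part of the original study, the pointing direction predicted by a stable 3D representation is simple vector geometry; the short Python sketch below, with entirely hypothetical coordinates, computes the egocentric direction to a remembered target once the observer has walked to a new location and adopted a new heading.

import numpy as np

# Hypothetical world coordinates (metres): a remembered target and the
# observer's position and heading after walking to the pointing location.
target = np.array([2.0, 5.0])          # remembered target location (x, y)
new_position = np.array([4.0, 1.0])    # observer's location after walking
new_heading_deg = 90.0                 # facing direction, degrees CCW from +x

# The pointing direction a stable 3D representation would predict:
offset = target - new_position
world_bearing = np.degrees(np.arctan2(offset[1], offset[0]))
egocentric = (world_bearing - new_heading_deg + 180.0) % 360.0 - 180.0
print(f"predicted egocentric pointing direction: {egocentric:.1f} deg")

Systematic deviations from this kind of prediction are what the abstract reports as being poorly explained by any single stable 3D model.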
ABSTRACT
There is good evidence that simple animals, such as bees, use view-based strategies to return to a familiar location, whereas humans might use a 3-D reconstruction to achieve the same goal. Assuming some noise in the storage and retrieval process, these two types of strategy give rise to different patterns of predicted errors in homing. We describe an experiment that can help distinguish between these models. Participants wore a head-mounted display to carry out a homing task in immersive virtual reality. They viewed three long, thin, vertical poles and had to remember where they were in relation to the poles before being transported (virtually) to a new location in the scene from where they had to walk back to the original location. The experiment was conducted in both a rich-cue scene (a furnished room) and a sparse scene (no background and no floor or ceiling). As one would expect, in a rich-cue environment, the overall error was smaller, and in this case, the ability to separate the models was reduced. However, for the sparse-cue environment, the view-based model outperforms the reconstruction-based model. Specifically, the likelihood of the experimental data is similar to the likelihood of samples drawn from the view-based model (but assessed under both models), and this is not true for samples drawn from the reconstruction-based model.
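Purely as an illustrative sketch rather than the paper's actual models, the comparison described here can be framed as evaluating the log-likelihood of the observed homing endpoints under each candidate model; the Python example below uses made-up Gaussian predictive distributions just to show the bookkeeping.

from scipy.stats import multivariate_normal

# Hypothetical predictive distributions over homing endpoints (metres) for two
# candidate models; in the paper these are derived from the stored view or
# from a noisy 3D reconstruction.
view_based = multivariate_normal(mean=[0.0, 0.0], cov=[[0.10, 0.0], [0.0, 0.30]])
reconstruction = multivariate_normal(mean=[0.2, 0.0], cov=[[0.20, 0.0], [0.0, 0.20]])

# Stand-in "experimental" endpoints, here simply sampled from one of the models.
data = view_based.rvs(size=50)

# Log-likelihood of the same data under each model; the better-fitting model
# assigns the higher value, which is the logic of the comparison above.
print("log L (view-based):     ", view_based.logpdf(data).sum())
print("log L (reconstruction): ", reconstruction.logpdf(data).sum())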
Subjects
Environment, Theoretical Models, Visual Perception/physiology, Adult, Humans, Likelihood Functions, Male, Young Adult
ABSTRACT
Traditional Web search engines do not use the images in the HTML pages to find relevant documents for a given query. Instead, they typically operate by computing a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper, we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that exploits a pure text-based search engine to find an initial set of candidate documents for a given query. Then, the candidate set is reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks, where we show that the exploitation of visual content yields improvements in accuracy for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a pure text-based system.
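A minimal sketch of the general two-stage idea (the function names, scores and weighting below are hypothetical, not the system described in the paper): retrieve candidates with a text engine, score each page's images against a visual model of the query, and rerank by a weighted combination.

# Hypothetical two-stage retrieval: text search followed by image-based reranking.
def rerank(candidates, visual_score, alpha=0.7):
    # candidates: list of (doc_id, text_score); visual_score: doc_id -> float.
    # Returns doc_ids sorted by a weighted combination of both scores.
    combined = [
        (doc_id, alpha * text_score + (1 - alpha) * visual_score(doc_id))
        for doc_id, text_score in candidates
    ]
    return [doc_id for doc_id, _ in sorted(combined, key=lambda x: x[1], reverse=True)]

# Toy usage with made-up scores.
candidates = [("docA", 0.9), ("docB", 0.8), ("docC", 0.7)]
visual = {"docA": 0.1, "docB": 0.9, "docC": 0.8}
print(rerank(candidates, visual.get))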
ABSTRACT
We introduce a machine learning approach to demosaicing, the reconstruction of color images from incomplete color filter array samples. A demosaicing method must overcome two challenges: 1) it needs to model and respect the statistics of natural images in order to reconstruct natural-looking images, and 2) it should be able to perform well in the presence of noise. To facilitate an objective assessment of current methods, we introduce a public ground truth data set of natural images suitable for research in image demosaicing and denoising. We then use this large data set to develop a machine learning approach to demosaicing. Our proposed method addresses both demosaicing challenges by learning a statistical model of images and noise from hundreds of natural images. The resulting model performs simultaneous demosaicing and denoising. We show that the machine learning approach has a number of benefits: 1) the model is trained to directly optimize a user-specified performance measure such as peak signal-to-noise ratio (PSNR) or structural similarity; 2) we can handle novel color filter array layouts by retraining the model on such layouts; and 3) it outperforms the previous state of the art, in some setups by 0.7 dB in PSNR, faithfully reconstructing edges, textures, and smooth areas. Our results demonstrate that in demosaicing and related imaging applications, discriminatively trained machine learning models have the potential for peak performance at comparatively low engineering effort.
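For reference (this definition is standard and not taken from the paper), the PSNR figure quoted above is 10*log10(MAX^2/MSE) between a reference image and a reconstruction; a minimal NumPy sketch:

import numpy as np

def psnr(reference, estimate, max_value=255.0):
    # Peak signal-to-noise ratio (dB) between two images of the same shape.
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(max_value ** 2 / mse)

# Toy usage with a synthetic image; a 0.7 dB gain corresponds to roughly a
# 15% reduction in mean squared error.
image = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))
print(psnr(image, np.clip(image + 5, 0, 255)))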
Subjects
Computer-Assisted Image Processing/methods, Nonparametric Statistics, Regression Analysis, Signal-to-Noise Ratio
ABSTRACT
It is often assumed that humans generate a 3D reconstruction of the environment, either in egocentric or world-based coordinates, but the steps involved are unknown. Here, we propose two reconstruction-based models, evaluated using data from two tasks in immersive virtual reality. We model the observer's prediction of landmark location based on standard photogrammetric methods and then combine location predictions to compute likelihood maps of navigation behaviour. In one model, each scene point is treated independently in the reconstruction; in the other, the pertinent variable is the spatial relationship between pairs of points. Participants viewed a simple environment from one location, were transported (virtually) to another part of the scene and were asked to navigate back. Error distributions varied substantially with changes in scene layout; we compared these directly with the likelihood maps to quantify the success of the models. We also measured error distributions when participants manipulated the location of a landmark to match the preceding interval, providing a direct test of the landmark-location stage of the navigation models. Models such as this, which start with scenes and end with a probabilistic prediction of behaviour, are likely to be increasingly useful for understanding 3D vision.
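As a hedged illustration of the likelihood-map idea only (the models in the paper derive their predictions photogrammetrically; the Gaussian distance terms below are an assumption of this sketch), the Python example evaluates a map over candidate navigation endpoints given noisy remembered relationships to a few landmarks.

import numpy as np

# Hypothetical landmark locations (metres) and noisy remembered distances to
# each of them; sigma is an assumed memory/perception noise.
landmarks = np.array([[0.0, 0.0], [3.0, 0.0], [1.5, 2.5]])
remembered_dist = np.array([2.0, 2.2, 1.0])
sigma = 0.5

# Unnormalised log-likelihood over a grid of candidate endpoint positions.
xs, ys = np.meshgrid(np.linspace(-1, 4, 200), np.linspace(-1, 4, 200))
grid = np.stack([xs, ys], axis=-1)                                # (200, 200, 2)
dists = np.linalg.norm(grid[:, :, None, :] - landmarks, axis=-1)  # (200, 200, 3)
log_map = -0.5 * np.sum(((dists - remembered_dist) / sigma) ** 2, axis=-1)

best = np.unravel_index(np.argmax(log_map), log_map.shape)
print("most likely endpoint:", xs[best], ys[best])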
Subjects
Visual Perception, Humans, Likelihood Functions, Theoretical Models
ABSTRACT
3D morphable models are low-dimensional parameterizations of 3D object classes, which provide a powerful means of associating 3D geometry with 2D images. However, morphable models are currently generated from 3D scans, so for general object classes such as animals they are economically and practically infeasible. We show that, given a small amount of user interaction (little more than that required to build a conventional morphable model), there is enough information in a collection of 2D pictures of certain object classes to generate a full 3D morphable model, even in the absence of surface texture. The key restrictions are that the object class should not be strongly articulated, and that a very rough rigid model should be provided as an initial estimate of the "mean shape." The model representation is a linear combination of subdivision surfaces, which we fit to image silhouettes and any identifiable key points using a novel combined continuous-discrete optimization strategy. Results are demonstrated on several natural object classes and show that models of rather high quality can be obtained from this limited information.
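A minimal sketch of the core representation only, not the authors' subdivision-surface fitting pipeline: a shape instance expressed as a mean plus a linear combination of basis deformations, with all data below synthetic.

import numpy as np

# A morphable model in its simplest form: N control vertices, K shape modes.
N, K = 100, 5
rng = np.random.default_rng(0)
mean_shape = rng.normal(size=(N, 3))           # rough initial "mean shape"
basis = rng.normal(size=(K, N, 3))             # shape-variation modes
coeffs = np.array([0.5, -0.2, 0.0, 0.1, 0.3])  # low-dimensional parameters

# Any instance of the class is the mean plus a linear combination of the modes.
shape = mean_shape + np.tensordot(coeffs, basis, axes=1)   # (N, 3)
print(shape.shape)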
Subjects
Dolphins/anatomy & histology, Computer-Assisted Image Interpretation/methods, Three-Dimensional Imaging/methods, Anatomic Models, Biological Models, Automated Pattern Recognition/methods, Subtraction Technique, Animals, Computer Simulation, Image Enhancement/methods
ABSTRACT
Accurate calibration of a head-mounted display (HMD) is essential both for research on the visual system and for realistic interaction with virtual objects. Yet existing calibration methods are time-consuming and depend on human judgements, making them error-prone, and they are often limited to optical see-through HMDs. Building on our existing approach to HMD calibration (Gilson et al., 2008), we show here how it is possible to calibrate a non-see-through HMD. A camera is placed inside an HMD displaying an image of a regular grid, which is captured by the camera. The HMD is then removed and the camera, which remains fixed in position, is used to capture images of a tracked calibration object in multiple positions. The centroids of the markers on the calibration object are recovered and their locations re-expressed in relation to the HMD grid. This allows established camera calibration techniques to be used to recover estimates of the HMD display's intrinsic parameters (width, height, focal length) and extrinsic parameters (optic centre and orientation of the principal ray). We calibrated an HMD in this manner and report the magnitude of the errors between real image features and reprojected features. Our calibration method produces low reprojection errors without the need for error-prone human judgements.
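A hedged sketch of the final step described above, using OpenCV's standard calibration routine; the synthetic planar target and camera below are placeholders (in the method itself, the 2D points are marker centroids re-expressed in the HMD display's pixel grid) and serve only to make the workflow concrete.

import numpy as np
import cv2

# Synthetic example: project a known planar target through a known camera so
# that cv2.calibrateCamera has consistent correspondences to recover.
K_true = np.array([[800.0, 0.0, 640.0],
                   [0.0, 800.0, 512.0],
                   [0.0, 0.0, 1.0]])
grid = np.array([[x, y, 0.0] for x in range(6) for y in range(5)],
                dtype=np.float32) * 0.05            # planar target, 5 cm spacing

object_points, image_points = [], []
for i in range(8):
    rvec = np.array([0.1 * i, -0.05 * i, 0.02 * i])  # per-view orientation
    tvec = np.array([0.01 * i, -0.02 * i, 1.0 + 0.1 * i])
    pts2d, _ = cv2.projectPoints(grid, rvec, tvec, K_true, np.zeros(5))
    object_points.append(grid)
    image_points.append(pts2d.astype(np.float32))

# Standard photogrammetric calibration recovers the intrinsics (focal length,
# optic centre of the image) and per-view extrinsics (orientation, position).
rms, K_est, dist, rvecs, tvecs = cv2.calibrateCamera(
    object_points, image_points, (1280, 1024), None, None)
print("RMS reprojection error (pixels):", rms)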
Subjects
Computer Terminals/standards, Neurophysiology/instrumentation, Photogrammetry/instrumentation, User-Computer Interface, Video Recording/instrumentation, Animals, Calibration/standards, Humans, Neurophysiology/methods, Optics and Photonics/instrumentation, Optics and Photonics/methods, Photogrammetry/methods, Photogrammetry/standards, Video Recording/methods
ABSTRACT
We present here a method for calibrating an optical see-through head-mounted display (HMD) using techniques usually applied to camera calibration (photogrammetry). Using a camera placed inside the HMD to take pictures simultaneously of a tracked object and features in the HMD display, we could exploit established camera calibration techniques to recover both the intrinsic and extrinsic properties of the HMD (width, height, focal length, optic centre and principal ray of the display). Our method gives low re-projection errors and, unlike existing methods, involves no time-consuming and error-prone human measurements, nor any prior estimates about the HMD geometry.
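As a small illustration (hypothetical numbers, not the authors' data), the re-projection error quoted as the quality measure here is just the pixel distance between where the calibrated model projects a 3D point and where that point was actually measured in the display:

import numpy as np
import cv2

# Hypothetical calibrated intrinsics/extrinsics and a set of measured 2D features.
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 512.0], [0.0, 0.0, 1.0]])
rvec = np.zeros(3)
tvec = np.array([0.0, 0.0, 2.0])
pts3d = np.random.default_rng(0).uniform(-0.3, 0.3, size=(15, 3)).astype(np.float32)

projected, _ = cv2.projectPoints(pts3d, rvec, tvec, K, np.zeros(5))
measured = projected.reshape(-1, 2) + 0.5   # pretend measurements, 0.5 px offset

# Root-mean-square re-projection error in pixels.
errors = np.linalg.norm(projected.reshape(-1, 2) - measured, axis=1)
print("RMS re-projection error:", np.sqrt(np.mean(errors ** 2)), "pixels")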
Subjects
Head, Image Enhancement/instrumentation, Optics and Photonics/instrumentation, Spatial Perception/physiology, Ocular Vision/physiology, Algorithms, Calibration, Computer Graphics, Equipment Failure Analysis, Head Protective Devices, Humans, Image Enhancement/standards, Photogrammetry/instrumentation, Photogrammetry/methods, Sensitivity and Specificity, User-Computer Interface, Video Recording/methods
ABSTRACT
As we move through the world, our eyes acquire a sequence of images. The information from this sequence is sufficient to determine the structure of a three-dimensional scene, up to a scale factor determined by the distance that the eyes have moved. Previous evidence shows that the human visual system accounts for the distance the observer has walked and the separation of the eyes when judging the scale, shape, and distance of objects. However, in an immersive virtual-reality environment, observers failed to notice when a scene expanded or contracted, despite having consistent information about scale from both distance walked and binocular vision. This failure led to large errors in judging the size of objects. The pattern of errors cannot be explained by assuming a visual reconstruction of the scene with an incorrect estimate of interocular separation or distance walked. Instead, it is consistent with a Bayesian model of cue integration in which the efficacy of motion and disparity cues is greater at near viewing distances. Our results imply that observers are more willing to adjust their estimate of interocular separation or distance walked than to accept that the scene has changed in size.
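For illustration only (the paper's Bayesian model is richer than this), the standard inverse-variance form of cue combination captures the key point that a cue's influence tracks its reliability; in the hypothetical numbers below, the stereo/motion cue dominates at a near viewing distance and is largely discounted at a far one.

import numpy as np

def combine_cues(estimates, variances):
    # Reliability-weighted (inverse-variance) cue combination.
    weights = 1.0 / np.asarray(variances, dtype=float)
    weights /= weights.sum()
    return float(np.dot(weights, estimates)), weights

# Hypothetical size estimates: 1.0 from stereo/motion, 2.0 implied by assuming
# the scene is stable; stereo/motion variance grows with viewing distance.
for distance, stereo_var in [(1.0, 0.01), (6.0, 0.25)]:
    estimate, w = combine_cues([1.0, 2.0], [stereo_var, 0.04])
    print(f"viewing distance {distance} m: combined estimate {estimate:.2f}, "
          f"stereo/motion weight {w[0]:.2f}")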
Subjects
Spatial Perception, Computer Simulation, Cues (Psychology), Humans, Motion Perception, Optical Illusions, Psychophysics, User-Computer Interface, Binocular Vision
ABSTRACT
An increasing number of neuroscience experiments are using virtual reality to provide a more immersive and less artificial experimental environment. This is particularly useful for navigation and three-dimensional scene perception experiments. Such experiments require accurate real-time tracking of the observer's head in order to render the virtual scene. Here, we present data on the accuracy of a commonly used six-degrees-of-freedom tracker (Intersense IS900) when it is moved in ways typical of virtual reality applications. We compared the reported location of the tracker with its location computed by an optical tracking method. When the tracker was stationary, the root-mean-square error in spatial accuracy was 0.64 mm. However, we found that errors increased over ten-fold (up to 17 mm) when the tracker moved at speeds common in virtual reality applications. We demonstrate that the errors we report here are predominantly due to inaccuracies of the IS900 system rather than of the optical tracking against which it was compared.
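A minimal sketch with hypothetical data (not the study's recordings) of the accuracy measure used here: the root-mean-square error between positions reported by the tracker under test and those from an optical reference, assumed to be time-aligned.

import numpy as np

# Hypothetical time series of 3D positions (millimetres) from the tracker under
# test and from the optical reference system.
rng = np.random.default_rng(0)
reference = rng.uniform(0, 1000, size=(500, 3))
tracker = reference + rng.normal(scale=0.5, size=(500, 3))

# Root-mean-square error of the 3D position differences.
rmse = np.sqrt(np.mean(np.sum((tracker - reference) ** 2, axis=1)))
print(f"RMS position error: {rmse:.2f} mm")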