ABSTRACT
With the increasing societal demand for disaster response, visual observation methods for rescue and safety have become increasingly important. However, because of the shortage of datasets for disaster scenarios, there has been little progress in computer vision and robotics in this field. With this in mind, we present the first large-scale synthetic dataset of egocentric viewpoints for disaster scenarios. We simulate pre- and post-disaster cases with drastic changes in appearance, such as buildings on fire and earthquakes. The dataset consists of more than 300K high-resolution stereo image pairs, all annotated with ground-truth semantic labels, metric-scale depth, sub-pixel-precision optical flow, and surface normals, as well as the corresponding camera poses. To create realistic disaster scenes, we manually augment the effects with 3D models using physically-based graphics tools. We train various state-of-the-art methods to perform computer vision tasks on our dataset, evaluate how well these methods recognize disaster situations, and show that they produce reliable results on virtual scenes as well as real-world images. We also present a convolutional neural network-based egocentric localization method that is robust to drastic appearance changes, such as the texture changes caused by fire, and layout changes caused by collapse. To address these key challenges, we propose a new model that learns a shape-based representation by training on stylized images and incorporates the dominant planes of query images as approximate scene coordinates. We evaluate the proposed method on various scenes, including a simulated disaster dataset, to demonstrate its effectiveness when confronted with significant changes in scene layout. Experimental results show that our method provides reliable camera pose predictions despite vastly changed conditions.
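The stylized-training idea behind the localization method can be illustrated with a minimal sketch (not the released implementation; the pose regressor, loss weighting, and the `stylize` augmentation below are placeholders): a pose-regression CNN is trained on both an image and a stylized copy sharing the same pose target, so texture becomes an unreliable cue and the network leans on scene shape.

```python
import torch
import torch.nn as nn
from torchvision import models

class PoseRegressor(nn.Module):
    """PoseNet-style sketch: CNN backbone + 6-DoF pose head (translation + quaternion)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()
        self.backbone = backbone
        self.fc_pose = nn.Linear(512, 7)  # 3-D translation + unit quaternion

    def forward(self, x):
        return self.fc_pose(self.backbone(x))

def pose_loss(pred, t_gt, q_gt, beta=120.0):
    t, q = pred[:, :3], nn.functional.normalize(pred[:, 3:], dim=1)
    return nn.functional.l1_loss(t, t_gt) + beta * nn.functional.l1_loss(q, q_gt)

def train_step(model, optimizer, images, t_gt, q_gt, stylize):
    # Each batch mixes the original image with a stylized version that shares
    # the same ground-truth pose, discouraging reliance on texture.
    inputs = torch.cat([images, stylize(images)], dim=0)
    loss = pose_loss(model(inputs),
                     torch.cat([t_gt, t_gt], dim=0),
                     torch.cat([q_gt, q_gt], dim=0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```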
ABSTRACT
We propose a new linear RGB-D simultaneous localization and mapping (SLAM) formulation that utilizes planar features of structured environments. The key idea is to understand a given structured scene and exploit its structural regularities, such as the Manhattan world. This understanding allows us to decouple the camera rotation by tracking structural regularities, which keeps the SLAM problem from becoming highly nonlinear. Additionally, it provides a simple yet effective cue for representing planar features, which leads to a linear SLAM formulation. Given an accurate camera rotation, we jointly estimate the camera translation and planar landmarks in the global planar map using a linear Kalman filter. Our linear SLAM method, called L-SLAM, can handle not only the Manhattan world but also the more general Atlanta world, which consists of a vertical direction and a set of horizontal directions orthogonal to it. To this end, we introduce a novel tracking-by-detection scheme that infers the underlying scene structure via an Atlanta representation. With this efficient Atlanta representation, we formulate a unified linear SLAM framework for structured environments. We evaluate L-SLAM on a synthetic dataset and on RGB-D benchmarks, demonstrating performance comparable to other state-of-the-art SLAM methods without using expensive nonlinear optimization. We also assess the accuracy of L-SLAM in a practical augmented reality application.
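To illustrate why the estimation becomes linear once the rotation is fixed by tracking structural directions, here is a simplified numpy sketch (the parameterization, noise values, and class name are assumptions, not L-SLAM's actual implementation): with a known global plane normal n, an observed plane distance gives the linear measurement z = d - n·t, so translation and plane offsets can be filtered jointly with a standard Kalman filter.

```python
import numpy as np

class LinearPlanarKF:
    """Sketch: state = [camera translation (3); offsets of N planes with known normals]."""
    def __init__(self, n_planes):
        self.dim = 3 + n_planes
        self.x = np.zeros(self.dim)
        self.P = np.eye(self.dim)

    def predict(self, q_trans=1e-3):
        Q = np.zeros((self.dim, self.dim))
        Q[:3, :3] = q_trans * np.eye(3)      # only the camera moves between frames
        self.P += Q

    def update(self, plane_idx, normal, z_meas, r_meas=1e-2):
        # Linear measurement model: z = d_i - n^T t
        H = np.zeros((1, self.dim))
        H[0, :3] = -normal
        H[0, 3 + plane_idx] = 1.0
        y = z_meas - H @ self.x              # innovation
        S = H @ self.P @ H.T + r_meas
        K = self.P @ H.T / S                 # Kalman gain (scalar innovation)
        self.x += (K * y).ravel()
        self.P = (np.eye(self.dim) - K @ H) @ self.P
```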
ABSTRACT
BACKGROUND: Recent studies have shown that brain-machine interfaces (BMIs) offer great potential for restoring upper limb function. However, grasping objects is a complicated task and the signals extracted from the brain may not always be capable of driving these movements reliably. Vision-guided robotic assistance is one possible way to improve BMI performance. We describe a method of shared control where the user controls a prosthetic arm using a BMI and receives assistance with positioning the hand when it approaches an object. METHODS: Two human subjects with tetraplegia used a robotic arm to complete object transport tasks with and without shared control. The shared control system was designed to provide a balance between BMI-derived intention and computer assistance. An autonomous robotic grasping system identified and tracked objects and defined stable grasp positions for these objects. The system identified when the user intended to interact with an object based on the BMI-controlled movements of the robotic arm. Using shared control, BMI-controlled movements and autonomous grasping commands were blended to ensure secure grasps. RESULTS: Both subjects were more successful on object transfer tasks when using shared control compared to BMI control alone. Movements made using shared control were more accurate, more efficient, and less difficult. One participant attempted a task with multiple objects and successfully lifted one of two closely spaced objects in 92% of trials, demonstrating the potential for users to accurately execute their intention while using shared control. CONCLUSIONS: Integration of BMI control with vision-guided robotic assistance led to improved performance on object transfer tasks. Providing assistance while maintaining generalizability will make BMI systems more attractive to potential users. TRIAL REGISTRATION: NCT01364480 and NCT01894802.
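The blending concept can be sketched briefly (hypothetical parameters and function names; this is not the controller used in the study): the BMI-decoded velocity is mixed with an autonomous approach velocity, and the robot's share of authority grows as the hand nears the grasp target.

```python
import numpy as np

def blend_commands(v_bmi, hand_pos, grasp_pos, assist_radius=0.15, max_assist=0.7):
    """Blend a BMI-decoded velocity with an autonomous approach velocity.

    v_bmi, hand_pos, grasp_pos: 3-vectors (m/s and m). Returns the commanded velocity.
    """
    to_grasp = grasp_pos - hand_pos
    dist = np.linalg.norm(to_grasp)
    # Autonomous command: head toward the stable grasp at the user's current speed.
    v_auto = to_grasp / (dist + 1e-9) * np.linalg.norm(v_bmi)
    # Assistance weight grows linearly once the hand enters the assist radius.
    alpha = max_assist * max(0.0, 1.0 - dist / assist_radius)
    return (1.0 - alpha) * v_bmi + alpha * v_auto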
Subject(s)
Brain-Computer Interfaces, Neurological Rehabilitation/instrumentation, Quadriplegia, Robotics/methods, Upper Extremity, Adult, Brain/physiopathology, Female, Hand/physiopathology, Hand Strength, Humans, Male, Middle Aged, Movement, Quadriplegia/physiopathology, Upper Extremity/physiopathology
ABSTRACT
We propose a data-driven approach to estimate the likelihood that an image segment corresponds to a scene object (its "objectness") by comparing it to a large collection of example object regions. We demonstrate that when the application domain is known, for example, in our case activities of daily living (ADL), we can capture the regularity of domain-specific objects using millions of exemplar object regions. Our approach to estimating the objectness of an image region proceeds in two steps: 1) finding the exemplar regions that are most similar to the input image segment; 2) calculating the objectness of the image segment by combining segment properties, mutual consistency across the nearest exemplar regions, and the prior probability of each exemplar region. Whereas in previous work parametric objectness models were built from a small number of manually annotated object regions, our data-driven approach uses 5 million object regions along with their metadata. Results on multiple datasets demonstrate the advantage of our data-driven approach over existing model-based techniques. We also show the application of our approach to improving the performance of object discovery algorithms.
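As a rough illustration of the retrieval-and-vote step (simplified: the mutual-consistency and segment-property terms are omitted, and all names are hypothetical), a query segment can be scored by a similarity-weighted vote over the priors of its nearest exemplar regions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def objectness_score(query_feat, exemplar_feats, exemplar_prior, k=25):
    """Score a segment by its k nearest exemplar regions.

    exemplar_feats: (N, D) feature vectors of exemplar regions.
    exemplar_prior: (N,) prior probability that each exemplar is a true object region.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(exemplar_feats)
    dist, idx = nn.kneighbors(query_feat.reshape(1, -1))
    # Similarity-weighted vote over the retrieved exemplars' priors.
    sim = np.exp(-dist[0] ** 2 / (2.0 * (np.median(dist[0]) + 1e-9) ** 2))
    weights = sim / sim.sum()
    return float(np.dot(weights, exemplar_prior[idx[0]]))
```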
ABSTRACT
Health care providers typically rely on family caregivers (CG) of persons with dementia (PWD) to describe difficult behaviors manifested by their underlying disease. Although invaluable, such reports may be selective or biased during brief medical encounters. Our team explored the usability of a wearable camera system with 9 caregiving dyads (CGs: 3 males, 6 females, 67.00 ± 14.95 years; PWDs: 2 males, 7 females, 80.00 ± 3.81 years, MMSE 17.33 ± 8.86) who recorded 79 salient events over a combined total of 140 hours of data capture, from 3 to 7 days of wear per CG. Prior to using the system, CGs assessed its benefits to be worth the invasion of privacy; post-wear privacy concerns did not differ significantly. CGs rated the system easy to learn to use, although cumbersome and obtrusive. Few negative reactions by PWDs were reported or evident in resulting video. Our findings suggest that CGs can and will wear a camera system to reveal their daily caregiving challenges to health care providers.
Subject(s)
Caregivers, Dementia/therapy, Ambulatory Monitoring/instrumentation, Photography/instrumentation, Video Recording/instrumentation, Wireless Technology/instrumentation, Adult, Aged, Aged 80 and over, Equipment Design, Equipment Failure Analysis, Female, Humans, Male, Meaningful Use, Middle Aged, Ambulatory Monitoring/methods, Photography/methods, Remote Consultation/instrumentation, Remote Consultation/methods, Video Recording/methods, Young Adult
ABSTRACT
This paper presents a fast and efficient computational approach to higher-order spectral graph matching. Exploiting the redundancy in a tensor representing the affinity between feature points, we approximate the affinity tensor with a linear combination of Kronecker products between bases and index tensors. The bases and index tensors are highly compressed representations of the approximated affinity tensor, requiring far less memory than previous methods, which store the full affinity tensor. We compute the principal eigenvector of the approximated affinity tensor using the small bases and index tensors without explicitly storing the approximated tensor. To compensate for the loss of matching accuracy caused by the approximation, we also incorporate a marginalization scheme that maps a higher-order tensor to a matrix, as well as a one-to-one mapping constraint, into the eigenvector computation process. The experimental results show that the proposed method is faster and requires less memory than existing methods with little or no loss of accuracy.
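For context, the generic third-order power iteration on a sparse affinity tensor looks like the sketch below; the Kronecker-factorized compression and the marginalization scheme described above are refinements on top of this basic scheme, and the storage format here is an assumption for illustration only.

```python
import numpy as np

def tensor_power_iteration(triplets, values, n, iters=50):
    """Principal eigenvector of a sparse third-order affinity tensor.

    triplets: (m, 3) int array of candidate-assignment indices (i, j, k).
    values:   (m,) affinity of each triplet.
    n:        total number of candidate assignments.
    """
    x = np.ones(n) / np.sqrt(n)
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    for _ in range(iters):
        y = np.zeros(n)
        # Symmetrized tensor-vector contraction: each stored entry votes
        # for all three assignments it involves.
        np.add.at(y, i, values * x[j] * x[k])
        np.add.at(y, j, values * x[i] * x[k])
        np.add.at(y, k, values * x[i] * x[j])
        x = y / (np.linalg.norm(y) + 1e-12)
    return x  # discretize (e.g., greedily with one-to-one constraints) for final matches
```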
ABSTRACT
We present a unified occlusion model for object instance detection under arbitrary viewpoint. Whereas previous approaches primarily modeled local coherency of occlusions or attempted to learn the structure of occlusions from data, we propose to explicitly model occlusions by reasoning about 3D interactions of objects. Our approach accurately represents occlusions under arbitrary viewpoint without requiring additional training data, which can often be difficult to obtain. We validate our model by incorporating occlusion reasoning with the state-of-the-art LINE2D and Gradient Network methods for object instance detection and demonstrate significant improvement in recognizing texture-less objects under severe occlusions.
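A toy sketch of the underlying scoring idea (much simpler than the full model; the function and thresholds are hypothetical): once 3D reasoning predicts which template pixels a foreground object occludes, the detection is scored only on the pixels expected to remain visible, rather than being penalized for missing evidence.

```python
import numpy as np

def occlusion_aware_score(template_response, visibility_mask, min_visible=0.3):
    """Score a detection using only the template pixels predicted to be visible.

    template_response: per-pixel match scores in [0, 1] for the placed template.
    visibility_mask:   1 where 3D reasoning predicts the object is unoccluded.
    """
    visible = visibility_mask.astype(bool)
    if visible.mean() < min_visible:
        return 0.0  # too little visible evidence to trust the hypothesis
    return float(template_response[visible].mean())
```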
ABSTRACT
Unsupervised image segmentation is an important component in many image understanding algorithms and practical vision systems. However, evaluation of segmentation algorithms thus far has been largely subjective, leaving a system designer to judge the effectiveness of a technique based only on intuition and results in the form of a few example segmented images. This is largely due to image segmentation being an ill-defined problem: there is no unique ground-truth segmentation of an image against which the output of an algorithm may be compared. This paper demonstrates how a recently proposed measure of similarity, the Normalized Probabilistic Rand (NPR) index, can be used to perform a quantitative comparison between image segmentation algorithms using a hand-labeled set of ground-truth segmentations. We show that the measure allows principled comparisons between segmentations created by different algorithms, as well as segmentations on different images. We outline a procedure for algorithm evaluation through an example evaluation of some familiar algorithms: the mean-shift-based algorithm, an efficient graph-based segmentation algorithm, a hybrid algorithm that combines the strengths of both methods, and expectation maximization. Results are presented on the 300 images in the publicly available Berkeley Segmentation Data Set.
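For reference, the plain Rand index between two labelings of the same image can be computed as below; this is only the basic building block, as the NPR index additionally averages over multiple human segmentations and normalizes against the index's expected value. Labels are assumed to be small non-negative integers.

```python
import numpy as np

def rand_index(labels_a, labels_b):
    """Fraction of pixel pairs on which two segmentations agree (together or apart)."""
    a, b = labels_a.ravel(), labels_b.ravel()
    n = a.size
    # Contingency table between the two segmentations.
    cont = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(cont, (a, b), 1)
    pairs_both = np.sum(cont * (cont - 1)) / 2                     # together in both
    pairs_a = np.sum(cont.sum(axis=1) * (cont.sum(axis=1) - 1)) / 2
    pairs_b = np.sum(cont.sum(axis=0) * (cont.sum(axis=0) - 1)) / 2
    total = n * (n - 1) / 2
    # Agreeing pairs = together in both + apart in both.
    return float((total + 2 * pairs_both - pairs_a - pairs_b) / total)
```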
Subject(s)
Algorithms, Artificial Intelligence, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Automated Pattern Recognition/methods, Computer Simulation, Statistical Data Interpretation, Information Storage and Retrieval/methods, Statistical Models, Reproducibility of Results, Sensitivity and Specificity
ABSTRACT
We propose a new method for rapid 3D object indexing that combines feature-based methods with coarse alignment-based matching techniques. Our approach achieves sublinear complexity in the number of models while maintaining a high degree of performance on real 3D sensed data acquired in largely uncontrolled settings. The key component of our method is to first index surface descriptors computed at salient locations in the scene into the whole model database using Locality Sensitive Hashing (LSH), a probabilistic approximate nearest neighbor method. Progressively more complex geometric constraints are subsequently enforced to further prune the initial candidates and eliminate false correspondences due to inaccuracies in the surface descriptors and errors of the LSH algorithm. The indexed models are selected based on the MAP rule, using posterior probabilities of the models estimated in the joint 3D-signature space. Experiments with real 3D data, employing a large database of vehicles (most of them very similar in shape) containing 1,000,000 features from more than 365 models, demonstrate a high degree of performance in the presence of occlusion and obscuration, unmodeled vehicle interiors, and part articulations, with an average processing time between 50 and 100 seconds per query.
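A minimal random-hyperplane LSH index (illustrative only; the class name, bit counts, and table counts are assumptions) captures the flavor of the first retrieval stage; the geometric pruning and MAP model selection described above would then operate on the returned candidates.

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """Random-hyperplane LSH over surface descriptors, mapping buckets to model ids."""
    def __init__(self, dim, n_bits=16, n_tables=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_tables, n_bits, dim))
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _keys(self, descriptor):
        # One binary key per table: the sign pattern of the descriptor
        # against that table's random hyperplanes.
        bits = self.planes @ descriptor > 0          # (n_tables, n_bits)
        return [tuple(row) for row in bits]

    def add(self, descriptor, model_id):
        for table, key in zip(self.tables, self._keys(descriptor)):
            table[key].append(model_id)

    def query(self, descriptor):
        candidates = []
        for table, key in zip(self.tables, self._keys(descriptor)):
            candidates.extend(table[key])
        return candidates  # approximate-neighbor model ids, to be verified geometrically
```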
Subject(s)
Algorithms, Artificial Intelligence, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Three-Dimensional Imaging/methods, Information Storage and Retrieval/methods, Automated Pattern Recognition/methods, Automobiles, Factual Databases, Computer-Assisted Numerical Analysis, Online Systems
ABSTRACT
We present an approach to the recognition of complex-shaped objects in cluttered environments based on edge information. We first use example images of a target object in typical environments to train a classifier cascade that determines whether edge pixels in an image belong to an instance of the desired object or the clutter. Presented with a novel image, we use the cascade to discard clutter edge pixels and group the object edge pixels into overall detections of the object. The features used for the edge pixel classification are localized, sparse edge density operations. Experiments validate the effectiveness of the technique for recognition of a set of complex objects in a variety of cluttered indoor scenes under arbitrary out-of-image-plane rotation. Furthermore, our experiments suggest that the technique is robust to variations between training and testing environments and is efficient at runtime.
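The cascade idea can be sketched as follows (the feature and stage parameters here are hypothetical; in the actual system the stages are learned from example images of the target object): each stage thresholds a cheap, localized edge-density measurement, and an edge pixel is kept as object evidence only if it survives every stage.

```python
import numpy as np

def edge_density(edge_map, y, x, dy, dx, size):
    """Mean edge response in a size x size window offset by (dy, dx) from pixel (y, x)."""
    h, w = edge_map.shape
    y0 = int(np.clip(y + dy, 0, h - size))
    x0 = int(np.clip(x + dx, 0, w - size))
    return edge_map[y0:y0 + size, x0:x0 + size].mean()

def cascade_accepts(edge_map, y, x, stages):
    """stages: list of (dy, dx, size, threshold) tuples learned offline."""
    for dy, dx, size, thresh in stages:
        if edge_density(edge_map, y, x, dy, dx, size) < thresh:
            return False   # rejected by this stage -> clutter pixel
    return True            # survived every stage -> likely object edge pixel
```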