Results 1 - 17 of 17
1.
Sci Robot ; 9(90): eadj8124, 2024 May 29.
Article in English | MEDLINE | ID: mdl-38809998

ABSTRACT

Neuromorphic vision sensors, or event cameras, have made visual perception with extremely low reaction time possible, opening new avenues for high-dynamic robotics applications. The output of these event cameras depends on both motion and texture. However, the event camera fails to capture object edges that are parallel to the camera motion. This is a problem intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision deals with perceptual fading using the active mechanism of small involuntary eye movements, the most prominent of which are called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades can substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture. In this design, a rotating wedge prism was mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allows for algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system, which we call the artificial microsaccade-enhanced event camera (AMI-EV). Benchmark comparisons validated the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver. Various real-world experiments demonstrated the potential of the system to facilitate robotics perception for both low-level and high-level vision tasks.
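The compensation described above relies on the prism's rotation being known, so the circular image shift it induces can be subtracted from each event. A minimal geometric sketch of that idea, assuming a calibrated constant-speed prism and a simple circular-offset model (the function and parameter names are illustrative, not the paper's implementation):

```python
import numpy as np

def compensate_wedge_motion(events, deviation_px, rpm):
    """Remove the circular image shift induced by a rotating wedge prism.

    events: array-like of (x, y, t) rows, t in seconds.
    deviation_px: image-plane radius (pixels) of the circular shift caused by
                  the prism's deviation angle -- an assumed calibration value.
    rpm: prism rotation speed in revolutions per minute (assumed constant).
    """
    events = np.asarray(events, dtype=float)
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    phase = 2.0 * np.pi * (rpm / 60.0) * t  # prism phase at each event time
    # Subtract the known circular offset so texture appears static again.
    x_c = x - deviation_px * np.cos(phase)
    y_c = y - deviation_px * np.sin(phase)
    return np.stack([x_c, y_c, t], axis=1)

# An event at t = 0 (phase 0) is shifted back along x by `deviation_px`.
ev = compensate_wedge_motion([[10.0, 5.0, 0.0]], deviation_px=4.0, rpm=1200)
```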


Subjects
Algorithms, Equipment Design, Robotics, Saccades, Visual Perception, Robotics/instrumentation, Humans, Saccades/physiology, Visual Perception/physiology, Motion (Physics), Software, Reaction Time/physiology, Biomimetics/instrumentation, Ocular Fixation/physiology, Eye Movements/physiology, Ocular Vision/physiology
2.
Sci Robot ; 8(81): eadd5139, 2023 Aug 16.
Article in English | MEDLINE | ID: mdl-37585545

ABSTRACT

Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods that combine multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions in robotic decision-making, yet they often lack uncertainty metrics associated with their predictions. Here, we present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach makes no prior assumptions about the prediction labels and is agnostic to network architecture. Furthermore, our class of networks, Ajna, adds minimal computation and requires only a small change to the loss function while training neural networks to obtain uncertainty of predictions, enabling real-time operation even on resource-constrained robots. In addition, we study the informational cues present in the uncertainties of predicted values and their utility in the unification of common robotics problems. In particular, we present an approach to dodge dynamic obstacles, navigate through a cluttered scene, fly through unknown gaps, and segment an object pile without computing depth, instead using the uncertainties of optical flow obtained from a monocular camera with onboard sensing and computation. We successfully evaluate and demonstrate the proposed Ajna network on the four aforementioned common robotics and computer vision tasks and show results comparable to methods that use depth directly. Our work demonstrates a generalized deep uncertainty method and its utilization in robotics applications.
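The "small change to the loss function" mentioned above is in the spirit of the standard heteroscedastic aleatoric-uncertainty recipe, in which the network predicts a per-sample log-variance alongside its output. A framework-agnostic sketch of that generic recipe (the exact Ajna formulation may differ):

```python
import numpy as np

def heteroscedastic_nll(pred, log_var, target):
    """Gaussian negative log-likelihood with a per-sample predicted variance.

    `log_var` plays the role of a second network output head; predicting the
    log keeps the loss numerically stable and lets the model attenuate hard
    samples. This is the generic recipe, not the paper's exact loss.
    """
    sq_err = (pred - target) ** 2
    return float(np.mean(0.5 * np.exp(-log_var) * sq_err + 0.5 * log_var))

# With unit variance (log_var = 0) and unit error, the loss is 0.5 per sample.
confident = heteroscedastic_nll(np.zeros(4), np.zeros(4), np.ones(4))
# Raising the predicted variance discounts a large squared error (here 4.0)
# at the cost of the log-variance penalty term.
hedged = heteroscedastic_nll(np.zeros(4), np.full(4, 2.0), np.full(4, 2.0))
```

Minimizing this loss drives `log_var` up exactly where the squared error is irreducible, which is what makes the predicted variance usable as an uncertainty signal downstream.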

3.
Med Phys ; 50(7): 4255-4268, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36630691

ABSTRACT

PURPOSE: Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time-consuming and expensive. In this study, we propose a cooperative labeling method that allows us to make use of weakly annotated medical imaging data for the training of a machine learning algorithm. As most clinically produced data are weakly annotated - produced for use by humans rather than machines and lacking information machine learning depends upon - this approach allows us to incorporate a wider range of clinical data and thereby increase the training set size. METHODS: Our pseudo-labeling method consists of multiple stages. In the first stage, a previously established network is trained using a limited number of samples with high-quality expert-produced annotations. This network is used to generate annotations for a separate, larger dataset that contains only weakly annotated scans. In the second stage, by cross-checking the two types of annotations against each other, we obtain higher-fidelity annotations. In the third stage, we extract training data from the weakly annotated scans and combine it with the fully annotated data, producing a larger training dataset. We use this larger dataset to develop a computer-aided detection (CADe) system for nodule detection in chest CT. RESULTS: We evaluated the proposed approach by presenting the network with different numbers of expert-annotated scans in training and then testing the CADe system using an independent expert-annotated dataset. We demonstrate that when the availability of expert annotations is severely limited, the inclusion of weakly labeled data leads to a 5% improvement in the competitive performance metric (CPM), defined as the average of sensitivities at different false-positive rates.
CONCLUSIONS: Our proposed approach can effectively merge a weakly-annotated dataset with a small, well-annotated dataset for algorithm training. This approach can help enlarge limited training data by leveraging the large amount of weakly labeled data typically generated in clinical image interpretation.
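The CPM reported above averages sensitivity at fixed false-positive rates on a FROC curve; for nodule CADe the seven conventional operating points run from 1/8 to 8 false positives per scan. A sketch of the metric, assuming a measured FROC curve and linear interpolation between its points (function name is illustrative):

```python
import numpy as np

def competition_performance_metric(fp_per_scan, sensitivity,
                                   operating_points=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
    """Average sensitivity at fixed false-positive rates on a FROC curve.

    fp_per_scan / sensitivity: measured FROC points, in ascending FP order.
    Sensitivities between measured points are linearly interpolated; the
    seven default operating points are the ones conventionally used for
    lung-nodule CADe evaluation.
    """
    return float(np.mean(np.interp(operating_points, fp_per_scan, sensitivity)))

# Toy FROC curve: sensitivity rises as more false positives are allowed.
fp = [0.125, 0.25, 0.5, 1, 2, 4, 8]
sens = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
cpm = competition_performance_metric(fp, sens)
```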


Subjects
Algorithms, X-Ray Computed Tomography, Humans, Machine Learning, Supervised Machine Learning, Computer-Assisted Image Processing/methods
4.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6703-6714, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33507864

ABSTRACT

Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action relies on the anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use in action prediction and anticipation. We annotate a subset of the EPIC Kitchens dataset to include time-to-contact between hands and objects, as well as segmentations of hands and objects. Using these annotations we train the Anticipation Module, a module producing Contact Anticipation Maps and Next Active Object Segmentations - novel low-level representations providing temporal and spatial characteristics of anticipated near-future action. On top of the Anticipation Module we apply Egocentric Object Manipulation Graphs (Ego-OMG), a framework for action anticipation and prediction. Ego-OMG models longer-term temporal semantic relations through the use of a graph modeling transitions between contact-delineated action states. Use of the Anticipation Module within Ego-OMG produces state-of-the-art results, achieving first and second place on the unseen and seen test sets, respectively, of the EPIC Kitchens Action Anticipation Challenge, and achieving state-of-the-art results on the tasks of action anticipation and action prediction over EPIC Kitchens. We perform ablation studies over characteristics of the Anticipation Module to evaluate their utility.

6.
Front Robot AI ; 9: 898075, 2022.
Article in English | MEDLINE | ID: mdl-35783023

ABSTRACT

Tactile sensing for robotics is achieved through a variety of mechanisms, including magnetic, optical-tactile, and conductive-fluid sensing. Currently, fluid-based sensors have struck the right balance between anthropomorphic sizes and shapes and accuracy of tactile response measurement. However, this design is plagued by a low Signal-to-Noise Ratio (SNR), because the fluid-based sensing mechanism "damps" the measurements in ways that are hard to model. To this end, we present a spatio-temporal gradient representation of the data obtained from fluid-based tactile sensors, inspired by neuromorphic principles of event-based sensing. We present a novel algorithm (GradTac) that converts discrete data points from spatial tactile sensors into spatio-temporal surfaces and tracks tactile contours across these surfaces. Processing the tactile data in the proposed spatio-temporal domain is robust, makes it less susceptible to the inherent noise of fluid-based sensors, and allows more accurate tracking of regions of touch than using the raw data. We successfully evaluate and demonstrate the efficacy of GradTac in many real-world experiments performed using the Shadow Dexterous Hand equipped with BioTac SP sensors. Specifically, we use it for tracking tactile input across the sensor's surface, measuring relative forces, detecting linear and rotational slip, and edge tracking. We also release an accompanying task-agnostic dataset for the BioTac SP, which we hope will provide a resource to compare and quantify various novel approaches, and motivate further research.
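Representing the tactile signal by its spatio-temporal gradients, as GradTac does, removes the slowly varying baseline that fluid damping introduces, since any constant offset differentiates away. A simplified stand-in using finite differences on a taxel grid (not the paper's actual algorithm):

```python
import numpy as np

def spatiotemporal_gradient(readings, dt=1.0):
    """Temporal and spatial finite-difference gradients of a taxel grid.

    readings: array of shape (T, H, W) -- tactile values over time.
    Gradients discard the slowly varying (damped) sensor baseline, which is
    the intuition behind event-inspired tactile processing.
    """
    readings = np.asarray(readings, dtype=float)
    d_t = np.gradient(readings, dt, axis=0)   # change over time per taxel
    d_y, d_x = np.gradient(readings, axis=(1, 2))  # spatial contrast
    return d_t, d_y, d_x

# A constant sensor offset vanishes in every gradient channel.
flat = np.full((3, 4, 4), 7.0)
d_t, d_y, d_x = spatiotemporal_gradient(flat)
```

Tracking contours of touch then amounts to following ridges in these gradient surfaces rather than thresholding the noisy raw values.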

7.
IEEE Trans Pattern Anal Mach Intell ; 43(3): 1056-1069, 2021 Mar.
Article in English | MEDLINE | ID: mdl-31514126

ABSTRACT

In this paper, we introduce a non-rigid registration pipeline for pairs of unorganized point clouds that may be topologically different. Standard warp field estimation algorithms, even under robust, discontinuity-preserving regularization, tend to produce erratic motion estimates on boundaries associated with 'close-to-open' topology changes. We overcome this limitation by exploiting backward motion: in the opposite motion direction, a 'close-to-open' event becomes 'open-to-close', which is by default handled correctly. At the core of our approach lies a general, topology-agnostic warp field estimation algorithm, similar to those employed in recently introduced dynamic reconstruction systems from RGB-D input. We improve motion estimation on boundaries associated with topology changes in an efficient post-processing phase. Based on both forward and (inverted) backward warp hypotheses, we explicitly detect regions of the deformed geometry that undergo topological changes by means of local deformation criteria and broadly classify them as 'contacts' or 'separations'. Subsequently, the two motion hypotheses are seamlessly blended on a local basis, according to the type and proximity of detected events. Our method achieves state-of-the-art motion estimation accuracy on the MPI Sintel dataset. Experiments on a custom dataset with topological event annotations demonstrate the effectiveness of our pipeline in estimating motion on event boundaries, as well as promising performance in explicit topological event detection.

8.
Sci Robot ; 5(44)2020 Jul 15.
Article in English | MEDLINE | ID: mdl-33022608

ABSTRACT

An insect-scale visual sensing system indicates the return of active vision for robotics.

9.
Front Robot AI ; 7: 63, 2020.
Article in English | MEDLINE | ID: mdl-33501231

ABSTRACT

It has been proposed that machine learning techniques can benefit from symbolic representations and reasoning systems. We describe a method in which the two can be combined in a natural and direct way by use of hyperdimensional vectors and hyperdimensional computing. By using hashing neural networks to produce binary vector representations of images, we show how hyperdimensional vectors can be constructed such that vector-symbolic inference arises naturally out of their output. We design the Hyperdimensional Inference Layer (HIL) to facilitate this process and evaluate its performance compared to baseline hashing networks. In addition to this, we show that separate network outputs can directly be fused at the vector symbolic level within HILs to improve performance and robustness of the overall model. Furthermore, to the best of our knowledge, this is the first instance in which meaningful hyperdimensional representations of images are created on real data, while still maintaining hyperdimensionality.
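The vector-symbolic operations behind such a layer rest on two properties: long random binary vectors are quasi-orthogonal, and a majority-vote bundle stays similar to each of its components while remaining dissimilar to everything else. A toy sketch of these properties (the dimension and function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # high dimensionality makes random pairs quasi-orthogonal

def random_hv():
    """A random binary hypervector, e.g. a hashed image representation."""
    return rng.integers(0, 2, D, dtype=np.int8)

def bundle(*hvs):
    """Majority-vote superposition: the bundle resembles each input."""
    return (np.sum(hvs, axis=0) * 2 > len(hvs)).astype(np.int8)

def similarity(u, v):
    """Fraction of agreeing bits: ~1.0 for identical, ~0.5 for unrelated."""
    return float(np.mean(u == v))

a, b, c = random_hv(), random_hv(), random_hv()
s = bundle(a, b, c)
# The bundle is much closer to its components (~0.75 agreement) than to an
# unrelated vector (~0.5 agreement), so components remain recoverable.
unrelated = similarity(s, random_hv())
```

Fusing separate network outputs "at the vector-symbolic level," as the abstract describes, amounts to bundling their hypervector encodings and querying the result by similarity.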

10.
Front Neurosci ; 10: 49, 2016.
Article in English | MEDLINE | ID: mdl-26941595

ABSTRACT

Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to conventional Computer Vision approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active-pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

11.
Vision Res ; 50(3): 315-29, 2010 Feb 08.
Article in English | MEDLINE | ID: mdl-19969011

ABSTRACT

A new class of patterns, composed of repeating patches of asymmetric intensity profile, elicits a strong perception of illusory motion. We propose that the main cause of this illusion is the erroneous estimation of image motion induced by fixational eye movements. Image motion is estimated with spatial and temporal energy filters, which are symmetric in space but asymmetric (causal) in time. That is, only the past, not the future, is used to estimate the temporal energy. It is shown that such filters misestimate the motion of locally asymmetric intensity signals at certain spatial frequencies. In an experiment, the perception of the different illusory signals was quantitatively compared by nulling the illusory motion with opposing real motion, and was found to be predicted well by the model.


Subjects
Ocular Fixation/physiology, Motion Perception/physiology, Optical Illusions, Algorithms, Eye Movements/physiology, Humans, Theoretical Models, Optical Illusions/physiology
12.
IEEE Trans Pattern Anal Mach Intell ; 31(4): 649-60, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19229081

ABSTRACT

We present an analysis and algorithm for the problem of super-resolution imaging, that is, the reconstruction of HR (high-resolution) images from a sequence of LR (low-resolution) images. Super-resolution reconstruction entails solutions to two problems. One is the alignment of image frames. The other is the reconstruction of an HR image from multiple aligned LR images. Both are important for the performance of super-resolution imaging. Image alignment is addressed with a new batch algorithm, which simultaneously estimates the homographies between multiple image frames by enforcing the surface normal vectors to be the same. This approach can handle longer video sequences quite well. Reconstruction is addressed with a wavelet-based iterative reconstruction algorithm with an efficient denoising scheme. The technique is based on a new analysis of video formation. At a high level, our method can be described as a better-conditioned iterative back projection scheme with an efficient regularization criterion in each iteration step. Experiments with both simulated and real data demonstrate that our approach outperforms existing super-resolution methods. It can remove even large amounts of mixed noise without creating artifacts.
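The back projection at the core of such schemes iteratively corrects the HR estimate so that its simulated LR image matches the observation. A bare-bones sketch with an average-pooling camera model standing in for the paper's blur and warping (the wavelet denoising and conditioning improvements are omitted):

```python
import numpy as np

def downsample(img, f):
    """Toy camera model: HR -> LR by f x f average pooling."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def iterative_back_projection(lr, f=2, n_iter=20, step=1.0):
    """Refine an HR estimate until its simulated LR image matches `lr`.

    Classic IBP: compute the LR-space residual, upsample it, and add it back.
    This is the generic scheme the paper builds on, not its full method.
    """
    hr = np.kron(lr, np.ones((f, f)))        # initial guess: nearest upsampling
    for _ in range(n_iter):
        residual = lr - downsample(hr, f)    # mismatch in LR space
        hr += step * np.kron(residual, np.ones((f, f)))  # back-project
    return hr

lr = np.array([[1.0, 2.0], [3.0, 4.0]])
hr = iterative_back_projection(lr)
```

With several aligned LR frames, each iteration would back-project one residual per frame through that frame's warp, which is where the alignment step above becomes critical.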

13.
Biol Cybern ; 95(5): 487-501, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16924530

ABSTRACT

In the neural superposition eye of a dipteran fly every ommatidium has eight photoreceptors, each associated with a rhabdomere, two central and six peripheral, which altogether result in seven functional light guides. Groups of eight rhabdomeres in neighboring ommatidia have largely overlapping fields of view. Based on the hypothesis that the light signals collected by these rhabdomeres can be used individually, we investigated the feasibility of estimating 3D scene information. According to Pick (Biol Cybern 26:215-224, 1977) the visual axes of these rhabdomeres are not parallel, but converge to a point 3-6 mm in front of the cornea. Such a structure theoretically could estimate depth in a very simple way by assuming that locally the image intensity is well approximated by a linear function of the spatial coordinates. Using the measurements of Pick (Biol Cybern 26:215-224, 1977) we performed simulation experiments to find whether this is practically possible. Our results indicate that depth estimation at small distances (up to about 1.5-2 cm) is reasonably accurate. This would allow the insect to obtain at least an ordinal spatial layout of its operational space when walking.


Subjects
Depth Perception/physiology, Diptera/physiology, Eye/cytology, Ocular Physiological Phenomena, Invertebrate Photoreceptor Cells/physiology, Animals, Diptera/anatomy & histology, Electron Microscopy, Biological Models, Invertebrate Photoreceptor Cells/ultrastructure
14.
Vision Res ; 46(19): 3105-20, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16750551

ABSTRACT

This paper discusses a problem inherent in the estimation of 3D shape (surface normals) from multiple views. Noise in the image signal causes bias, which may result in substantial errors in the parameter estimation. The bias predicts the underestimation of slant found in psychophysical and computational experiments. Specifically, we analyze the estimation of 3D shape from motion and stereo using orientation disparity. For the case of stereo, we show that bias predicts the anisotropy in the perception of horizontal and vertical slant. For the case of 3D motion, we demonstrate the bias by means of a new illusory display. Finally, we discuss statistically optimal strategies for the problem and suggest possible avenues for visual systems to deal with the bias.


Subjects
Form Perception/physiology, Psychological Models, Motion Perception/physiology, Perceptual Distortion, Binocular Vision/physiology, Humans, Orientation, Psychophysics, Visual Disparity
15.
IEEE Trans Pattern Anal Mach Intell ; 28(6): 1018-23, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16724596

ABSTRACT

We propose to combine the information from multiple motion fields by enforcing a constraint on the surface normals (3D shape) of the scene in view. The fact that the shape vectors in the different views are related only by rotation can be formulated as a rank = 3 constraint. This constraint is implemented in an algorithm which solves 3D motion and structure estimation as a practical constrained minimization. Experiments demonstrate its usefulness as a tool in structure from motion providing very accurate estimates of 3D motion.
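The rank-3 constraint is easy to verify numerically: because the shape vectors in different views differ only by a rotation, normals of one rigid scene observed in several views stack into a matrix whose rank is at most 3. A sketch under illustrative z-axis rotations (the rotations and sizes are arbitrary, chosen only to exhibit the constraint):

```python
import numpy as np

rng = np.random.default_rng(1)

def rotation_z(theta):
    """Rotation about the z-axis; stands in for arbitrary view rotations."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Ten surface normals of one rigid scene, expressed in a reference frame.
N = rng.standard_normal((3, 10))

# Each view observes the same normals rotated by that view's R_i.
views = [rotation_z(t) @ N for t in (0.0, 0.4, 0.9, 1.5)]
stacked = np.vstack(views)  # (12, 10) measurement matrix

# Every block R_i @ N shares the 3-dimensional row space of N,
# so the stacked matrix has rank 3 despite having 12 rows.
rank = np.linalg.matrix_rank(stacked)
```

Enforcing this rank during minimization is what couples the multiple motion fields into a single consistent 3D motion and structure estimate.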


Subjects
Algorithms, Artificial Intelligence, Computer-Assisted Image Interpretation/methods, Three-Dimensional Imaging/methods, Automated Pattern Recognition/methods, Photography/methods, Video Recording/methods, Image Enhancement/methods, Information Storage and Retrieval/methods
16.
IEEE Trans Pattern Anal Mach Intell ; 27(6): 988-92, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15943429

ABSTRACT

We examine the key role of occlusions in finding independently moving objects instantaneously in a video obtained by a moving camera with a restricted field of view. In this problem, the image motion is caused by the combined effect of camera motion (egomotion), structure (depth), and the independent motion of scene entities. For a camera with a restricted field of view undergoing a small motion between frames, there exists, in general, a set of 3D camera motions compatible with the observed flow field even if only a small amount of noise is present, leading to ambiguous 3D motion estimates. If separable sets of solutions exist, motion-based clustering can detect one category of moving objects. Even if a single inseparable set of solutions is found, we show that occlusion information can be used to find ordinal depth, which is critical in identifying a new class of moving objects. In order to find ordinal depth, occlusions must not only be known, but they must also be filled (grouped) with optical flow from neighboring regions. We present a novel algorithm for filling occlusions and deducing ordinal depth under general circumstances. Finally, we describe another category of moving objects which is detected using cardinal comparisons between structure from motion and structure estimates from another source (e.g., stereo).


Subjects
Algorithms, Artificial Intelligence, Computer-Assisted Image Interpretation/methods, Motion, Automated Pattern Recognition/methods, Photography/methods, Video Recording/methods, Image Enhancement/methods, Reproducibility of Results, Sensitivity and Specificity
17.
Vision Res ; 44(7): 727-49, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14751556

ABSTRACT

It is proposed in this paper that many geometrical optical illusions, as well as illusory patterns due to motion signals in line drawings, are due to the statistics of visual computations. The interpretation of image patterns is preceded by a step where image features such as lines, intersections of lines, or local image movement must be derived. However, there are many sources of noise or uncertainty in the formation and processing of images, and they cause problems in the estimation of these features; in particular, they cause bias. As a result, the locations of features are perceived erroneously and the appearance of the patterns is altered. The bias occurs with any visual processing of line features; under average conditions it is not large enough to be noticeable, but illusory patterns are such that the bias is highly pronounced. Thus, the broader message of this paper is that there is a general uncertainty principle which governs the workings of vision systems, and optical illusions are an artifact of this principle.


Subjects
Optical Illusions, Ocular Vision/physiology, Humans, Psychological Models, Visual Perception/physiology