Results 1 - 20 of 22
1.
Front Robot AI ; 11: 1431826, 2024.
Article in English | MEDLINE | ID: mdl-39360225

ABSTRACT

The rapidly increasing capabilities of autonomous mobile robots promise to make them ubiquitous in the coming decade. These robots will continue to enhance efficiency and safety in novel applications such as disaster management, environmental monitoring, bridge inspection, and agricultural inspection. To operate autonomously without constant human intervention, even in remote or hazardous areas, robots must sense, process, and interpret environmental data using only onboard sensing and computation. This capability is made possible by advances in perception algorithms, which allow these robots to rely primarily on perception for navigation tasks. However, the autonomy of tiny robots is hindered mainly by limits on sensing, memory, and computation imposed by size, weight, area, and power constraints; the bottleneck is real-time perception under these resource limits. To enable autonomy in robots less than 100 mm in body length, we draw inspiration from tiny organisms such as insects and hummingbirds, known for their sophisticated perception, navigation, and survival abilities despite their minimal sensory and neural systems. This work aims to provide insights into designing a compact and efficient minimal perception framework for tiny autonomous robots, from the higher cognitive levels down to the lower sensor levels.

2.
J Med Imaging (Bellingham) ; 11(4): 044507, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39119067

ABSTRACT

Purpose: Synthetic datasets hold the potential to offer cost-effective alternatives to clinical data, ensuring privacy protection and potentially addressing biases in clinical data. We present a method leveraging such datasets to train a machine learning algorithm applied as part of a computer-aided detection (CADe) system. Approach: Our proposed approach utilizes computed tomography (CT) scans, acquired on clinical scanners, of a physical anthropomorphic phantom into which manufactured lesions were inserted, to train a machine learning algorithm. We treated the training database obtained from the anthropomorphic phantom as a simplified representation of clinical data and increased the variability in this dataset using a set of randomized and parameterized augmentations. Furthermore, to mitigate the inherent differences between phantom and clinical datasets, we investigated adding unlabeled clinical data into the training pipeline. Results: We applied our proposed method to the false-positive reduction stage of a lung nodule CADe system in CT scans, in which regions of interest containing potential lesions are classified as nodule or non-nodule regions. Experimental results demonstrate the effectiveness of the proposed method: the system trained on labeled data from physical phantom scans and unlabeled clinical data achieves a sensitivity of 90% at eight false positives per scan. The experiments also demonstrate the benefit of the physical phantom: performance in terms of the competitive performance metric (CPM) increased by 6% when a training set of 50 clinical CT scans was enlarged by the scans obtained from the physical phantom. Conclusions: The scalability of synthetic datasets can lead to improved CADe performance, particularly in scenarios in which the labeled clinical data are limited or subject to inherent bias. Our proposed approach demonstrates an effective use of synthetic datasets for training machine learning algorithms.
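
The randomized, parameterized augmentations described above can be sketched as follows; the specific transforms and parameter ranges are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(seed=0)

def augment_patch(patch):
    """One randomized, parameterized augmentation of a 3D CT patch (z, y, x)."""
    # In-plane rotation with a per-sample random angle.
    out = rotate(patch, angle=rng.uniform(-15, 15), axes=(1, 2),
                 reshape=False, order=1, mode="nearest")
    if rng.random() < 0.5:
        out = out[:, :, ::-1]                 # random left-right flip
    out = out + rng.normal(0, 20, out.shape)  # additive HU-scale noise
    return out
```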

3.
Sci Robot ; 9(90): eadj8124, 2024 05 29.
Article in English | MEDLINE | ID: mdl-38809998

ABSTRACT

Neuromorphic vision sensors, or event cameras, have made visual perception with extremely low reaction time possible, opening new avenues for high-dynamic robotics applications. An event camera's output depends on both motion and texture; however, it fails to capture object edges that are parallel to the camera motion. This is a problem intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision deals with perceptual fading through the active mechanism of small involuntary eye movements, the most prominent of which are called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture. In this design, a rotating wedge prism was mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allow for algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system we call the artificial microsaccade-enhanced event camera (AMI-EV). Benchmark comparisons validated the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver. Various real-world experiments demonstrated the potential of the system to facilitate robotics perception for both low-level and high-level vision tasks.
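
Under a simplified geometric-optics model, a rotating wedge prism shifts the whole image along a circle of fixed radius at the prism's rotation rate, so each event can be de-rotated in software. A minimal sketch, assuming a calibrated radius `r`, angular speed `omega`, and phase `phi` (all hypothetical names, not the paper's calibration procedure):

```python
import numpy as np

def compensate_events(xs, ys, ts, r, omega, phi=0.0):
    """Remove the circular image shift induced by a rotating wedge prism.

    Simplified model: at time t the prism displaces the whole image by
    (r*cos(omega*t + phi), r*sin(omega*t + phi)); subtracting this shift
    from each event restores a motion-stabilized coordinate frame.
    xs, ys, ts: arrays of event pixel coordinates and timestamps (s).
    """
    theta = omega * ts + phi
    return xs - r * np.cos(theta), ys - r * np.sin(theta)
```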


Subject(s)
Algorithms; Equipment Design; Robotics; Saccades; Visual Perception; Robotics/instrumentation; Humans; Saccades/physiology; Visual Perception/physiology; Motion (Physics); Software; Reaction Time/physiology; Biomimetics/instrumentation; Fixation, Ocular/physiology; Eye Movements/physiology; Vision, Ocular/physiology
4.
Sci Robot ; 8(81): eadd5139, 2023 Aug 16.
Article in English | MEDLINE | ID: mdl-37585545

ABSTRACT

Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods that combine multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions in robotic decision-making, yet they often lack uncertainty metrics associated with their predictions. Here, we present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach makes no prior assumptions about the prediction labels and is agnostic to network architecture. Furthermore, our class of networks, Ajna, adds minimal computation and requires only a small change to the loss function during training to obtain the uncertainty of predictions, enabling real-time operation even on resource-constrained robots. In addition, we study the informational cues present in the uncertainties of predicted values and their utility in the unification of common robotics problems. In particular, we present an approach to dodge dynamic obstacles, navigate through a cluttered scene, fly through unknown gaps, and segment an object pile without computing depth, instead using the uncertainties of optical flow obtained from a monocular camera with onboard sensing and computation. We evaluate the proposed Ajna network on the four aforementioned robotics and computer vision tasks and show results comparable to methods that use depth directly. Our work presents a generalized deep uncertainty method and demonstrates its use in robotics applications.
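
The abstract does not give Ajna's loss; a standard way to obtain heteroscedastic aleatoric uncertainty with only a small change to the loss function is to predict a log-variance alongside each output and train with a Gaussian negative log-likelihood (Kendall and Gal, 2017). A sketch of that baseline, not the paper's exact formulation:

```python
import torch

def heteroscedastic_nll(pred, log_var, target):
    """Gaussian NLL with a per-output predicted log-variance.

    The network grows one extra output channel (log_var); minimizing this
    loss yields predictions plus a per-output aleatoric uncertainty
    estimate exp(log_var) at essentially no extra compute.
    """
    inv_var = torch.exp(-log_var)
    return (0.5 * inv_var * (pred - target) ** 2 + 0.5 * log_var).mean()
```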

5.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6703-6714, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33507864

ABSTRACT

Human actions involving hand manipulations are structured according to the making and breaking of hand-object contact, and human visual understanding of action relies on anticipation of contact, as demonstrated by pioneering work in cognitive science. Taking inspiration from this, we introduce representations and models centered on contact, which we then use for action prediction and anticipation. We annotate a subset of the EPIC Kitchens dataset to include time-to-contact between hands and objects, as well as segmentations of hands and objects. Using these annotations, we train the Anticipation Module, which produces Contact Anticipation Maps and Next Active Object Segmentations: novel low-level representations providing temporal and spatial characteristics of anticipated near-future action. On top of the Anticipation Module we apply Egocentric Object Manipulation Graphs (Ego-OMG), a framework for action anticipation and prediction. Ego-OMG models longer-term temporal semantic relations using a graph that models transitions between contact-delineated action states. Use of the Anticipation Module within Ego-OMG produces state-of-the-art results, achieving first and second place on the unseen and seen test sets, respectively, of the EPIC Kitchens Action Anticipation Challenge, and achieving state-of-the-art results on the tasks of action anticipation and action prediction over EPIC Kitchens. We perform ablation studies over characteristics of the Anticipation Module to evaluate their utility.
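
A Contact Anticipation Map can be pictured as a per-pixel time-to-contact target. A minimal construction, assuming per-frame object masks and an annotated contact frame (all names here are hypothetical, not the paper's data format):

```python
import numpy as np

def contact_anticipation_map(obj_mask, frame_idx, contact_idx, fps=30.0):
    """Per-pixel anticipated time-to-contact (seconds) for one frame.

    obj_mask: boolean (H, W) mask of the next-active object;
    pixels off the object get an 'infinite' sentinel value.
    """
    ttc = (contact_idx - frame_idx) / fps
    out = np.full(obj_mask.shape, np.inf, dtype=np.float32)
    out[obj_mask] = max(ttc, 0.0)
    return out
```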

6.
Med Phys ; 50(7): 4255-4268, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36630691

ABSTRACT

PURPOSE: Machine learning algorithms are best trained with large quantities of accurately annotated samples. While natural scene images can often be labeled relatively cheaply and at large scale, obtaining accurate annotations for medical images is both time consuming and expensive. In this study, we propose a cooperative labeling method that allows us to make use of weakly annotated medical imaging data for the training of a machine learning algorithm. As most clinically produced data are weakly annotated (produced for use by humans rather than machines, and lacking information that machine learning depends upon), this approach allows us to incorporate a wider range of clinical data and thereby increase the training set size. METHODS: Our pseudo-labeling method consists of multiple stages. In the first stage, a previously established network is trained using a limited number of samples with high-quality, expert-produced annotations. This network is used to generate annotations for a separate, larger dataset that contains only weakly annotated scans. In the second stage, by cross-checking the two types of annotations against each other, we obtain higher-fidelity annotations. In the third stage, we extract training data from the weakly annotated scans and combine it with the fully annotated data, producing a larger training dataset. We use this larger dataset to develop a computer-aided detection (CADe) system for nodule detection in chest CT. RESULTS: We evaluated the proposed approach by presenting the network with different numbers of expert-annotated scans in training and then testing the CADe using an independent expert-annotated dataset. We demonstrate that when the availability of expert annotations is severely limited, the inclusion of weakly labeled data leads to a 5% improvement in the competitive performance metric (CPM), defined as the average of sensitivities at different false-positive rates. CONCLUSIONS: Our proposed approach can effectively merge a weakly annotated dataset with a small, well-annotated dataset for algorithm training. This approach can help enlarge limited training data by leveraging the large amount of weakly labeled data typically generated in clinical image interpretation.
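
The CPM referenced here is conventionally computed, as in the ANODE09 challenge, as the average sensitivity at seven false-positive rates from 1/8 to 8 FPs per scan. A sketch, assuming a precomputed FROC curve:

```python
import numpy as np

def competitive_performance_metric(fp_per_scan, sensitivity):
    """CPM: mean sensitivity at 1/8, 1/4, 1/2, 1, 2, 4, 8 FPs/scan.

    fp_per_scan, sensitivity: arrays describing the FROC curve,
    with fp_per_scan monotonically increasing (required by np.interp).
    """
    operating_points = [0.125, 0.25, 0.5, 1, 2, 4, 8]
    return float(np.mean(np.interp(operating_points, fp_per_scan, sensitivity)))
```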


Subject(s)
Algorithms; Tomography, X-Ray Computed; Humans; Machine Learning; Supervised Machine Learning; Image Processing, Computer-Assisted/methods
7.
Front Robot AI ; 9: 898075, 2022.
Article in English | MEDLINE | ID: mdl-35783023

ABSTRACT

Tactile sensing for robotics is achieved through a variety of mechanisms, including magnetic, optical-tactile, and conductive-fluid designs. Currently, fluid-based sensors strike the best balance between anthropomorphic size and shape and accurate measurement of tactile response. However, this design is plagued by a low signal-to-noise ratio (SNR), because the fluid-based sensing mechanism "damps" the measurements in ways that are hard to model. To this end, we present a spatio-temporal gradient representation of the data obtained from fluid-based tactile sensors, inspired by neuromorphic principles of event-based sensing. We present a novel algorithm (GradTac) that converts discrete data points from spatial tactile sensors into spatio-temporal surfaces and tracks tactile contours across these surfaces. Processing the tactile data in the proposed spatio-temporal domain is robust, makes it less susceptible to the inherent noise of fluid-based sensors, and allows more accurate tracking of regions of touch than using the raw data. We successfully evaluate and demonstrate the efficacy of GradTac on many real-world experiments performed using the Shadow Dexterous Hand equipped with BioTac SP sensors. Specifically, we use it for tracking tactile input across the sensor's surface, measuring relative forces, detecting linear and rotational slip, and edge tracking. We also release an accompanying task-agnostic dataset for the BioTac SP, which we hope will provide a resource for comparing and quantifying various novel approaches and motivate further research.
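
The spatio-temporal gradient idea can be sketched as differencing each taxel's signal in time and thresholding, event-camera style; the array shapes and threshold below are illustrative assumptions, not GradTac itself:

```python
import numpy as np

def taxel_events(signals, times, threshold=0.05):
    """Convert taxel time series into sparse gradient 'events'.

    signals: (T, N) array, one column per taxel; times: (T,) timestamps.
    Returns (t_idx, taxel_idx, polarity) for gradients above threshold,
    which is far less sensitive to slow fluidic drift than raw values.
    """
    grad = np.gradient(signals, times, axis=0)  # temporal gradient per taxel
    t_idx, taxel_idx = np.nonzero(np.abs(grad) > threshold)
    return t_idx, taxel_idx, np.sign(grad[t_idx, taxel_idx])
```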

8.
IEEE Trans Pattern Anal Mach Intell ; 43(3): 1056-1069, 2021 Mar.
Article in English | MEDLINE | ID: mdl-31514126

ABSTRACT

In this paper, we introduce a non-rigid registration pipeline for pairs of unorganized point clouds that may be topologically different. Standard warp field estimation algorithms, even under robust, discontinuity-preserving regularization, tend to produce erratic motion estimates on boundaries associated with 'close-to-open' topology changes. We overcome this limitation by exploiting backward motion: in the opposite motion direction, a 'close-to-open' event becomes 'open-to-close', which is by default handled correctly. At the core of our approach lies a general, topology-agnostic warp field estimation algorithm, similar to those employed in recently introduced dynamic reconstruction systems from RGB-D input. We improve motion estimation on boundaries associated with topology changes in an efficient post-processing phase. Based on both forward and (inverted) backward warp hypotheses, we explicitly detect regions of the deformed geometry that undergo topological changes by means of local deformation criteria and broadly classify them as 'contacts' or 'separations'. Subsequently, the two motion hypotheses are seamlessly blended on a local basis, according to the type and proximity of detected events. Our method achieves state-of-the-art motion estimation accuracy on the MPI Sintel dataset. Experiments on a custom dataset with topological event annotations demonstrate the effectiveness of our pipeline in estimating motion on event boundaries, as well as promising performance in explicit topological event detection.
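
The final blending step might look like the following sketch, where the per-point weight `alpha` (a hypothetical input here) encodes the type and proximity of detected topological events:

```python
import numpy as np

def blend_warps(fwd, inv_bwd, alpha):
    """Blend forward and inverted-backward warp fields per point.

    fwd, inv_bwd: (N, 3) displacement hypotheses for N source points;
    alpha: (N,) weights in [0, 1], near 1 close to 'close-to-open'
    topology events where the inverted backward estimate is the more
    reliable of the two hypotheses.
    """
    return (1.0 - alpha)[:, None] * fwd + alpha[:, None] * inv_bwd
```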

9.
Sci Robot ; 5(44)2020 Jul 15.
Article in English | MEDLINE | ID: mdl-33022608

ABSTRACT

An insect-scale visual sensing system indicates the return of active vision for robotics.

10.
Front Robot AI ; 7: 63, 2020.
Article in English | MEDLINE | ID: mdl-33501231

ABSTRACT

It has been proposed that machine learning techniques can benefit from symbolic representations and reasoning systems. We describe a method in which the two can be combined in a natural and direct way by use of hyperdimensional vectors and hyperdimensional computing. By using hashing neural networks to produce binary vector representations of images, we show how hyperdimensional vectors can be constructed such that vector-symbolic inference arises naturally out of their output. We design the Hyperdimensional Inference Layer (HIL) to facilitate this process and evaluate its performance compared to baseline hashing networks. In addition to this, we show that separate network outputs can directly be fused at the vector symbolic level within HILs to improve performance and robustness of the overall model. Furthermore, to the best of our knowledge, this is the first instance in which meaningful hyperdimensional representations of images are created on real data, while still maintaining hyperdimensionality.
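
The vector-symbolic operations underlying such a layer follow the standard binary hyperdimensional toolkit: XOR for binding, bitwise majority for bundling, and Hamming distance for similarity. A sketch of these primitives (the HIL specifics are not in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervectors are typically on the order of 10k bits

def random_hv():
    """I.i.d. random binary hypervector; random pairs are quasi-orthogonal."""
    return rng.integers(0, 2, D, dtype=np.uint8)

def bind(a, b):
    """XOR binding: associates two hypervectors, invertible by re-binding."""
    return a ^ b

def bundle(*hvs):
    """Bitwise majority vote: superposes several hypervectors into one."""
    return (np.sum(hvs, axis=0) > len(hvs) / 2).astype(np.uint8)

def similarity(a, b):
    """1 minus normalized Hamming distance; ~0.5 for unrelated vectors."""
    return 1.0 - np.mean(a != b)
```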

11.
IEEE Trans Pattern Anal Mach Intell ; 31(5): 811-23, 2009 May.
Article in English | MEDLINE | ID: mdl-19299857

ABSTRACT

Since cameras blur the incoming light during measurement, different images of the same surface do not contain the same information about that surface. Thus, in general, corresponding points in multiple views of a scene have different image intensities. While multiple-view geometry constrains the locations of corresponding points, it does not give relationships between the signals at corresponding locations. This paper offers an elementary treatment of these relationships. We first develop the notion of "ideal" and "real" images, corresponding to, respectively, the raw incoming light and the measured signal. This framework separates the filtering and geometric aspects of imaging. We then consider how to synthesize one view of a surface from another; if the transformation between the two views is affine, it emerges that this is possible if and only if the singular values of the affine matrix are positive. Next, we consider how to combine the information in several views of a surface into a single output image. By developing a new tool called "frequency segmentation," we show how this can be done despite not knowing the blurring kernel.
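
One common reading of this condition decomposes the affine matrix into two proper rotations around a diagonal of signed singular values; since numerical SVD returns non-negative singular values, the sign is recovered from the determinant. A sketch, offered as an illustration rather than the paper's procedure:

```python
import numpy as np

def signed_singular_values(A):
    """Signed singular values of a 2x2 matrix A = R1 @ S @ R2,
    with R1, R2 proper rotations (det = +1) and S diagonal.

    numpy returns non-negative singular values, so when the factorization
    contains a reflection we flip the sign of the smallest one to make
    both rotation factors proper.
    """
    U, s, Vt = np.linalg.svd(A)
    if np.linalg.det(U @ Vt) < 0:
        s = s.copy()
        s[-1] = -s[-1]
    return s

A = np.array([[1.2, 0.3], [0.1, 0.8]])
print(signed_singular_values(A) > 0)  # synthesis possible iff both positive
```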


Subject(s)
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity
14.
Front Psychol ; 9: 2254, 2018.
Article in English | MEDLINE | ID: mdl-30546328

ABSTRACT

In this paper we combine motion-captured data with linguistic notions (preliminary study) in a game-like tutoring system (study 1), in order to help elementary school students better differentiate literal from metaphorical uses of motion verbs based on embodied information. In addition to this thematic goal, we aim to improve young students' attention and spatiotemporal memory by presenting sensorimotor data experimentally collected from thirty-two participants in our motion-capture labs. Furthermore, we examine the accomplishment of the tutor's goals and compare them to the curriculum's approach (study 2). Sixty-nine elementary school students were randomly divided into two experimental groups (game-like and traditional) and one control group, which did not undergo an intervention. All groups were given pre- and post-tests. Although the diagnostic pre-tests presented a uniform picture, two-way analysis of variance suggests that the experimental groups progressed on the post-tests; more specifically, the game-like group gave fewer wrong answers on the linguistics task and showed higher learning achievement compared to the other two groups. Furthermore, in the game-like condition the participants needed a progressively shorter period of time to identify the avatar's actions. This finding was taken as a first indication of improved attention and spatiotemporal memory, while the tutor's assistance features cultivated students' metacognitive perception.

15.
Auton Robots ; 42(2): 177-196, 2018.
Article in English | MEDLINE | ID: mdl-31983809

ABSTRACT

Despite the recent successes in robotics, artificial intelligence and computer vision, a complete artificial agent necessarily must include active perception. A multitude of ideas and methods for how to accomplish this have already appeared in the past, their broader utility perhaps impeded by insufficient computational power or costly hardware. The history of these ideas, perhaps selective due to our perspectives, is presented with the goal of organizing the past literature and highlighting the seminal contributions. We argue that those contributions are as relevant today as they were decades ago and, with the state of modern computational tools, are poised to find new life in the robotic perception systems of the next decade.

16.
Front Neurosci ; 10: 49, 2016.
Article in English | MEDLINE | ID: mdl-26941595

ABSTRACT

Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

17.
IEEE Trans Pattern Anal Mach Intell ; 27(6): 988-92, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15943429

ABSTRACT

We examine the key role of occlusions in finding independently moving objects instantaneously in a video obtained by a moving camera with a restricted field of view. In this problem, the image motion is caused by the combined effect of camera motion (egomotion), structure (depth), and the independent motion of scene entities. For a camera with a restricted field of view undergoing a small motion between frames, there exists, in general, a set of 3D camera motions compatible with the observed flow field even if only a small amount of noise is present, leading to ambiguous 3D motion estimates. If separable sets of solutions exist, motion-based clustering can detect one category of moving objects. Even if a single inseparable set of solutions is found, we show that occlusion information can be used to find ordinal depth, which is critical in identifying a new class of moving objects. In order to find ordinal depth, occlusions must not only be known, but they must also be filled (grouped) with optical flow from neighboring regions. We present a novel algorithm for filling occlusions and deducing ordinal depth under general circumstances. Finally, we describe another category of moving objects which is detected using cardinal comparisons between structure from motion and structure estimates from another source (e.g., stereo).


Subject(s)
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Motion; Pattern Recognition, Automated/methods; Photography/methods; Video Recording/methods; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity
18.
Philos Trans R Soc Lond B Biol Sci ; 367(1585): 103-17, 2012 Jan 12.
Article in English | MEDLINE | ID: mdl-22106430

ABSTRACT

Language and action have been found to share a common neural basis and, in particular, a common 'syntax': an analogous hierarchical and compositional organization. While language structure analysis has led to the formulation of different grammatical formalisms and associated discriminative or generative computational models, the structure of action is still elusive, and so are the related computational models. However, structuring action has important implications for action learning and generalization, in both human cognition research and computation. In this study, we present a biologically inspired generative grammar of action, which employs the structure-building operations and principles of Chomsky's Minimalist Programme as a reference model. In this grammar, action terminals combine hierarchically into temporal sequences of actions of increasing complexity; the actions are bound with the involved tools and affected objects and are governed by certain goals. We show how the tool role and the affected-object role of an entity within an action drive the derivation of the action syntax in this grammar and control recursion, merge, and move, mechanisms that manifest themselves not only in human language but in human action too.
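
One way to picture such a grammar is as recursive composition ('merge') of action terminals that bind a tool and an affected object under a goal; the types below are an illustrative toy, not the paper's formalism:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """Terminal: an atomic act bound to its tool and affected object."""
    verb: str
    tool: str
    patient: str  # the affected object

@dataclass
class Merge:
    """Non-terminal: hierarchical, goal-governed composition of actions."""
    goal: str
    parts: tuple  # sub-actions or sub-merges, in temporal order

# 'Saw a plank' merged under the goal 'build a shelf'; note the entity
# 'saw' changes role from patient (being grasped) to tool (sawing).
plan = Merge("build shelf", (
    Action("grasp", tool="hand", patient="saw"),
    Action("saw",   tool="saw",  patient="plank"),
))
```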


Subject(s)
Behavior/physiology; Linguistics; Neurons/physiology; Software; Biomechanical Phenomena; Computational Biology; Gestures; Goals; Humans; Learning; Movement; Neurobiology; Pattern Recognition, Visual/physiology; Psychomotor Performance/physiology
19.
IEEE Trans Pattern Anal Mach Intell ; 34(4): 639-53, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22383341

ABSTRACT

Attention is an integral part of the human visual system and has been widely studied in the visual attention literature. Human eyes fixate on important locations in the scene, and every fixation point lies inside a particular region of arbitrary shape and size, which can be either an entire object or a part of it. Using the fixation point as an identification marker on the object, we propose a method to segment the object of interest by finding the "optimal" closed contour around the fixation point in polar space, avoiding the perennial problem of scale in Cartesian space. The proposed segmentation process is carried out in two separate steps: first, all visual cues are combined to generate the probabilistic boundary edge map of the scene; second, in this edge map, the "optimal" closed contour around a given fixation point is found. Having two separate steps also makes it possible to establish a simple feedback between the mid-level cue (regions) and the low-level visual cues (edges); indeed, we propose a segmentation refinement process based on such feedback. Finally, our experiments show the promise of the proposed method as an automatic segmentation framework for a general-purpose visual system.
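
The polar-space step can be sketched directly: resample the edge map around the fixation point so that radius indexes rows and angle indexes columns, making a closed contour around the fixation a curve that crosses every column, which sidesteps scale selection. A sketch with arbitrary sampling resolution:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def to_polar(edge_map, fx, fy, n_r=200, n_theta=360):
    """Resample an edge map into polar coordinates about fixation (fx, fy)."""
    h, w = edge_map.shape
    # Radius large enough to reach the farthest image corner.
    r_max = np.hypot(max(fx, w - fx), max(fy, h - fy))
    r = np.linspace(0, r_max, n_r)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    ys, xs = fy + rr * np.sin(tt), fx + rr * np.cos(tt)
    # Rows index radius, columns index angle.
    return map_coordinates(edge_map, [ys, xs], order=1, mode="constant")
```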


Subject(s)
Algorithms; Eye; Image Processing, Computer-Assisted/methods; Vision, Ocular/physiology; Cues; Form Perception; Humans
20.
PLoS One ; 7(5): e35757, 2012.
Article in English | MEDLINE | ID: mdl-22590511

ABSTRACT

Non-verbal communication enables efficient transfer of information among people. In this context, classical orchestras are a remarkable instance of interaction and communication aimed at a common aesthetic goal: musicians train for years to acquire and share a non-linguistic framework for sensorimotor communication. To this end, we recorded violinists' and conductors' movement kinematics during the execution of Mozart pieces, searching for causal relationships among musicians using the Granger causality (GC) method. We show that an increase in conductor-to-musician influence, together with a reduction of musician-to-musician coordination (an index of successful leadership), parallels the quality of execution, as assessed by musical experts' judgments. Rigorous quantification of sensorimotor communication efficacy has always been complicated and hampered by rather vague qualitative methodologies. Here we propose that the analysis of motor behavior provides a potentially interesting tool to approach the rather intangible concept of aesthetic quality of music and the efficacy of visual communication.
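
Granger causality between two kinematic time series can be tested with statsmodels; a minimal sketch on synthetic data standing in for the conductor and violinist signals (the study itself used recorded movement kinematics):

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500
conductor = rng.normal(size=n)
violinist = np.zeros(n)
for t in range(2, n):  # violinist lags the conductor: a causal link to detect
    violinist[t] = (0.6 * conductor[t - 1] + 0.2 * violinist[t - 1]
                    + rng.normal(scale=0.5))

# Column order matters: the test asks whether the SECOND column
# Granger-causes the FIRST, here conductor -> violinist.
data = np.column_stack([violinist, conductor])
results = grangercausalitytests(data, maxlag=3)
```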


Subject(s)
Gestures; Leadership; Biomechanical Phenomena; Humans; Music