Results 1 - 18 of 18
1.
Article in English | MEDLINE | ID: mdl-38593010

ABSTRACT

Deep reinforcement learning agents usually need to collect a large number of interactions to solve a single task. In contrast, meta-reinforcement learning (meta-RL) aims to quickly adapt to new tasks using a small amount of experience by leveraging the knowledge from training on a set of similar tasks. State-of-the-art context-based meta-RL algorithms use the context to encode the task information and train a policy conditioned on the inferred latent task encoding. However, most recent works are limited to parametric tasks, where a handful of variables control the full variation in the task distribution, and also fail to work in non-stationary environments due to the few-shot adaptation setting. To address those limitations, we propose MEta-reinforcement Learning with Task Self-discovery (MELTS), which adaptively learns qualitatively different nonparametric tasks and adapts to new tasks in a zero-shot manner. We introduce a novel deep clustering framework (DPMM-VAE) based on an infinite mixture of Gaussians, which combines the Dirichlet process mixture model (DPMM) and the variational autoencoder (VAE), to simultaneously learn task representations and cluster the tasks in a self-adaptive way. Integrating DPMM-VAE into MELTS enables it to adaptively discover the multi-modal structure of the nonparametric task distribution, which previous methods using isotropic Gaussian random variables cannot model. In addition, we propose a zero-shot adaptation mechanism and a recurrence-based context encoding strategy to improve the data efficiency and make our algorithm applicable in non-stationary environments. On various continuous control tasks with both parametric and nonparametric variations, our algorithm produces a more structured and self-adaptive task latent space and also achieves superior sample efficiency and asymptotic performance compared with state-of-the-art meta-RL algorithms.

2.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3476-3491, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35737617

ABSTRACT

In recent years, the subject of deep reinforcement learning (DRL) has developed very rapidly, and is now applied in various fields, such as decision making and control tasks. However, artificial agents trained with RL algorithms require great amounts of training data, unlike humans, who are able to learn new skills from very few examples. The concept of meta-reinforcement learning (meta-RL) has been recently proposed to enable agents to learn similar but new skills from a small amount of experience by leveraging a set of tasks with a shared structure. Due to the task representation learning strategy with few-shot adaptation, most recent work is limited to narrow task distributions and stationary environments, where tasks do not change within episodes. In this work, we address those limitations and introduce a training strategy that is applicable to non-stationary environments, as well as a task representation based on Gaussian mixture models to model clustered task distributions. We evaluate our method on several continuous robotic control benchmarks. Compared with state-of-the-art literature that is only applicable to stationary environments with few-shot adaptation, our algorithm first achieves competitive asymptotic performance and superior sample efficiency in stationary environments with zero-shot adaptation. Second, our algorithm learns to perform successfully in non-stationary settings as well as a continual learning setting, while learning well-structured task representations. Last, our algorithm learns basic distinct behaviors and well-structured task representations in task distributions with multiple qualitatively distinct tasks.

3.
Article in English | MEDLINE | ID: mdl-37224358

ABSTRACT

Recent state-of-the-art artificial agents lack the ability to adapt rapidly to new tasks, as they are trained exclusively for specific objectives and require massive amounts of interaction to learn new skills. Meta-reinforcement learning (meta-RL) addresses this challenge by leveraging knowledge learned from training tasks to perform well in previously unseen tasks. However, current meta-RL approaches limit themselves to narrow parametric and stationary task distributions, ignoring qualitative differences and nonstationary changes between tasks that occur in the real world. In this article, we introduce a Task-Inference-based meta-RL algorithm using explicitly parameterized Gaussian variational autoencoders (VAEs) and gated Recurrent units (TIGR), designed for nonparametric and nonstationary environments. We employ a generative model involving a VAE to capture the multimodality of the tasks. We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective. We establish a zero-shot adaptation procedure to enable the agent to adapt to nonstationary task changes. We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared with state-of-the-art meta-RL approaches in terms of sample efficiency (three to ten times faster), asymptotic performance, and applicability in nonparametric and nonstationary environments with zero-shot adaptation. Videos can be viewed at https://videoviewsite.wixsite.com/tigr.

4.
IEEE Trans Neural Netw Learn Syst ; 34(8): 5037-5050, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34762592

ABSTRACT

By relabeling past experience with heuristic or curriculum goals, state-of-the-art reinforcement learning (RL) algorithms such as hindsight experience replay (HER), hindsight goal generation (HGG), and graph-based HGG (G-HGG) have been able to solve challenging robotic manipulation tasks in multigoal settings with sparse rewards. HGG outperforms HER in challenging tasks in which goals are difficult to explore by learning from a curriculum, in which intermediate goals are selected based on the Euclidean distance to target goals. G-HGG enhances HGG by selecting intermediate goals from a precomputed graph representation of the environment, which enables its applicability in an environment with stationary obstacles. However, G-HGG is not applicable to manipulation tasks with dynamic obstacles, since its graph representation is only valid in static scenarios and fails to provide any correct information to guide the exploration. In this article, we propose bounding-box-based HGG (Bbox-HGG), an extension of G-HGG selecting hindsight goals with the help of image observations of the environment, which makes it applicable to tasks with dynamic obstacles. We evaluate Bbox-HGG on four challenging manipulation tasks, where significant enhancements in both sample efficiency and overall success rate are shown over state-of-the-art algorithms. The videos can be viewed at https://videoviewsite.wixsite.com/bbhgg.
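The hindsight relabeling idea that HER and the HGG family build on can be sketched as follows. This is a minimal illustration with a hypothetical episode format (observation, action, achieved goal, desired goal) and a scalar goal space, not the papers' implementation:

```python
import random

def sparse_reward(achieved, goal, eps=0.05):
    # sparse-reward convention: 0 on success, -1 otherwise
    return 0.0 if abs(achieved - goal) <= eps else -1.0

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling ('future' strategy): for each transition, also
    store copies whose goal is replaced by a goal actually achieved later
    in the same episode, so failed episodes still produce successes."""
    relabeled = []
    for t, (obs, action, achieved, goal) in enumerate(episode):
        # keep the original transition with its sparse reward
        relabeled.append((obs, action, goal, reward_fn(achieved, goal)))
        # sample up to k achieved goals from the rest of this episode
        future = episode[t:]
        for _ in range(min(k, len(future))):
            _, _, future_goal, _ = random.choice(future)
            relabeled.append((obs, action, future_goal,
                              reward_fn(achieved, future_goal)))
    return relabeled
```

HGG and its graph-based successors then bias which goals are used for relabeling toward a curriculum of intermediate goals, rather than sampling them uniformly from the episode as above.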

5.
Sci Robot ; 8(85): eadg7165, 2023 12 06.
Article in English | MEDLINE | ID: mdl-38055804

ABSTRACT

A flexible spine is critical to the motion capability of most animals and plays a pivotal role in their agility. Although state-of-the-art legged robots have already achieved very dynamic and agile movement solely relying on their legs, they still exhibit the type of stiff movement that compromises movement efficiency. The integration of a flexible spine thus appears to be a promising approach to improve their agility, especially for small and underactuated quadruped robots that are underpowered because of size limitations. Here, we show that the lateral flexion of a compliant spine can promote both walking speed and maneuver agility for a neurorobotic mouse (NeRmo). We present NeRmo as a biomimetic robotic mouse that mimics the morphology of biological mice and their muscle-tendon actuation system. First, by leveraging the lateral flexion of the compliant spine, NeRmo can greatly increase its static stability in an initially unstable configuration by adjusting its posture. Second, the lateral flexion of the spine can also effectively extend the stride length of a gait and therefore improve the walking speeds of NeRmo. Finally, NeRmo shows agile maneuvers that require both a small turning radius and fast walking speed with the help of the spine. These results advance our understanding of spine-based quadruped locomotion skills and highlight promising design concepts to develop more agile legged robots.


Subject(s)
Robotics , Animals , Mice , Robotics/methods , Gait , Movement , Posture , Motion (Physics)
6.
IEEE Trans Neural Netw Learn Syst ; 33(5): 2147-2158, 2022 05.
Article in English | MEDLINE | ID: mdl-34860654

ABSTRACT

As a vital cognitive function of animals, navigation is built first on the accurate perception of the directional heading in the environment. Head direction cells (HDCs), found in the limbic system of animals, have been shown to play an important role in identifying the directional heading allocentrically in the horizontal plane, independent of the animal's location and the ambient conditions of the environment. However, practical HDC models that can be implemented in robotic applications are rarely investigated, especially those that are biologically plausible and yet applicable to the real world. In this article, we propose a computational HDC network that is consistent with several neurophysiological findings concerning biological HDCs and then implement it in robotic navigation tasks. The HDC network maintains a representation of the directional heading relying only on angular velocity as input. We examine the proposed HDC model in extensive simulations and real-world experiments and demonstrate its excellent performance in terms of accuracy and real-time capability.


Subject(s)
Cognition , Neural Networks, Computer , Animals
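The core computation such an HDC network performs, integrating angular velocity into an allocentric heading held as an activity bump over a ring of cells, can be reduced to a toy sketch. The ring size, bump width, and decoding scheme below are illustrative choices, not the paper's model:

```python
import math

N = 36  # one head-direction cell per 10 degrees of preferred heading

def integrate(heading, angular_velocity, dt):
    """Path integration: shift the represented heading by the integrated
    angular velocity (the only input the HDC network receives)."""
    return (heading + angular_velocity * dt) % (2 * math.pi)

def bump(heading, width=0.3):
    """Gaussian activity bump over the ring, centred on the heading."""
    acts = []
    for i in range(N):
        pref = 2 * math.pi * i / N
        # circular distance between preferred and current heading
        d = math.atan2(math.sin(pref - heading), math.cos(pref - heading))
        acts.append(math.exp(-(d / width) ** 2))
    return acts

def decode(activity):
    """Population-vector readout of the heading from cell activities."""
    x = sum(a * math.cos(2 * math.pi * i / N) for i, a in enumerate(activity))
    y = sum(a * math.sin(2 * math.pi * i / N) for i, a in enumerate(activity))
    return math.atan2(y, x) % (2 * math.pi)

h = 0.0
for _ in range(100):          # turn at 0.5 rad/s for 1 s
    h = integrate(h, 0.5, 0.01)
activity = bump(h)
```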
7.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7863-7876, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34181552

ABSTRACT

Reinforcement learning algorithms, such as hindsight experience replay (HER) and hindsight goal generation (HGG), have been able to solve challenging robotic manipulation tasks in multigoal settings with sparse rewards. HER achieves its training success through hindsight replays of past experience with heuristic goals but underperforms in challenging tasks in which goals are difficult to explore. HGG enhances HER by selecting intermediate goals that are easy to achieve in the short term and promising to lead to target goals in the long term. This guided exploration makes HGG applicable to tasks in which target goals are far away from the object's initial position. However, the vanilla HGG is not applicable to manipulation tasks with obstacles because the Euclidean metric used for HGG is not an accurate distance metric in such an environment. Although grid-based HGG can solve manipulation tasks with obstacles under the guidance of a handcrafted distance grid, a more feasible method that can solve such tasks automatically is still in demand. In this article, we propose graph-based hindsight goal generation (G-HGG), an extension of HGG selecting hindsight goals based on shortest distances in an obstacle-avoiding graph, which is a discrete representation of the environment. We evaluated G-HGG on four challenging manipulation tasks with obstacles, where significant enhancements in both sample efficiency and overall success rate are shown over HGG and HER. Videos can be viewed at https://videoviewsite.wixsite.com/ghgg.
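The graph distance that replaces the Euclidean metric here is a standard shortest-path computation over a discrete representation of free space. A minimal sketch with Dijkstra's algorithm and a hypothetical four-node graph (the actual graph construction from the environment is the paper's contribution and is not reproduced):

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra over an obstacle-avoiding graph: nodes are free-space
    cells, weighted edges connect traversable neighbours. The returned
    distances can rank hindsight goals where straight-line distance
    through an obstacle would mislead."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neigh, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    return dist

# A 2x2 free-space grid whose diagonal is blocked by an obstacle:
# A-B, A-C, B-D, C-D are traversable, A-D is not.
graph = {
    "A": [("B", 1.0), ("C", 1.0)],
    "B": [("A", 1.0), ("D", 1.0)],
    "C": [("A", 1.0), ("D", 1.0)],
    "D": [("B", 1.0), ("C", 1.0)],
}
```

Note that the graph distance from A to D is 2.0 even though the two cells are diagonal neighbours, which is exactly the correction over the Euclidean metric that motivates G-HGG.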

8.
Front Neurorobot ; 15: 688344, 2021.
Article in English | MEDLINE | ID: mdl-34163347

ABSTRACT

The development of advanced autonomous driving applications is hindered by the complex temporal structure of sensory data, as well as by the limited computational and energy resources of their on-board systems. Currently, neuromorphic engineering is a rapidly growing field that aims to design information processing systems similar to the human brain by leveraging novel algorithms based on spiking neural networks (SNNs). These systems are well-suited to recognize temporal patterns in data while maintaining a low energy consumption and offering highly parallel architectures for fast computation. However, the lack of effective algorithms for SNNs impedes their wide usage in mobile robot applications. This paper addresses the problem of radar signal processing by introducing a novel SNN that replaces the discrete Fourier transform and constant false-alarm rate algorithm for processing raw radar data, where the weights and architecture of the SNN are derived from the original algorithms. We demonstrate that our proposed SNN can achieve competitive results compared to those of the original algorithms in simulated driving scenarios while retaining its spike-based nature.
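For reference, the conventional detector that the SNN re-implements can be sketched as cell-averaging CFAR: a range cell is declared a detection when its power exceeds the local noise estimate by a fixed factor. Guard/training sizes and the scale factor below are illustrative, not taken from the paper:

```python
def ca_cfar(spectrum, guard=1, train=3, scale=2.0):
    """Cell-averaging CFAR over a 1-D power spectrum: for each cell,
    average the training cells on both sides (skipping guard cells that
    may contain target leakage) and detect when the cell exceeds that
    average by a fixed scale factor."""
    detections = []
    n = len(spectrum)
    for i in range(n):
        cells = []
        # training cells to the left of the guard band
        for j in range(i - guard - train, i - guard):
            if 0 <= j < n:
                cells.append(spectrum[j])
        # training cells to the right of the guard band
        for j in range(i + guard + 1, i + guard + train + 1):
            if 0 <= j < n:
                cells.append(spectrum[j])
        if cells and spectrum[i] > scale * sum(cells) / len(cells):
            detections.append(i)
    return detections
```

In the paper's pipeline this thresholding (together with the preceding DFT) is expressed with spiking neurons whose weights are derived from the original algorithms.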

9.
Neural Netw ; 129: 323-333, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32593929

ABSTRACT

Similar to real snakes in nature, the flexible trunks of snake-like robots enhance their movement capabilities and adaptabilities in diverse environments. However, this flexibility corresponds to a complex control task involving highly redundant degrees of freedom, where traditional model-based methods usually fail to propel the robots energy-efficiently or to adapt to unforeseeable joint damage. In this work, we present an approach for designing an energy-efficient and damage-recovery slithering gait for a snake-like robot using the reinforcement learning (RL) algorithm and the inverse reinforcement learning (IRL) algorithm. Specifically, we first present an RL-based controller for generating locomotion gaits at a wide range of velocities, which is trained using the proximal policy optimization (PPO) algorithm. Then, by taking the RL-based controller as an expert and collecting trajectories from it, we train an IRL-based controller using the adversarial inverse reinforcement learning (AIRL) algorithm. For the purpose of comparison, a traditional parameterized gait controller is presented as the baseline and the parameter sets are optimized using the grid search and Bayesian optimization algorithm. Based on the analysis of the simulation results, we first demonstrate that this RL-based controller exhibits very natural and adaptive movements, which are also substantially more energy-efficient than the gaits generated by the parameterized controller. We then demonstrate that the IRL-based controller can not only match the performance of the RL-based controller, but can also recover from unpredictable damage to its body joints and still outperform the model-based controller, which has an undamaged body, in terms of energy efficiency. Videos can be viewed at https://videoviewsite.wixsite.com/rlsnake.


Subject(s)
Energy Metabolism , Gait , Learning , Reinforcement, Psychology , Robotics/methods , Algorithms , Bayes Theorem , Energy Metabolism/physiology , Gait/physiology , Learning/physiology , Robotics/instrumentation
10.
Neural Netw ; 121: 21-36, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31526952

ABSTRACT

Building spiking neural networks (SNNs) based on biological synaptic plasticities holds a promising potential for accomplishing fast and energy-efficient computing, which is beneficial to mobile robotic applications. However, the implementations of SNNs in robotic fields are limited due to the lack of practical training methods. In this paper, we therefore introduce both indirect and direct end-to-end training methods of SNNs for a lane-keeping vehicle. First, we adopt a policy learned using the Deep Q-Learning (DQN) algorithm and then transfer it to an SNN using supervised learning. Second, we adopt the reward-modulated spike-timing-dependent plasticity (R-STDP) for training SNNs directly, since it combines the advantages of both reinforcement learning and the well-known spike-timing-dependent plasticity (STDP). We examine the proposed approaches in three scenarios in which a robot is controlled to keep within lane markings by using an event-based neuromorphic vision sensor. We further demonstrate the advantages of the R-STDP approach in terms of the lateral localization accuracy and training time steps by comparing it with the other three algorithms presented in this paper.


Subject(s)
Neural Networks, Computer , Neuronal Plasticity/physiology , Neurons/physiology , Robotics/instrumentation , Algorithms , Reinforcement, Psychology , Robotics/methods
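The R-STDP rule combines the two ingredients named in the abstract: timing-based eligibility (STDP) and a reward signal that gates learning (reinforcement learning). A simplified single-synapse sketch of one timestep follows; the trace dynamics and all constants are illustrative, not the paper's exact formulation:

```python
def r_stdp_step(w, e, pre_trace, post_trace, pre_spike, post_spike,
                reward, lr=0.05, tau=0.9):
    """One timestep of reward-modulated STDP for a single synapse:
    spike-timing coincidences update an eligibility trace e, and the
    scalar reward decides whether that trace is written into the
    weight w. Returns the updated (w, e, pre_trace, post_trace)."""
    # exponentially decaying spike traces
    pre_trace = tau * pre_trace + (1.0 if pre_spike else 0.0)
    post_trace = tau * post_trace + (1.0 if post_spike else 0.0)
    # STDP: post spike after recent pre activity -> potentiation,
    # pre spike after recent post activity -> depression
    if post_spike:
        e += pre_trace
    if pre_spike:
        e -= post_trace
    e *= tau                   # eligibility trace also decays
    w += lr * reward * e       # reward converts eligibility into learning
    return w, e, pre_trace, post_trace

# causal pairing under positive reward: pre fires, then post fires
w, e, pre_tr, post_tr = 0.0, 0.0, 0.0, 0.0
w, e, pre_tr, post_tr = r_stdp_step(w, e, pre_tr, post_tr, True, False, 0.0)
w, e, pre_tr, post_tr = r_stdp_step(w, e, pre_tr, post_tr, False, True, 1.0)
```

Without the reward factor this reduces to plain STDP; with it, only behaviorally useful coincidences (here, lane-keeping accuracy) persist as weight changes.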
11.
Front Neurorobot ; 14: 591128, 2020.
Article in English | MEDLINE | ID: mdl-33192441

ABSTRACT

Visually guided locomotion for snake-like robots is a challenging task, since it involves not only complex body undulation with many joints, but also a pipeline that connects vision to locomotion. Meanwhile, it is usually difficult to coordinate these two separate sub-tasks, as this requires time-consuming, trial-and-error tuning. In this paper, we introduce a novel approach for solving target tracking tasks for a snake-like robot as a whole using a model-free reinforcement learning (RL) algorithm. This RL-based controller directly maps the visual observations to the joint positions of the snake-like robot in an end-to-end fashion instead of dividing the process into a series of sub-tasks. With a novel customized reward function, our RL controller is trained in a dynamically changing track scenario. The controller is evaluated in four different tracking scenarios and the results show excellent adaptation of its locomotion to the unpredictable behavior of the target. Meanwhile, the results also prove that the RL-based controller outperforms the traditional model-based controller in terms of tracking accuracy.

12.
Front Neurorobot ; 13: 29, 2019.
Article in English | MEDLINE | ID: mdl-31191288

ABSTRACT

Vision-based target tracking ability is crucial to bio-inspired snake robots for exploring unknown environments. However, it is difficult for the traditional vision modules of snake robots to overcome the image blur resulting from periodic swings. A promising approach is to use a neuromorphic vision sensor (NVS), which mimics the biological retina to detect a target at a higher temporal frequency and in a wider dynamic range. In this study, an NVS and a spiking neural network (SNN) were deployed on a snake robot for the first time to achieve pipe-like object tracking. An SNN based on the Hough Transform was designed to detect a target from the asynchronous event stream fed by the NVS. A tracking framework was then proposed that combines this detection with the state of snake motion analyzed from the joint position sensors. The experimental results obtained from the simulator demonstrated the validity of our framework and the autonomous locomotion ability of our snake robot. Comparing the performances of the SNN model on CPUs and on GPUs, respectively, the SNN model showed the best performance on a GPU under a simplified and synchronous update rule while it achieved higher precision on a CPU with asynchronous updates.

13.
Front Neurorobot ; 13: 18, 2019.
Article in English | MEDLINE | ID: mdl-31130854

ABSTRACT

Spiking neural networks (SNNs) offer many advantages over traditional artificial neural networks (ANNs) such as biological plausibility, fast information processing, and energy efficiency. Although SNNs have been used to solve a variety of control tasks using the Spike-Timing-Dependent Plasticity (STDP) learning rule, existing solutions usually involve hard-coded network architectures solving specific tasks rather than solving different kinds of tasks generally. This results in neglecting one of the biggest advantages of ANNs, i.e., being general-purpose and easy-to-use due to their simple network architecture, which usually consists of an input layer, one or multiple hidden layers and an output layer. This paper addresses the problem by introducing an end-to-end learning approach of spiking neural networks constructed with one hidden layer and reward-modulated Spike-Timing-Dependent Plasticity (R-STDP) synapses in an all-to-all fashion. We use the supervised reward-modulated Spike-Timing-Dependent-Plasticity learning rule to train two different SNN-based sub-controllers to replicate a desired obstacle avoiding and goal approaching behavior, provided by pre-generated datasets. Together they make up a target-reaching controller, which is used to control a simulated mobile robot to reach a target area while avoiding obstacles in its path. We demonstrate the performance and effectiveness of our trained SNNs to achieve target reaching tasks in different unknown scenarios.

14.
Front Neurorobot ; 17: 1158988, 2023.
Article in English | MEDLINE | ID: mdl-36925627
15.
Front Neurorobot ; 12: 35, 2018.
Article in English | MEDLINE | ID: mdl-30034334

ABSTRACT

Biological intelligence processes information using impulses or spikes, which makes those living creatures able to perceive and act in the real world exceptionally well and outperform state-of-the-art robots in almost every aspect of life. To close this gap, emerging hardware technologies and software knowledge in the fields of neuroscience, electronics, and computer science have made it possible to design biologically realistic robots controlled by spiking neural networks (SNNs), inspired by the mechanisms of the brain. However, a comprehensive review of controlling robots based on SNNs is still missing. In this paper, we survey the developments of the past decade in the field of spiking neural networks for control tasks, with particular focus on the fast emerging robotics-related applications. We first highlight the primary impetuses of SNN-based robotics tasks in terms of speed, energy efficiency, and computation capabilities. We then classify those SNN-based robotic applications according to different learning rules and explicate those learning rules with their corresponding robotic applications. We also briefly present some existing platforms that offer an interaction between SNNs and robotics simulations for exploration and exploitation. Finally, we conclude our survey with a forecast of future challenges and some associated potential research topics in terms of controlling robots based on SNNs.

17.
Bioinspir Biomim ; 12(3): 035001, 2017 04 04.
Article in English | MEDLINE | ID: mdl-28375848

ABSTRACT

Snake-like robots with 3D locomotion ability have significant advantages of adaptive travelling in diverse complex terrain over traditional legged or wheeled mobile robots. Despite numerous developed gaits, these snake-like robots suffer from unsmooth gait transitions when changing the locomotion speed, direction, and body shape, which can cause undesired movement and abnormal torque. Hence, there exists a knowledge gap for snake-like robots to achieve autonomous locomotion. To address this problem, this paper presents smooth slithering gait transition control based on a lightweight central pattern generator (CPG) model for snake-like robots. First, based on the convergence behavior of the gradient system, a lightweight CPG model with fast computing time was designed and compared with other widely adopted CPG models. Then, by reshaping the body into a more stable geometry, the slithering gait was modified and studied based on the proposed CPG model, including the gait transition of locomotion speed, moving direction, and body shape. In contrast to the sinusoid-based method, extensive simulations and prototype experiments finally demonstrated that smooth slithering gait transition can be effectively achieved using the proposed CPG-based control method without generating undesired locomotion and abnormal torque.


Subject(s)
Biomimetic Materials , Biomimetics/instrumentation , Locomotion , Models, Theoretical , Robotics/instrumentation , Snakes , Animals , Equipment Design , Gait
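The smooth-transition property a CPG affords can be illustrated with a Hopf oscillator, a common CPG building block: its state converges to a limit cycle whose amplitude and frequency are explicit parameters, so changing them online makes the joint command slide to the new gait instead of jumping. This is an illustrative model, not the paper's gradient-system CPG:

```python
import math

def hopf_step(x, y, mu, omega, dt):
    """One Euler step of a Hopf oscillator: the state (x, y) is attracted
    to a limit cycle of radius sqrt(mu) traversed at angular frequency
    omega. mu and omega can be changed between steps to retune amplitude
    and frequency smoothly."""
    r2 = x * x + y * y
    dx = (mu - r2) * x - omega * y
    dy = (mu - r2) * y + omega * x
    return x + dx * dt, y + dy * dt

# from a small perturbation, the oscillator settles onto its limit cycle
x, y = 0.1, 0.0
for _ in range(5000):
    x, y = hopf_step(x, y, mu=1.0, omega=2 * math.pi, dt=0.002)
```

In a snake-like robot, one such oscillator per joint (with phase coupling between neighbours) produces the travelling wave of the slithering gait; a sinusoid-based controller, by contrast, jumps discontinuously when its parameters change.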