Results 1 - 5 of 5
1.
Sensors (Basel); 21(17), 2021 Sep 01.
Article in English | MEDLINE | ID: mdl-34502781

ABSTRACT

This study aims to solve the problems of poor exploration ability, limited strategies, and high training cost in autonomous underwater vehicle (AUV) motion planning, and to overcome difficulties such as multiple constraints and a sparse-reward environment. An end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and its environment into control instructions for the AUV. It is built on the soft actor-critic (SAC) algorithm, which enhances exploration ability and robustness to the AUV environment. Generative adversarial imitation learning (GAIL) is also used to assist training, overcoming the difficulty and time cost of learning a policy from scratch in reinforcement learning. A comprehensive external reward function is then designed to help the AUV reach the target point smoothly while optimizing distance and time as far as possible. Finally, the proposed end-to-end motion planning algorithm is tested and compared on the Unity simulation platform. Results show that the algorithm exhibits good decision-making ability during navigation and produces a shorter route, lower time consumption, and a smoother trajectory. Moreover, GAIL accelerates AUV training and reduces training time without affecting the planning performance of the SAC algorithm.
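
A minimal sketch of how a GAIL-style imitation reward could be mixed with a designed external reward when training an SAC agent, in the spirit of the system described above. The network sizes, the mixing weight `alpha`, and the distance/time shaping terms are illustrative assumptions, not the authors' implementation:

```python
# Hedged sketch: GAIL discriminator reward combined with an external
# goal-oriented reward; all hyperparameters below are assumed values.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs; expert pairs should score near 1."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def imitation_reward(disc, state, action):
    # Standard GAIL surrogate reward: -log(1 - D(s, a)).
    with torch.no_grad():
        p = torch.sigmoid(disc(state, action))
    return -torch.log(1.0 - p + 1e-8)

def external_reward(dist_to_goal, prev_dist, step_penalty=0.01, goal_bonus=10.0):
    # Dense shaping term: reward progress toward the target, penalise time,
    # and add a bonus on arrival (threshold and weights are assumptions).
    progress = prev_dist - dist_to_goal
    reached = float(dist_to_goal < 0.5)
    return progress - step_penalty + reached * goal_bonus

def total_reward(disc, state, action, dist_to_goal, prev_dist, alpha=0.5):
    # The SAC agent is trained on a convex mix of the two reward signals.
    r_imit = imitation_reward(disc, state, action)
    r_ext = external_reward(dist_to_goal, prev_dist)
    return alpha * r_imit + (1.0 - alpha) * r_ext
```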


Subjects
Algorithms, Learning, Computer Simulation, Motion (Physics)
2.
Sensors (Basel); 20(18), 2020 Sep 04.
Article in English | MEDLINE | ID: mdl-32899773

ABSTRACT

Building a human-like car-following model that can accurately simulate drivers' car-following behavior is helpful for the development of driving assistance systems and autonomous driving. Recent studies have shown the advantages of applying reinforcement learning methods to car-following modeling. However, it remains difficult to specify the reward function manually. This paper proposes a novel car-following model based on generative adversarial imitation learning. The proposed model can learn the strategy from drivers' demonstrations without a specified reward. Gated recurrent units were incorporated into the actor-critic network so that the model can use historical information. Drivers' car-following data were collected by a test vehicle equipped with a millimeter-wave radar and a controller area network acquisition card. The participants were divided into two driving styles by K-means clustering, with time headway and time headway when braking as input features. Using five-fold cross-validation for model evaluation, the results show that the proposed model reproduces drivers' car-following trajectories and driving styles more accurately than the intelligent driver model and a recurrent neural network-based model, with the lowest average spacing error (19.40%) and speed validation error (5.57%), as well as the lowest Kullback-Leibler divergences on the two indicators used for driving-style clustering.
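
A brief sketch of the recurrent actor idea: a GRU consumes a short history of car-following features and outputs an acceleration command. The feature set, layer sizes, and action squashing below are assumptions for illustration rather than the paper's configuration:

```python
# Hedged sketch: GRU-based actor for car-following; features assumed to be
# (spacing, relative speed, ego speed) at each past time step.
import torch
import torch.nn as nn

class GRUActor(nn.Module):
    def __init__(self, feature_dim: int = 3, hidden: int = 64, max_accel: float = 3.0):
        super().__init__()
        self.gru = nn.GRU(feature_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
        self.max_accel = max_accel

    def forward(self, history, h0=None):
        # history: (batch, time, feature_dim) window of past observations.
        out, h_n = self.gru(history, h0)
        # Use the last hidden state so the action depends on the whole window.
        accel = torch.tanh(self.head(out[:, -1])) * self.max_accel
        return accel, h_n

# Usage: a 2-second window at 10 Hz gives 20 time steps per decision.
actor = GRUActor()
window = torch.randn(8, 20, 3)      # batch of 8 synthetic histories
accel, hidden = actor(window)
print(accel.shape)                  # torch.Size([8, 1])
```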


Subjects
Automobile Driving, Automobiles, Traffic Accidents, Cluster Analysis, Computer Simulation, Humans, Imitative Behavior
3.
Neural Netw; 174: 106251, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38552352

ABSTRACT

Expert demonstrations in imitation learning often contain different behavioral modes, e.g., driving modes such as driving on the left, keeping the lane, and driving on the right. Although most existing multi-modal imitation learning methods allow learning from demonstrations of multiple modes, they impose strict constraints on the data of each mode, generally requiring a nearly equal data ratio across all modes. Otherwise, they tend to fall into mode collapse or learn only the data distribution of the mode with the largest data volume. To address this problem, an algorithm that balances the real-fake loss and the classification loss by modifying the output of the discriminator, referred to as BAlanced Generative Adversarial Imitation Learning (BAGAIL), is proposed. With this modification, the generator is rewarded only for generating realistic trajectories with the correct modes. BAGAIL is therefore able to deal with imbalanced expert demonstrations and learn each mode efficiently. The learning process of BAGAIL is divided into a pre-training stage and an imitation learning stage. During the pre-training stage, BAGAIL initializes the generator parameters by means of conditional behavioral cloning, laying the foundation for the direction of parameter optimization. During the imitation learning stage, BAGAIL optimizes the parameters through the adversarial game between the generator and the modified discriminator, so that the resulting policy successfully learns the distribution of imbalanced expert data. Experiments show that BAGAIL accurately distinguishes different behavioral modes from imbalanced demonstrations; moreover, the learning result for each mode is close to the expert standard and more stable than that of other multi-modal imitation learning methods.
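
As a rough sketch of the balancing idea, a discriminator can be given separate real/fake and mode-classification heads, with the policy rewarded only when a sample both looks real and is assigned to its intended mode. The head layout, loss weights, and reward form below are assumptions, not BAGAIL's published code:

```python
# Hedged sketch: mode-aware discriminator with a real/fake head and a
# classification head; expert_modes / intended_mode are (N,) long tensors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModeAwareDiscriminator(nn.Module):
    def __init__(self, state_dim, action_dim, num_modes, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.real_head = nn.Linear(hidden, 1)          # real vs. generated
        self.mode_head = nn.Linear(hidden, num_modes)  # behavioral mode

    def forward(self, state, action):
        z = self.body(torch.cat([state, action], dim=-1))
        return self.real_head(z), self.mode_head(z)

def discriminator_loss(disc, expert_sa, expert_modes, policy_sa, policy_modes, cls_weight=1.0):
    # Real/fake loss on both sources plus a classification loss; balancing the
    # two terms is what keeps minority modes from collapsing.
    real_logit, real_mode = disc(*expert_sa)
    fake_logit, fake_mode = disc(*policy_sa)
    adv = F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit)) + \
          F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit))
    cls = F.cross_entropy(real_mode, expert_modes) + F.cross_entropy(fake_mode, policy_modes)
    return adv + cls_weight * cls

def generator_reward(disc, state, action, intended_mode):
    # The policy is rewarded only when the sample looks real AND the
    # discriminator assigns it to the mode it was conditioned on.
    with torch.no_grad():
        real_logit, mode_logit = disc(state, action)
        p_real = torch.sigmoid(real_logit).squeeze(-1)
        p_mode = torch.softmax(mode_logit, dim=-1).gather(-1, intended_mode.unsqueeze(-1)).squeeze(-1)
    return torch.log(p_real * p_mode + 1e-8)
```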


Subjects
Imitative Behavior, Learning, Algorithms, Policy, Reward
4.
Neural Netw; 165: 43-59, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37276810

ABSTRACT

Generative adversarial imitation learning (GAIL) regards imitation learning (IL) as a distribution matching problem between the state-action distributions of the expert policy and the learned policy. In this paper, we focus on the generalization and computational properties of policy classes. We prove that generalization can be guaranteed in GAIL when the class of policies is well controlled. Building on this generalization capability, we introduce distributional reinforcement learning (RL) into GAIL and propose the greedy distributional soft gradient (GDSG) algorithm to solve GAIL. The main advantages of GDSG can be summarized as follows: (1) Q-value overestimation, a crucial factor in the instability of GAIL with off-policy training, can be alleviated by distributional RL; (2) by considering the maximum entropy objective, the policy can be improved in terms of performance and sample efficiency through sufficient exploration. Moreover, GDSG attains a sublinear convergence rate to a stationary solution. Comprehensive experimental verification in MuJoCo environments shows that GDSG mimics expert demonstrations better than previous GAIL variants.
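
A hedged sketch of the distributional ingredient: a quantile critic that returns a set of return quantiles instead of a scalar Q-value, trained with the quantile Huber loss. The quantile count, network sizes, and Huber threshold are illustrative assumptions rather than the GDSG reference implementation:

```python
# Hedged sketch: quantile-based distributional critic for (state, action) pairs.
import torch
import torch.nn as nn

class QuantileCritic(nn.Module):
    """Outputs N quantiles of the return distribution for a (state, action) pair."""
    def __init__(self, state_dim, action_dim, n_quantiles=32, hidden=128):
        super().__init__()
        self.n_quantiles = n_quantiles
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_quantiles),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))  # (batch, n_quantiles)

def quantile_huber_loss(pred, target, kappa=1.0):
    # pred:   (batch, N) quantiles from the online critic
    # target: (batch, N) Bellman targets from the target critic (detached)
    n = pred.shape[1]
    taus = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors between every target quantile and every predicted quantile.
    td = target.unsqueeze(1) - pred.unsqueeze(2)             # (batch, N_pred, N_target)
    huber = torch.where(td.abs() <= kappa, 0.5 * td.pow(2), kappa * (td.abs() - 0.5 * kappa))
    loss = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs() * huber / kappa
    return loss.mean()
```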


Subjects
Imitative Behavior, Learning, Generalization (Psychology), Reinforcement (Psychology), Algorithms
5.
ISA Trans; 129(Pt B): 684-690, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35292172

ABSTRACT

In this paper, a new imitation learning algorithm, Restored Action Generative Adversarial Imitation Learning (RAGAIL) from observation, is proposed. An action policy is trained to move a robot manipulator in a manner similar to a demonstrator's behavior by using actions restored from state-only demonstrations. To imitate the demonstrator, a target trajectory is generated by recurrent generative adversarial networks (RGAN), and the action is restored from the output of a tracking controller constructed from the state and the generated target trajectory. The proposed algorithm does not require access to the demonstrator's actions (internal control signals such as force/torque commands) and provides better learning performance. The effectiveness of the proposed method is validated through experiments on a robot manipulator.
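
A small sketch of the action-restoration idea: given a state-only target trajectory (e.g., produced by a recurrent generator), a simple tracking controller converts it back into command signals that can serve as pseudo-expert actions for imitation. The PD control law, the gains, and the 2-DoF example are assumptions for illustration, not the paper's controller:

```python
# Hedged sketch: restoring torque-like actions from state-only demonstrations
# with a PD tracking controller; gains kp/kd are assumed values.
import numpy as np

def restore_actions(states, target_trajectory, kp=40.0, kd=5.0):
    # states:            (T, 2 * dof) array of [q, dq] observed from the robot
    # target_trajectory: (T, dof) array generated from state-only demonstrations
    dof = target_trajectory.shape[1]
    q, dq = states[:, :dof], states[:, dof:]
    # PD law: track the generated waypoints while damping joint velocities.
    return kp * (target_trajectory - q) - kd * dq   # (T, dof) restored actions

# Usage with synthetic data for a 2-DoF manipulator:
T, dof = 100, 2
states = np.random.randn(T, 2 * dof)
targets = np.random.randn(T, dof)
pseudo_actions = restore_actions(states, targets)
print(pseudo_actions.shape)   # (100, 2)
```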
