Comparing Deep Reinforcement Learning Algorithms' Ability to Safely Navigate Challenging Waters.

Larsen, Thomas Nakken; Teigen, Halvor Ødegård; Laache, Torkel; Varagnolo, Damiano; Rasheed, Adil

Affiliation

Larsen TN; Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway.
Teigen HØ; Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway.
Laache T; Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway.
Varagnolo D; Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway.
Rasheed A; Department of Engineering Cybernetics, Norwegian University of Science and Technology, Trondheim, Norway.

Front Robot AI; 8: 738113, 2021.

Article in En | MEDLINE | ID: mdl-34589522

ABSTRACT

Reinforcement Learning (RL) controllers have proven effective at tackling the dual objectives of path following and collision avoidance. However, finding which RL algorithm setup optimally trades off these two tasks is not necessarily easy. This work proposes a methodology for exploring this question by analyzing the performance and task-specific behavioral characteristics of a range of RL algorithms applied to path following and collision avoidance for underactuated surface vehicles in environments of increasing complexity. Among the RL algorithms considered, the results show that the Proximal Policy Optimization (PPO) algorithm exhibits superior robustness to changes in the environment complexity and in the reward function, as well as when generalized to environments with a considerable domain gap from the training environment. Whereas the proposed reward function significantly improves the competing algorithms' ability to solve the training environment, an unexpected consequence of the dimensionality reduction in the sensor suite, combined with the domain gap, is identified as the source of their impaired generalization performance.

Keywords

autonomous surface vehicle; collision avoidance; deep reinforcement learning; machine learning controller; path following

Full text: 1 Collections: 01-internacional Database: MEDLINE Type of study: Prognostic_studies Language: En Journal: Front Robot AI Year of publication: 2021 Document type: Article Country of affiliation: Norway Country of publication: Switzerland
