A novel Q-learning algorithm based on improved whale optimization algorithm for path planning.
Li, Ying; Wang, Hanyu; Fan, Jiahao; Geng, Yanyu.
Affiliations
  • Li Y; College of Computer Science and Technology, Jilin University, Changchun, People's Republic of China.
  • Wang H; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People's Republic of China.
  • Fan J; College of Computer Science and Technology, Jilin University, Changchun, People's Republic of China.
  • Geng Y; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, People's Republic of China.
PLoS One ; 17(12): e0279438, 2022.
PLoS One; Article in English | MEDLINE | ID: mdl-36574399
ABSTRACT
Q-learning is a classical reinforcement learning algorithm and one of the most important methods of mobile robot path planning without a prior environmental model. Nevertheless, Q-learning initializes the Q-table too simply and wastes too much time in the exploration process, causing slow convergence. This paper proposes a new Q-learning algorithm called the Paired Whale Optimization Q-learning Algorithm (PWOQLA), which includes four improvements. First, to accelerate the convergence of Q-learning, a whale optimization algorithm is used to initialize the values of the Q-table; before the exploration process begins, a Q-table that contains previous experience is learned to improve algorithm efficiency. Second, to improve the local exploitation capability of the whale optimization algorithm, a paired whale optimization algorithm is proposed that uses a pairing strategy to speed up the search for prey. Third, to improve the exploration efficiency of Q-learning and reduce the number of useless explorations, a new selective exploration strategy is introduced that considers the relationship between the current position and the target position. Fourth, to balance the exploration and exploitation capabilities of Q-learning so that it focuses on exploration in the early stage and on exploitation in the later stage, a nonlinear function is designed that dynamically changes the value of ε in ε-greedy Q-learning based on the number of iterations. Comparing the performance of PWOQLA with other path planning algorithms, experimental results demonstrate that PWOQLA achieves higher accuracy and faster convergence than existing counterparts in mobile robot path planning. The code will be released at https://github.com/wanghanyu0526/improveQL.git.
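The abstract's fourth improvement, an iteration-dependent ε in ε-greedy Q-learning, can be sketched as follows. The paper's exact nonlinear schedule and hyperparameters are not given in the abstract, so the cosine-style decay, the function names, and the constants below are illustrative assumptions, not the authors' implementation:

```python
import math
import random

def epsilon_schedule(t, t_max, eps_max=1.0, eps_min=0.05):
    """Nonlinear decay of epsilon over iterations: near eps_max early
    (favoring exploration), near eps_min late (favoring exploitation).
    The cosine shape is an assumption; PWOQLA's actual function may differ."""
    frac = t / t_max
    return eps_min + (eps_max - eps_min) * 0.5 * (1.0 + math.cos(math.pi * frac))

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action
    for the current state's row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
```

In a training loop, `epsilon_schedule(t, t_max)` would be evaluated once per episode and passed to `epsilon_greedy`, so early episodes explore broadly while later episodes mostly exploit the learned Q-table.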
Subjects

Full text: 1 Database: MEDLINE Main subject: Whales / Algorithms Language: English Year of publication: 2022 Document type: Article