Vision-Based Efficient Robotic Manipulation with a Dual-Streaming Compact Convolutional Transformer.
Guo, Hao; Song, Meichao; Ding, Zhen; Yi, Chunzhi; Jiang, Feng.
Affiliations
  • Guo H; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Song M; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Ding Z; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
  • Yi C; School of Medicine and Health, Harbin Institute of Technology, Harbin 150001, China.
  • Jiang F; School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
Sensors (Basel) ; 23(1)2023 Jan 03.
Article in English | MEDLINE | ID: mdl-36617113
ABSTRACT
Learning efficient robotic manipulation from visual observation remains a significant challenge in Reinforcement Learning (RL). Although combining RL policies with a convolutional neural network (CNN) visual encoder achieves high efficiency and success rates, the method's general performance across multiple tasks is still limited by the efficacy of the encoder. Meanwhile, the increasing cost of optimizing the encoder for general performance can erode the efficiency advantage of the original policy. Building on the attention mechanism, we design a robotic manipulation method that significantly improves the policy's general performance across multiple tasks by using a lightweight Transformer-based visual encoder, unsupervised learning, and data augmentation. The encoder of our method matches the performance of the original Transformer with much less data, ensuring efficiency during training and strengthening general multi-task performance. Furthermore, when combining third-person and egocentric views to assimilate global and local visual information, we experimentally demonstrate that the master view outperforms the alternative third-person views in general robotic manipulation tasks. In extensive experiments on tasks from the OpenAI Gym Fetch environment, and in the Push task in particular, our method succeeds in 92% of trials, versus baselines of 65%, 78% for the CNN encoder, and 81% for the ViT encoder, while requiring fewer training steps.
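
To make the dual-streaming encoder described above concrete, the following PyTorch sketch tokenizes the third-person (master) view and the egocentric view with small convolutional stems, concatenates the two token streams, and fuses them with a lightweight Transformer encoder before pooling into a state embedding for an RL policy. This is a minimal sketch, not the authors' implementation; all module names, layer sizes, and hyperparameters are illustrative assumptions.

    # Minimal sketch of a dual-stream compact convolutional Transformer encoder.
    # Each camera view is tokenized by a small conv stem, the two token streams
    # are concatenated, and a lightweight Transformer fuses them into one state
    # embedding. All names, sizes, and hyperparameters are assumptions.
    import torch
    import torch.nn as nn


    class ConvTokenizer(nn.Module):
        """Compact convolutional stem that turns an image into patch tokens."""

        def __init__(self, in_channels=3, embed_dim=128):
            super().__init__()
            self.stem = nn.Sequential(
                nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            )

        def forward(self, x):                       # x: (B, C, H, W)
            feat = self.stem(x)                     # (B, D, H', W')
            return feat.flatten(2).transpose(1, 2)  # (B, H'*W', D) token sequence


    class DualStreamCCTEncoder(nn.Module):
        """Fuses third-person and egocentric views with a small Transformer."""

        def __init__(self, embed_dim=128, depth=4, num_heads=4, out_dim=64):
            super().__init__()
            self.third_person_tokenizer = ConvTokenizer(embed_dim=embed_dim)
            self.egocentric_tokenizer = ConvTokenizer(embed_dim=embed_dim)
            encoder_layer = nn.TransformerEncoderLayer(
                d_model=embed_dim, nhead=num_heads, dim_feedforward=256,
                batch_first=True,
            )
            self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=depth)
            self.head = nn.Linear(embed_dim, out_dim)  # state embedding for the policy

        def forward(self, third_person_img, egocentric_img):
            tokens = torch.cat(
                [
                    self.third_person_tokenizer(third_person_img),
                    self.egocentric_tokenizer(egocentric_img),
                ],
                dim=1,
            )                                  # concatenate the two token streams
            fused = self.transformer(tokens)   # joint attention over both views
            pooled = fused.mean(dim=1)         # sequence pooling instead of a CLS token
            return self.head(pooled)


    if __name__ == "__main__":
        encoder = DualStreamCCTEncoder()
        master = torch.randn(2, 3, 84, 84)     # third-person "master" view
        wrist = torch.randn(2, 3, 84, 84)      # egocentric (wrist) view
        print(encoder(master, wrist).shape)    # torch.Size([2, 64])

In such a design, the pooled embedding would be consumed by the downstream RL policy (e.g., on Fetch Push/Pick-and-Place tasks), and the unsupervised objective and data augmentation mentioned in the abstract would act on the encoder's inputs and representations rather than on the policy head.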

Full text: 1 Database: MEDLINE Main subject: Robotics / Robotic Surgical Procedures Study type: Prognostic_studies Limit: Humans Language: English Journal: Sensors (Basel) Year: 2023 Document type: Article Country of affiliation: China