Bilateral Cross-Modal Fusion Network for Robot Grasp Detection.
Sensors (Basel)
; 23(6)2023 Mar 22.
Article
em En
| MEDLINE
| ID: mdl-36992051
In the field of vision-based robot grasping, effectively leveraging RGB and depth information to accurately determine the position and pose of a target is a critical issue. To address this challenge, we proposed a tri-stream cross-modal fusion architecture for 2-DoF visual grasp detection. This architecture facilitates the interaction of RGB and depth bilateral information and was designed to efficiently aggregate multiscale information. Our novel modal interaction module (MIM) with a spatial-wise cross-attention algorithm adaptively captures cross-modal feature information. Meanwhile, the channel interaction modules (CIM) further enhance the aggregation of different modal streams. In addition, we efficiently aggregated global multiscale information through a hierarchical structure with skipping connections. To evaluate the performance of our proposed method, we conducted validation experiments on standard public datasets and real robot grasping experiments. We achieved image-wise detection accuracy of 99.4% and 96.7% on Cornell and Jacquard datasets, respectively. The object-wise detection accuracy reached 97.8% and 94.6% on the same datasets. Furthermore, physical experiments using the 6-DoF Elite robot demonstrated a success rate of 94.5%. These experiments highlight the superior accuracy of our proposed method.
Texto completo:
1
Coleções:
01-internacional
Base de dados:
MEDLINE
Tipo de estudo:
Diagnostic_studies
Idioma:
En
Ano de publicação:
2023
Tipo de documento:
Article