Multi-Task Foreground-Aware Network with Depth Completion for Enhanced RGB-D Fusion Object Detection Based on Transformer.
Sensors (Basel); 24(7). 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38610585
ABSTRACT
Fusing the perceptions of multiple sensors, most commonly LiDAR and camera, is a prevalent approach to target recognition in autonomous driving systems. Traditional object detection algorithms are limited by the sparsity of LiDAR point clouds, which degrades fusion performance, especially for small and distant targets. In this paper, a Transformer-based multi-task parallel neural network is constructed to perform depth completion and object detection simultaneously. The loss functions are redesigned to suppress environmental noise in depth completion, and a new fusion module is designed to enhance the network's perception of foreground and background. The network leverages the correlation between RGB pixels for depth completion, densifying the sparse LiDAR point cloud and resolving the mismatch between sparse LiDAR features and dense pixel features. The resulting depth map features are then fused with the RGB features, exploiting the depth differences between foreground and background to improve object detection, especially for challenging targets. Compared to the baseline network, improvements of 4.78%, 8.93%, and 15.54% are achieved on the hard-difficulty metrics for cars, pedestrians, and cyclists, respectively. Experimental results also show that the network runs at 38 fps, validating the efficiency and feasibility of the proposed method.
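The abstract describes the architecture only at a high level; what follows is a minimal PyTorch sketch of a parallel two-branch design of this kind, in which one head densifies the sparse LiDAR depth map while a foreground-gated fusion module combines depth and RGB features for detection. All names here (FusionModule, MultiTaskNet, completion_loss), the sigmoid foreground gate, and the masked-L1 completion term are illustrative assumptions, not the paper's actual modules, backbone, or losses.

import torch
import torch.nn as nn

class FusionModule(nn.Module):
    """Hypothetical foreground-aware fusion: a learned per-pixel gate
    weights the depth features before they are merged with RGB features."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv predicts a foreground probability from the depth features
        self.fg_gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.proj = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        gate = self.fg_gate(depth_feat)                     # (B, 1, H, W)
        fused = torch.cat([rgb_feat, gate * depth_feat], dim=1)
        return self.proj(fused)

class MultiTaskNet(nn.Module):
    """Minimal parallel two-branch network: one head outputs a dense depth
    map (completion), the other detects objects from fused features."""
    def __init__(self, channels: int = 64, num_classes: int = 3):
        super().__init__()
        self.rgb_encoder = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.depth_encoder = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.completion_head = nn.Conv2d(channels, 1, 3, padding=1)  # dense depth
        self.fusion = FusionModule(channels)
        self.detect_head = nn.Conv2d(channels, num_classes + 4, 1)   # class + box per cell

    def forward(self, rgb, sparse_depth):
        rgb_feat = self.rgb_encoder(rgb)
        depth_feat = self.depth_encoder(sparse_depth)
        dense_depth = self.completion_head(depth_feat)   # depth-completion branch
        fused = self.fusion(rgb_feat, depth_feat)
        return dense_depth, self.detect_head(fused)      # detection branch

def completion_loss(pred, sparse_gt):
    # Supervise only where LiDAR returns exist; a stand-in for the
    # paper's redesigned, noise-suppressing completion loss.
    mask = sparse_gt > 0
    return (pred[mask] - sparse_gt[mask]).abs().mean()

net = MultiTaskNet()
rgb = torch.randn(2, 3, 64, 64)
sparse = torch.rand(2, 1, 64, 64) * (torch.rand(2, 1, 64, 64) > 0.9).float()  # ~10% valid
depth_pred, det = net(rgb, sparse)
loss = completion_loss(depth_pred, sparse)  # plus a detection loss in practice

In the paper, a Transformer backbone and the redesigned depth-completion loss would stand in for the plain convolutional encoders and L1 term used in this sketch.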