1.
IEEE Trans Image Process; 32: 4407-4415, 2023.
Article in English | MEDLINE | ID: mdl-37527315

ABSTRACT

The feature pyramid, a major component of neural networks, plays a vital role in perception tasks such as object detection in autonomous driving. Fusing multi-level and multi-sensor feature pyramids for object detection, however, remains a challenge. This paper proposes a simple yet effective framework named MuTrans (Multiple Transformers) to fuse feature pyramids in a single-stream 2D detector or a two-stream 3D detector. Built on an encoder-decoder design, MuTrans focuses on the significant features via multiple Transformers. The MuTrans encoder uses three innovative self-attention mechanisms: Spatial-wise BoxAlign attention (SB) for low-level spatial locations, Context-wise Affinity attention (CA) for high-level context information, and high-level attention for multi-level features. The MuTrans decoder then processes these significant proposals, including the RoI and context affinity. In addition, the Low- and High-level Fusion (LHF) module in the encoder reduces the number of computational parameters, and Pre-LN is utilized to accelerate training convergence; LHF and Pre-LN are shown to address self-attention's computational complexity and slow training convergence, respectively. Our results demonstrate that MuTrans achieves higher detection accuracy than the baseline method, particularly for small objects: 2.1 points higher on the APS index for small object detection on MS-COCO 2017 with a ResNeXt-101 backbone, 2.18 points higher 3D detection accuracy (moderate difficulty) for the small object class pedestrian on KITTI, and a 6.85 higher RC index (Town05 Long) on the CARLA urban driving simulator.
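
The abstract gives no implementation details, but the Pre-LN arrangement it mentions (layer normalization applied before, rather than after, the attention and feed-forward sublayers) is a standard Transformer variant. The following PyTorch snippet is a minimal sketch of that generic Pre-LN encoder layer, not MuTrans itself; all names and dimensions (PreLNEncoderLayer, d_model, etc.) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PreLNEncoderLayer(nn.Module):
    """Minimal Pre-LN Transformer encoder layer (illustrative, not the paper's code)."""

    def __init__(self, d_model=256, n_heads=8, d_ff=1024, dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # Pre-LN: normalize first, then attend, then add the residual.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + self.drop(attn_out)
        # Pre-LN feed-forward sublayer with its own residual connection.
        x = x + self.drop(self.ff(self.norm2(x)))
        return x

# Example: tokens pooled from feature-pyramid levels (shapes are arbitrary).
tokens = torch.randn(2, 100, 256)   # (batch, tokens, channels)
out = PreLNEncoderLayer()(tokens)   # -> (2, 100, 256)
```

Normalizing before each sublayer keeps the residual path unscaled, which is commonly credited with the faster, more stable convergence the abstract attributes to Pre-LN.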

2.
Article in English | MEDLINE | ID: mdl-35905067

ABSTRACT

The accuracy of current learning-based 3-D object detection is heavily impacted by annotation quality. Given dataset sparsity, it remains a challenge to achieve consistently high detection accuracy for all classes under different scenarios. To mitigate this, this article proposes a novel method called semi-supervised learning and progressive distillation (SPD), which uses semi-supervised learning (SSL) and knowledge distillation to improve label efficiency. SPD uses two big backbones to handle the unlabeled/labeled input data augmented by periodic IO augmentation (PA); the backbones are then compressed using progressive distillation (PD). Specifically, PA periodically shifts the data augmentation operations between the input and the output of the big backbone, aiming to improve the network's generalization to unseen and unlabeled data; a big backbone benefits from large-scale augmented data more than a small one. The two backbones are trained with a data scale- and ratio-sensitive loss (data-loss), which addresses the over-flat problem caused by the large-scale unlabeled data from PA and helps the big backbone avoid overfitting on the limited-scale labeled data. Hence, using PA and the data-loss during SSL training dramatically improves label efficiency. Next, the trained big backbone, acting as the teacher CNN, is progressively distilled to obtain a small student model, referred to as PD. PD mitigates the problem that student CNN performance degrades when the gap between the student and the teacher is too large. Extensive experiments are conducted on the indoor datasets SUN RGB-D and ScanNetV2 and the outdoor dataset KITTI. Using only 50% of the labeled data and a 27% smaller model size, SPD performs 0.32 higher than the fully supervised VoteNet, which is adopted as our backbone. Moreover, using only 2% of the labeled data, SPD achieves accuracy similar to the fully supervised backbone PV-RCNN (84.1 versus 84.83) with 30% less inference time.
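
The abstract describes SPD only at a high level. As a minimal sketch of the generic teacher-student distillation idea that PD builds on (not SPD's progressive, multi-stage variant, and not the paper's exact loss), the PyTorch snippet below combines a supervised term on labeled data with a soft-target term toward a frozen teacher; the function name and hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Illustrative teacher-student distillation loss (not the paper's formulation)."""
    # Hard-label supervised term on the labeled subset.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-target term: KL between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kd

# Example with dummy logits for a 10-class problem.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)            # teacher outputs are frozen/detached
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```

A progressive scheme, as the abstract describes, would repeat such a step through one or more intermediate models so the student never faces an oversized capacity gap to the teacher.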
