Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 2 de 2
Filtrar
Más filtros

Banco de datos
Tipo de estudio
Tipo del documento
Asunto de la revista
Intervalo de año de publicación
1.
Artículo en Inglés | MEDLINE | ID: mdl-38809744

RESUMEN

We study multi-sensor fusion for 3D semantic segmentation that is important to scene understanding for many applications, such as autonomous driving and robotics. For example, for autonomous cars equipped with RGB cameras and LiDAR, it is crucial to fuse complementary information from different sensors for robust and accurate segmentation. Existing fusion-based methods, however, may not achieve promising performance due to the vast difference between the two modalities. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to effectively exploit perceptual information from two modalities, namely, appearance information from RGB images and spatio-depth information from point clouds. To this end, we first project point clouds to the camera coordinate using perspective projection. In this way, we can process both inputs from LiDAR and cameras in 2D space while preventing the information loss of RGB images. Then, we propose a two-stream network that consists of a LiDAR stream and a camera stream to extract features from the two modalities, separately. The extracted features are fused by effective residual-based fusion modules. Moreover, we introduce additional perception-aware losses to measure the perceptual difference between the two modalities. Last, we propose an improved version of PMF, i.e., EPMF, which is more efficient and effective by optimizing data pre-processing and network architecture under perspective projection. Specifically, we propose cross-modal alignment and cropping to obtain tight inputs and reduce unnecessary computational costs. We then explore more efficient contextual modules under perspective projection and fuse the LiDAR features into the camera stream to boost the performance of the two-stream network. Extensive experiments on benchmark data sets show the superiority of our method. For example, on nuScenes test set, our EPMF outperforms the state-of-the-art method, i.e., RangeFormer, by 0.9% in mIoU. Compared to PMF, EPMF also achieves 2.06× acceleration with 2.0% improvement in mIoU. Our source code is available at https://github.com/ICEORY/PMF.

2.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4035-4051, 2022 08.
Artículo en Inglés | MEDLINE | ID: mdl-33755553

RESUMEN

We study network pruning which aims to remove redundant channels/kernels and hence speed up the inference of deep networks. Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, while the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. In this paper, we propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power. To this end, we first introduce additional discrimination-aware losses into the network to increase the discriminative power of the intermediate layers. Next, we select the most discriminative channels for each layer by considering the discrimination-aware loss and the reconstruction error, simultaneously. We then formulate channel pruning as a sparsity-inducing optimization problem with a convex objective and propose a greedy algorithm to solve the resultant problem. Note that a channel (3D tensor) often consists of a set of kernels (each with a 2D matrix). Besides the redundancy in channels, some kernels in a channel may also be redundant and fail to contribute to the discriminative power of the network, resulting in kernel level redundancy. To solve this issue, we propose a discrimination-aware kernel pruning (DKP) method to further compress deep networks by removing redundant kernels. To avoid manually determining the pruning rate for each layer, we propose two adaptive stopping conditions to automatically determine the number of selected channels/kernels. The proposed adaptive stopping conditions tend to yield more efficient models with better performance in practice. Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our methods. For example, on ILSVRC-12, the resultant ResNet-50 model with 30 percent reduction of channels even outperforms the baseline model by 0.36 percent in terms of Top-1 accuracy. We also deploy the pruned models on a smartphone (equipped with a Qualcomm Snapdragon 845 processor). The pruned MobileNetV1 and MobileNetV2 achieve 1.93× and 1.42× inference acceleration on the mobile device, respectively, with negligible performance degradation. The source code and the pre-trained models are available at https://github.com/SCUT-AILab/DCP.


Asunto(s)
Algoritmos , Compresión de Datos , Compresión de Datos/métodos , Presión
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA