ABSTRACT
Human-vehicle classification is an essential component of accident avoidance in autonomous driving. Classification techniques based on automotive radar sensors have attracted increasing attention from researchers, owing to their robustness to low-light conditions and severe weather. In this paper, we propose a hybrid support vector machine-convolutional neural network (SVM-CNN) approach to address the class-imbalanced classification of vehicles and pedestrians when only limited experimental radar data are available. A two-stage scheme combining a feature-based SVM technique with a deep learning-based CNN is employed. In the first stage, a modified SVM based on distinct physical features is used to recognize vehicles, which effectively alleviates the imbalance ratio of vehicles to pedestrians at the data level. The residual unclassified images are then fed into the deep network for subsequent classification, and we introduce a weighted false error function into the network architecture to enhance class-imbalanced classification performance at the algorithm level. The proposed SVM-CNN approach takes full advantage of both the locations of the underlying classes in the entire range-Doppler image and automatic local feature learning in the CNN with a sliding filter bank. Experimental results on data from a 77 GHz automotive radar demonstrate that the proposed method outperforms several state-of-the-art methods under limited experimental radar data, with an F1 score of 0.90 and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.99.
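The weighted false error idea described above averages the error within each class before combining classes, so a minority class (pedestrians) is not swamped by the majority class (vehicles). The following is a minimal illustrative sketch of that principle in plain Python; the function name, the toy error values, and the class weights are assumptions for demonstration, not the paper's exact loss.

```python
from collections import defaultdict

def weighted_false_error(per_sample_errors, labels, class_weights):
    """Illustrative class-balanced loss: average the per-sample error within
    each class, then take a weighted sum across classes. This prevents the
    majority class from dominating the total, unlike a plain global mean.
    (Hypothetical sketch, not the paper's exact formulation.)"""
    by_class = defaultdict(list)
    for err, y in zip(per_sample_errors, labels):
        by_class[y].append(err)
    # Per-class mean error, then weighted sum over classes.
    return sum(class_weights[c] * sum(errs) / len(errs)
               for c, errs in by_class.items())

# Toy example: four majority-class (vehicle, label 0) samples and one
# minority-class (pedestrian, label 1) sample with a large error.
errs = [0.1, 0.2, 0.1, 0.2, 0.8]
labels = [0, 0, 0, 0, 1]
loss = weighted_false_error(errs, labels, {0: 1.0, 1: 2.0})
# Class-0 mean = 0.15, class-1 mean = 0.8 -> loss = 1.0*0.15 + 2.0*0.8 = 1.75
```

Note that a plain global mean of the same errors would be 0.28, barely registering the badly misclassified pedestrian sample; the per-class averaging makes that error dominate instead.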
Subject(s)
Traffic Accidents/prevention & control , Motor Vehicles , Neural Networks, Computer , Pedestrians , Support Vector Machine , Algorithms , Humans , Radar

ABSTRACT
The DAVIS camera, which streams two complementary sensing modalities of asynchronous events and frames, has increasingly been used to address major object detection challenges (e.g., fast motion blur and low light). However, how to effectively leverage rich temporal cues and fuse the two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) with over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture that detects objects via an end-to-end sequence prediction problem, where a novel temporal Transformer module leverages rich temporal cues from the two visual streams to improve detection performance. Finally, an asynchronous attention-based fusion module is proposed to integrate the two heterogeneous sensing modalities and exploit the complementary advantages of each, so that the detector can be queried at any time to locate objects and break through the limited output frequency of synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where a conventional frame-based camera fails, e.g., under high-speed motion and low-light conditions. Our dataset and code are available at https://github.com/dianzl/SODFormer.
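The key property of the asynchronous fusion described above is that detections can be queried at an arbitrary timestamp, rather than only at the frame rate: events that arrived since the last frame refine the most recent frame features. The sketch below illustrates that query pattern with a toy attention over feature vectors; the function name, residual fusion, and data layout are assumptions for illustration and do not reproduce the actual SODFormer module.

```python
import bisect
import math

def fuse_at(t_query, frame_times, frame_feats, event_times, event_feats):
    """Toy asynchronous fusion: at an arbitrary query time, take the latest
    frame feature at or before t_query and attend over event features that
    arrived since that frame. (Hypothetical sketch, not the paper's module.)"""
    # Latest frame at or before the query time.
    i = bisect.bisect_right(frame_times, t_query) - 1
    f = frame_feats[i]
    # Event features in the interval (frame time, query time].
    evs = [e for te, e in zip(event_times, event_feats)
           if frame_times[i] < te <= t_query]
    if not evs:
        return f  # no new events: fall back to the frame feature alone
    # Softmax attention: frame feature as query, events as keys/values.
    scores = [sum(a * b for a, b in zip(f, e)) for e in evs]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    agg = [sum(wi * e[d] for wi, e in zip(w, evs)) / z
           for d in range(len(f))]
    # Residual fusion of frame and aggregated event features.
    return [fd + ad for fd, ad in zip(f, agg)]

# One frame at t=0 and one event feature at t=0.5: a query at t=1.0 fuses
# both streams, while a query at t=0.4 sees no events yet.
fused = fuse_at(1.0, [0.0], [[1.0, 0.0]], [0.5], [[0.0, 1.0]])
```

Because the query time is decoupled from the frame clock, this pattern can report detections between frames, which is the property that lets the fusion exceed the synchronized frame-based output frequency.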