ABSTRACT
Current learning-based 3-D object detection accuracy is heavily impacted by annotation quality. Given dataset sparsity, it remains a challenge to achieve consistently high detection accuracy across all classes and scenarios. To mitigate this challenge, this article proposes a novel method called semi-supervised learning and progressive distillation (SPD), which uses semi-supervised learning (SSL) and knowledge distillation to improve label efficiency. SPD uses two big backbones to handle the unlabeled/labeled input data augmented by periodic IO augmentation (PA); the backbones are then compressed using progressive distillation (PD). Specifically, PA periodically shifts the data augmentation operations between the input and the output of the big backbone, aiming to improve the network's generalization to unseen and unlabeled data. A big backbone benefits more from large-scale augmented data than a small one. The two backbones are trained with a data-scale- and ratio-sensitive loss (data loss), which addresses the over-flat problem caused by the large-scale unlabeled data from PA and helps the big backbone avoid overfitting on the limited-scale labeled data. Hence, using PA and the data loss during SSL training dramatically improves label efficiency. Next, the trained big backbone, set as the teacher CNN, is progressively distilled to obtain a small student model, a process referred to as PD. PD mitigates the degradation of student CNN performance that occurs when the gap between the student and the teacher is too large. Extensive experiments are conducted on the indoor datasets SUN RGB-D and ScanNetV2 and the outdoor dataset KITTI. Using only 50% labeled data and a 27% smaller model size, SPD performs 0.32 points higher than the fully supervised VoteNet [qi2019deep], which is adopted as our backbone. Moreover, using only 2% labeled data, SPD achieves accuracy comparable to the fully supervised PV-RCNN [shi2020pv] (84.1 vs. 84.83) with 30% less inference time.
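The abstract does not give the implementation details of PA, so the following is only a minimal sketch of one plausible interpretation: augmentation is applied alternately to the backbone's input point cloud or to its output boxes, switching every fixed number of steps. The `backbone` callable, the box layout (centers in the first three columns), and the `period` parameter are assumptions for illustration, not the paper's actual interface.

```python
import numpy as np

def random_z_rotation(max_angle=np.pi / 4, rng=np.random):
    """Sample a rotation about the z-axis, a common point-cloud augmentation."""
    angle = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def periodic_io_augment(points, backbone, step, period=2):
    """Hypothetical periodic IO augmentation (not the paper's exact scheme).

    Even periods: rotate the input point cloud, then run the backbone.
    Odd periods:  run the backbone on the raw points, then rotate its
                  predicted box centers so the predictions stay consistent
                  with equivalently augmented targets.
    """
    rot = random_z_rotation()
    if (step // period) % 2 == 0:            # input-side augmentation
        boxes = backbone(points @ rot.T)
    else:                                     # output-side augmentation
        boxes = np.asarray(backbone(points)).copy()
        boxes[:, :3] = boxes[:, :3] @ rot.T   # rotate predicted centers
    return boxes
```

In this reading, the periodic switch exposes the big backbone to augmentation on both sides of the network, which is one way large-scale augmented data could improve generalization to the unlabeled portion of the dataset.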
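Likewise, the abstract only states that PD distills the big teacher into a small student while avoiding an oversized teacher-student gap. A common way to realize this is to pass through a chain of progressively smaller intermediate models; the sketch below assumes that reading, and the helpers `distill_step` and `progressive_distillation`, the soft-label KL loss, and the temperature are illustrative choices rather than the paper's specification.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, loader, epochs=1, temperature=2.0, lr=1e-3):
    """One distillation stage: the student mimics the teacher's soft outputs.

    Hypothetical helper; the paper's exact distillation loss and schedule
    are not given in the abstract.
    """
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            loss = F.kl_div(
                F.log_softmax(s_logits / temperature, dim=-1),
                F.softmax(t_logits / temperature, dim=-1),
                reduction="batchmean",
            ) * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def progressive_distillation(big_teacher, intermediates, small_student, loader):
    """Distill through progressively smaller models so that no single
    teacher-student capacity gap is too large."""
    teacher = big_teacher
    for model in intermediates + [small_student]:
        teacher = distill_step(teacher, model, loader)
    return teacher
```

Under this assumption, each intermediate model serves as both student and next-stage teacher, which is how the capacity gap at every individual distillation step stays small.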