ABSTRACT
Video person re-identification (video Re-ID) plays an important role in surveillance video analysis and has gained increasing attention recently. However, existing supervised methods require large numbers of identities labeled across cameras, resulting in poor scalability in practical applications. Although some unsupervised approaches have been explored for video Re-ID, they are still in their infancy owing to the difficulty of learning discriminative features from unlabeled data. In this paper, we focus on one-shot video Re-ID and present an iterative local-global collaboration learning approach to learn robust and discriminative person representations. Specifically, it jointly considers the global video information and the local frame sequence information to better capture the diverse appearance of each person for feature learning and pseudo-label estimation. Moreover, since the cross-entropy loss may induce the model to focus on identity-irrelevant factors, we introduce the variational information bottleneck as a regularization term in training. It helps filter out undesirable information and characterize subtle differences among persons. Since the accuracy of pseudo-labels cannot always be guaranteed, we adopt a dynamic selection strategy that adds only the pseudo-labeled samples with the highest confidence to the training set before re-training the model. During training, our method iteratively executes feature learning, pseudo-label estimation, and dynamic sample selection until all the unlabeled data have been incorporated. Extensive experiments on two public datasets, i.e., DukeMTMC-VideoReID and MARS, verify the superiority of our model over several cutting-edge competitors.
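The iterative loop sketched in the abstract (pseudo-label estimation followed by confidence-based dynamic selection, repeated until all unlabeled data are absorbed) can be illustrated with the minimal Python sketch below. The nearest-neighbor labeling, the negated-distance confidence score, and the fixed selection fraction `step` are illustrative assumptions for exposition only, not the actual feature-learning model described in the paper:

```python
def distance(a, b):
    # Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def estimate_pseudo_labels(labeled, unlabeled):
    # Assign each unlabeled feature the identity of its nearest labeled
    # feature; the negated distance serves as a confidence score.
    # (Assumed stand-in for the paper's local-global estimation step.)
    estimates = []
    for feat in unlabeled:
        d, identity = min((distance(feat, f), y) for f, y in labeled)
        estimates.append((feat, identity, -d))
    return estimates

def iterative_selection(labeled, unlabeled, step=0.5):
    # Dynamic selection: each iteration moves the `step` fraction of
    # pseudo-labeled samples with the highest confidence into the
    # training set, then re-estimates, until no unlabeled data remain.
    # (Model re-training between iterations is omitted here.)
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        estimates = estimate_pseudo_labels(labeled, unlabeled)
        estimates.sort(key=lambda e: e[2], reverse=True)
        k = max(1, int(len(estimates) * step))
        labeled.extend((feat, y) for feat, y, _ in estimates[:k])
        unlabeled = [feat for feat, _, _ in estimates[k:]]
    return labeled

# Toy one-shot setting: one labeled exemplar per identity.
seed = [([0.0, 0.0], "A"), ([10.0, 10.0], "B")]
pool = [[0.5, 0.5], [9.5, 9.5], [1.0, 0.0], [10.0, 9.0]]
result = iterative_selection(seed, pool)
```

In this toy run, the two samples closest to the exemplars are pseudo-labeled first, and the harder samples are absorbed in later iterations once the labeled set has grown, mirroring the easy-to-hard progression the dynamic selection strategy aims for.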