ABSTRACT
Image retrieval performance can be improved by training a convolutional neural network (CNN) with annotated data so that target regions are localized accurately. However, obtaining sufficient annotated data is expensive and often impractical in real settings, and accurate target-region localization is difficult to achieve in an unsupervised manner. To address this problem, we propose a new unsupervised image retrieval method based on unsupervised target region localization (UTRL) descriptors, which locates target regions precisely without supervisory information or learning. Our method has three highlights: 1) we propose a novel zero-label transfer learning method to address target-region co-localization, enhancing the latent localization ability of pretrained CNN models through a zero-label, data-driven approach; 2) we propose a multiscale attention accumulation method that extracts discriminative target features by weighting feature importance with local Gaussian weights; and 3) we propose a simple yet effective dimensionality-reduction method, named twice-PCA-whitening (TPW), which mitigates the performance degradation caused by feature compression. Notably, TPW is a robust and general technique that can be applied broadly to image retrieval tasks to improve retrieval performance. This work also advances image retrieval based on short vector features. Extensive experiments on six popular benchmark datasets demonstrate that our method achieves roughly 7% higher mean average precision (mAP) than existing state-of-the-art unsupervised methods.
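As a rough illustration of the TPW idea described above, the following is a minimal sketch, assuming (since the abstract does not give the exact procedure) that "twice-PCA-whitening" means applying PCA-whitening in two successive passes with L2 normalization in between, truncating to the target dimensionality on the second pass. The function names, the truncation point, and the normalization steps are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def pca_whiten(X, eps=1e-8):
    """PCA-whitening: center, rotate onto principal axes, scale to unit variance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; Vt rows are principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Project onto principal components and divide by per-component std. dev.
    Xw = (Xc @ Vt.T) / (S / np.sqrt(len(X) - 1) + eps)
    return Xw

def twice_pca_whitening(X, out_dim=128):
    """Hypothetical TPW sketch: PCA-whiten, L2-normalize, PCA-whiten again,
    keep the leading out_dim components, and L2-normalize the short vectors."""
    Xw = pca_whiten(X)
    Xw /= np.linalg.norm(Xw, axis=1, keepdims=True) + 1e-12
    Xw2 = pca_whiten(Xw)[:, :out_dim]
    return Xw2 / (np.linalg.norm(Xw2, axis=1, keepdims=True) + 1e-12)

# Example: compress 2048-D global descriptors into 128-D retrieval vectors.
feats = np.random.randn(1000, 2048).astype(np.float32)
short_feats = twice_pca_whitening(feats, out_dim=128)
print(short_feats.shape)  # (1000, 128)
```

In practice the whitening statistics would be estimated on a held-out training set and then applied to query and database descriptors; the sketch fits and transforms the same matrix only to keep the example self-contained.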