ABSTRACT
Zero-shot learning (ZSL) aims to classify examples of unseen classes (which have no training examples) given a set of seen classes (which have training examples). Most existing approaches exploit intermediate-level information (e.g., attributes) to transfer knowledge from seen classes to unseen classes. A common practice is to first learn projections from samples to attributes on the seen classes via a regression method, and then apply these projections directly to the unseen classes. However, this learning strategy is prone to the projection domain shift problem and the hubness problem, both of which hinder ZSL performance. In this paper, we also formulate ZSL as an attribute regression problem. However, unlike general regression-based solutions, the proposed approach is novel in three aspects. First, a class prototype rectification method is proposed to connect the unseen classes to the seen classes. Here, a class prototype refers to a vector representation of a class; it is also known as a class center, class signature, or class exemplar. Second, an alternating learning scheme is proposed for jointly performing attribute regression and rectifying the class prototypes. Finally, a new objective function is proposed that takes into account both the attribute regression accuracy and the discrimination between class prototypes. With this solution, the domain shift problem and the hubness problem can be mitigated. Experimental results on three public datasets (i.e., CUB200-2011, SUN Attribute, and aPaY) demonstrate the effectiveness of our approach.
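The common practice described above (regress from samples to attributes on seen classes, then reuse the projection on unseen classes) can be sketched as follows. This is only an illustration of that baseline, not of the proposed prototype rectification or alternating scheme. The choice of ridge regression, the function names, and the toy data are all assumptions made for the example.

```python
import numpy as np

def fit_ridge_projection(X, A, lam=1e-3):
    """Closed-form ridge regression: W maps features (n, d) to attributes (n, k)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ A)

def predict_nearest_prototype(X, W, prototypes):
    """Project samples into attribute space and pick the closest class prototype."""
    projected = X @ W  # shape (n, k)
    # Squared Euclidean distance from every sample to every prototype: (n, c)
    dists = ((projected[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Toy setup (assumed): 3 attributes, 8-dimensional features, and a hidden
# linear map from attributes to features.
rng = np.random.default_rng(0)
attr_to_feat = rng.normal(size=(3, 8))

seen_prototypes = np.eye(3)                       # three seen classes
unseen_prototypes = np.array([[1.0, 1.0, 0.0],    # two unseen classes
                              [0.0, 1.0, 1.0]])

# Training data: 10 samples per seen class, labelled with the class prototype.
seen_labels = np.repeat(np.arange(3), 10)
A_seen = seen_prototypes[seen_labels]
X_seen = A_seen @ attr_to_feat

# Test data: 5 slightly noisy samples per unseen class.
unseen_labels = np.repeat(np.arange(2), 5)
X_unseen = unseen_prototypes[unseen_labels] @ attr_to_feat
X_unseen = X_unseen + 0.01 * rng.normal(size=X_unseen.shape)

W = fit_ridge_projection(X_seen, A_seen)
pred = predict_nearest_prototype(X_unseen, W, unseen_prototypes)
print("unseen-class accuracy:", (pred == unseen_labels).mean())
```

In this toy case the seen prototypes span the whole attribute space, so the learned projection transfers cleanly. When they do not, the projection can be biased toward the seen classes, which is the domain shift issue the abstract refers to.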
ABSTRACT
Learning high-level image representations using object proposals has achieved remarkable success in multi-label image recognition. However, most object proposals provide only coarse information about the objects, and only carefully selected proposals help boost the performance of multi-label image recognition. In this paper, we propose an object-proposal-free framework for multi-label image recognition: random crop pooling (RCP). RCP performs stochastic scaling and cropping over images before feeding them to a standard convolutional neural network, which works well with a max-pooling operation for recognizing the complex contents of multi-label images. To better fit the multi-label image recognition task, we further develop a new loss function, the dynamic weighted Euclidean loss, for training the deep network. Our RCP approach is simple yet effective. It achieves significantly better image recognition performance than approaches that use object proposals. Moreover, our adapted network can be easily trained in an end-to-end manner. Extensive experiments are conducted on two representative multi-label image recognition data sets (i.e., PASCAL VOC 2007 and PASCAL VOC 2012), and the results clearly demonstrate the superiority of our approach.
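One plausible reading of the random scaling, cropping, and max-pooling steps is sketched below. The abstract does not give the details, so the nearest-neighbour rescaling, the set of scale factors, the number of crops, and the names `random_scale_and_crop` and `rcp_predict` are assumptions. The `scorer` argument is a hypothetical stand-in for the convolutional network, and the dynamic weighted Euclidean loss is not covered here.

```python
import numpy as np

def random_scale_and_crop(image, crop_size, scales, rng):
    """Rescale by a randomly chosen factor (nearest neighbour), then take a random square crop."""
    scale = rng.choice(scales)
    h, w = image.shape[:2]
    # Never shrink below the crop size, so a full crop always fits.
    new_h = max(crop_size, int(round(h * scale)))
    new_w = max(crop_size, int(round(w * scale)))
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    resized = image[rows][:, cols]
    top = rng.integers(0, new_h - crop_size + 1)
    left = rng.integers(0, new_w - crop_size + 1)
    return resized[top:top + crop_size, left:left + crop_size]

def rcp_predict(image, scorer, num_crops=8, crop_size=32,
                scales=(0.75, 1.0, 1.25), seed=0):
    """Score several random crops and max-pool the per-label scores across crops."""
    rng = np.random.default_rng(seed)
    scores = np.stack([
        scorer(random_scale_and_crop(image, crop_size, scales, rng))
        for _ in range(num_crops)
    ])                         # shape (num_crops, num_labels)
    return scores.max(axis=0)  # a label fires if any crop supports it

# Toy usage: the "network" is just a per-channel mean, giving three label scores.
def toy_scorer(crop):
    return crop.mean(axis=(0, 1))

image = np.random.default_rng(1).uniform(size=(64, 64, 3))
crop = random_scale_and_crop(image, 32, (0.75, 1.0, 1.25), np.random.default_rng(2))
pooled = rcp_predict(image, toy_scorer)
print(crop.shape, pooled)
```

Max-pooling across crops suits the multi-label setting because an object that occupies only part of the image needs to be visible in just one crop for its label score to be high.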