Results 1 - 20 of 38
1.
IEEE Trans Image Process ; 33: 2266-2278, 2024.
Article in English | MEDLINE | ID: mdl-38470581

ABSTRACT

The problem of sketch semantic segmentation is far from solved. Although existing methods exhibit near-saturating performance on simple sketches with high recognisability, they suffer serious setbacks when the target sketches are the product of an imaginative process with a high degree of creativity. We hypothesise that human creativity, being highly individualistic, induces a significant shift in the distribution of sketches, leading to poor model generalisation. This hypothesis, backed by empirical evidence, opens the door to a solution that explicitly disentangles creativity while learning sketch representations. We materialise this by crafting a learnable creativity estimator that assigns a scalar creativity score to each sketch. Building on the estimator, we introduce CreativeSeg, a learning-to-learn framework that leverages it to learn a creativity-agnostic representation, and eventually the downstream semantic segmentation task. We empirically verify the superiority of CreativeSeg on the recent "Creative Birds" and "Creative Creatures" creative sketch datasets. Through a human study, we further strengthen the case that the learned creativity score does indeed correlate positively with subjective human judgements of creativity. Code is available at https://github.com/PRIS-CV/Sketch-CS.

2.
Article in English | MEDLINE | ID: mdl-38478433

ABSTRACT

The main challenge in fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations from only a few labelled samples. Conventional few-shot learning methods, however, cannot be naively adopted for this fine-grained setting - a quick pilot study reveals that they in fact push for the opposite (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominantly use a support set to reconstruct the query image and then utilize metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we introduce a bi-reconstruction mechanism that can simultaneously accommodate inter-class and intra-class variations. In addition to using the support set to reconstruct the query set to increase inter-class variations, we further use the query set to reconstruct the support set to reduce intra-class variations. This design effectively helps the model explore more subtle and discriminative features, which is key to the fine-grained problem at hand. Furthermore, we construct a self-reconstruction module that works alongside the bi-directional module to make the features even more discriminative. We also introduce the snapshot ensemble method into the episodic learning strategy - a simple trick that further improves model performance without increasing training costs. Experimental results on three widely used fine-grained image classification datasets, as well as on general and cross-domain few-shot image datasets, consistently show considerable improvements over other methods. Code is available at https://github.com/PRIS-CV/BiEN.
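A minimal sketch of the bidirectional reconstruction idea described above, using closed-form ridge regression in feature space (purely illustrative; the shapes, the closed-form solver, and the regularizer lam are assumptions, not the released BiEN code):

```python
import torch

def ridge_reconstruct(source, target, lam=0.1):
    """Reconstruct each row of `target` as a linear combination of the rows of
    `source` (closed-form ridge regression in feature space).
    source: (n_s, d), target: (n_q, d)."""
    gram = source @ source.t() + lam * torch.eye(source.size(0), device=source.device)
    weights = target @ source.t() @ torch.inverse(gram)   # (n_q, n_s)
    return weights @ source                               # (n_q, d)

def bi_reconstruction_error(support, query, lam=0.1):
    """Symmetric reconstruction error: support -> query (inter-class direction)
    plus query -> support (intra-class direction)."""
    q_hat = ridge_reconstruct(support, query, lam)
    s_hat = ridge_reconstruct(query, support, lam)
    return (query - q_hat).pow(2).mean() + (support - s_hat).pow(2).mean()

# toy episode: 5 support and 3 query feature vectors, 64-dimensional
support, query = torch.randn(5, 64), torch.randn(3, 64)
print(bi_reconstruction_error(support, query))
```

In a metric-learning setup such an error (computed per class) could serve directly as a negative classification logit, which is one way the two reconstruction directions can shape inter- and intra-class variation at the same time.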

3.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 2658-2671, 2024 May.
Article in English | MEDLINE | ID: mdl-37801380

ABSTRACT

Despite great strides made on fine-grained visual classification (FGVC), current methods are still heavily reliant on fully-supervised paradigms where ample expert labels are called for. Semi-supervised learning (SSL) techniques, which acquire knowledge from unlabeled data, provide a promising way forward and have shown great promise for coarse-grained problems. However, existing SSL paradigms mostly assume in-category (i.e., category-aligned) unlabeled data, which hinders their effectiveness when re-purposed for FGVC. In this paper, we put forward a novel design specifically aimed at making out-of-category data work for semi-supervised FGVC. We work off an important assumption that all fine-grained categories naturally follow a hierarchical structure (e.g., the phylogenetic tree of "Aves" that covers all bird species). It follows that, rather than operating on individual samples, we can predict sample relations within this tree structure as the optimization goal of SSL. Beyond this, we further introduce two strategies uniquely enabled by these tree structures to achieve inter-sample consistency regularization and reliable pseudo-relations. Our experimental results reveal that (i) the proposed method yields good robustness against out-of-category data, and (ii) it can be combined with prior art, boosting their performance and thus yielding state-of-the-art results.
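One simple way to turn such a label hierarchy into pairwise optimization targets (an illustrative assumption, not the paper's exact relation definition) is to score every sample pair by the depth of the lowest common ancestor of their labels:

```python
def lca_depth(parent, a, b):
    """Relation score for two fine-grained labels in a taxonomy given as a
    child -> parent mapping: the depth of their lowest common ancestor
    (deeper shared ancestor = more closely related pair)."""
    def path_to_root(x):
        chain = [x]
        while x in parent:
            x = parent[x]
            chain.append(x)
        return chain
    pa, pb = path_to_root(a), path_to_root(b)
    lca = next(n for n in pa if n in pb)      # first shared ancestor
    return len(pa) - 1 - pa.index(lca)        # depth measured from the root

# toy phylogeny-style hierarchy
parent = {"mallard": "ducks", "pintail": "ducks",
          "sparrow": "passerines", "ducks": "aves", "passerines": "aves"}
print(lca_depth(parent, "mallard", "pintail"))   # 1 (share the "ducks" node)
print(lca_depth(parent, "mallard", "sparrow"))   # 0 (related only at the root)
```

A model could then be trained to predict these pairwise scores for unlabeled samples, which is one concrete instance of optimizing sample relations within the tree rather than individual labels.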

4.
Phys Rev Lett ; 131(12): 126502, 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37802946

ABSTRACT

Environment-induced localization transitions (LTs) occur when a small quantum system interacts with a bath of harmonic oscillators. At equilibrium, LTs are accompanied by an entropy change, signaling the loss of coherence. Despite extensive efforts, equilibrium LTs have yet to be observed. Here, we demonstrate that ongoing experiments on double quantum dots that measure entropy using a nearby quantum point contact (QPC) realize the celebrated spin-boson model and make it possible to measure the entropy change across its LT. We find a Kosterlitz-Thouless flow diagram, leading to a universal jump in the spin-bath interaction that is reflected in a discontinuity of the zero-temperature QPC conductance.
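For context, the spin-boson Hamiltonian referenced above is conventionally written as follows (standard textbook form with an Ohmic spectral density; the coupling convention shown here is an assumption, not taken from the paper):

```latex
\[
H \;=\; \frac{\Delta}{2}\,\sigma_x \;+\; \frac{\epsilon}{2}\,\sigma_z
\;+\; \frac{\sigma_z}{2}\sum_k \lambda_k\!\left(b_k^{\dagger}+b_k\right)
\;+\; \sum_k \omega_k\, b_k^{\dagger}b_k ,
\qquad
J(\omega)\;=\;\frac{\pi}{2}\sum_k \lambda_k^{2}\,\delta(\omega-\omega_k)
\;=\;2\pi\alpha\,\omega\, e^{-\omega/\omega_c}.
\]
```

In the Ohmic case at zero bias, the localization transition is of Kosterlitz-Thouless type and occurs near the dimensionless coupling α ≈ 1 (for small Δ/ω_c), which is the regime where a universal jump of the kind mentioned in the abstract is expected.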

5.
IEEE Trans Image Process ; 32: 4595-4609, 2023.
Article in English | MEDLINE | ID: mdl-37561619

ABSTRACT

Sketches are by now a well-researched topic in the vision community. Sketch semantic segmentation, in particular, serves as a fundamental step towards finer-level sketch interpretation. Recent works use various means of extracting discriminative features from sketches and have achieved considerable improvements in segmentation accuracy. Common approaches include attending to the sketch image as a whole, to its stroke-level representation, or to the sequence information embedded in it. However, they mostly focus on only part of this multi-facet information. In this paper, we demonstrate for the first time that there is complementary information to be explored across all three facets of sketch data, and that segmentation performance benefits from exploiting this sketch-specific information. Specifically, we propose Sketch-Segformer, a transformer-based framework for sketch semantic segmentation that inherently treats sketches as stroke sequences rather than pixel maps. In particular, Sketch-Segformer introduces two types of self-attention module with similar structures but different receptive fields (i.e., the whole sketch or an individual stroke). The order embedding is then further synergized with spatial embeddings learned from the entire sketch as well as with localized stroke-level information. Extensive experiments show that our sketch-specific design not only obtains state-of-the-art performance on traditional figurative sketches (the SPG and SketchSeg-150K datasets), but also performs well on creative sketches that do not conform to conventional object semantics (the CreativeSketch dataset), thanks to our use of multi-facet sketch information. Ablation studies, visualizations, and invariance tests further justify our design choices and the effectiveness of Sketch-Segformer. Code is available at https://github.com/PRIS-CV/Sketch-SF.
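A minimal sketch of the two receptive fields mentioned above: the same self-attention applied once over the whole sketch and once restricted to points within a stroke (the single-head formulation, tensor shapes, and variable names are illustrative assumptions, not the Sketch-Segformer modules):

```python
import torch

def self_attention(x, mask=None):
    """Single-head self-attention over a point sequence x: (n, d).
    `mask` (n, n) restricts which point pairs may attend to each other."""
    scores = (x @ x.t()) / x.size(1) ** 0.5
    if mask is not None:
        scores = scores.masked_fill(~mask, float('-inf'))
    return torch.softmax(scores, dim=-1) @ x

# toy sketch: 6 points belonging to 2 strokes, 32-d point features
x = torch.randn(6, 32)
stroke_ids = torch.tensor([0, 0, 0, 1, 1, 1])

whole_sketch_out = self_attention(x)                               # sketch-level receptive field
stroke_mask = stroke_ids.unsqueeze(0) == stroke_ids.unsqueeze(1)   # (6, 6) same-stroke pairs
stroke_level_out = self_attention(x, stroke_mask)                  # stroke-level receptive field
```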

6.
J Behav Addict ; 12(2): 458-470, 2023 Jun 29.
Article in English | MEDLINE | ID: mdl-37209127

ABSTRACT

Background and aims: Impaired value-based decision-making is a feature of substance and behavioral addictions. Loss aversion is a core component of value-based decision-making, and its alteration plays an important role in addiction. However, few studies have explored it in patients with internet gaming disorder (IGD). Methods: In this study, IGD patients (PIGD) and healthy controls (Con-PIGD) performed the Iowa gambling task (IGT) under functional magnetic resonance imaging (fMRI). We investigated group differences in loss aversion, in brain functional networks of node-centric functional connectivity (nFC), and in the overlapping community features of edge-centric functional connectivity (eFC) in the IGT. Results: PIGD performed worse, with a lower average net score in the IGT. The computational model results showed that loss aversion was significantly reduced in PIGD. There was no group difference in nFC. However, there were significant group differences in the overlapping community features of eFC. Furthermore, in Con-PIGD, loss aversion was positively correlated with the edge community profile similarity of the edge between the left IFG and the right hippocampus at the right caudate. This relationship was suppressed by response consistency in PIGD. In addition, reduced loss aversion was negatively correlated with the enhanced bottom-up neuromodulation from the right hippocampus to the left IFG in PIGD. Discussion and conclusions: The reduced loss aversion in value-based decision-making and the related edge-centric functional connectivity support the view that IGD shows the same value-based decision-making deficit as substance use and other behavioral addictive disorders. These findings may have important implications for understanding the definition and mechanism of IGD in the future.


Subjects
Addictive Behavior, Video Games, Humans, Brain Mapping/methods, Internet Addiction Disorder/diagnostic imaging, Brain/diagnostic imaging, Addictive Behavior/diagnostic imaging, Magnetic Resonance Imaging/methods, Internet
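For reference, loss aversion in computational models of the IGT such as the one in entry 6 above is typically parameterized with a prospect-theory-style utility (generic form shown here; the exact model fitted in the study may differ):

```latex
\[
u(x) \;=\;
\begin{cases}
x^{\rho}, & x \ge 0,\\[2pt]
-\lambda\,(-x)^{\rho}, & x < 0,
\end{cases}
\qquad \lambda > 1 \;\;\text{(losses weighted more heavily than gains)},
\]
```

so a reduced fitted λ, as reported for the PIGD group, corresponds to losses being discounted relative to gains during valuation.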
7.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12068-12084, 2023 10.
Article in English | MEDLINE | ID: mdl-37159309

ABSTRACT

As powerful as fine-grained visual classification (FGVC) is, responding to your query with a bird name such as "Whip-poor-will" or "Mallard" probably does not make much sense. This, however commonly accepted in the literature, underlines a fundamental question at the interface of AI and humans - what constitutes transferable knowledge for humans to learn from AI? This paper sets out to answer this very question using FGVC as a test bed. Specifically, we envisage a scenario where a trained FGVC model (the AI expert) functions as a knowledge provider that enables average people (you and me) to become better domain experts ourselves. Assuming an AI expert trained using expert human labels, we focus on asking and providing solutions for two questions: (i) what is the best transferable knowledge we can extract from the AI, and (ii) what is the most practical means to measure the gains in expertise given that knowledge? We propose to represent knowledge as highly discriminative visual regions that are expert-exclusive, and instantiate this via a novel multi-stage learning framework. A human study of 15,000 trials shows that our method consistently helps people of varying bird expertise recognise once-unrecognisable birds. We further propose a crude but benchmarkable metric, TEMI, thereby allowing future efforts in this direction to be compared with ours without the need for large-scale human studies.


Subjects
Algorithms, Birds, Animals, Humans
8.
IEEE Trans Image Process ; 32: 2309-2321, 2023.
Article in English | MEDLINE | ID: mdl-37058380

ABSTRACT

Recently, clustering-based methods have been the dominant solution for unsupervised person re-identification (ReID). Memory-based contrastive learning is widely used for its effectiveness in unsupervised representation learning. However, we find that inaccurate cluster proxies and the momentum updating strategy harm the contrastive learning system. In this paper, we propose a real-time memory updating strategy (RTMem) that updates a cluster centroid with a randomly sampled instance feature from the current mini-batch, without momentum. Compared to methods that take the mean feature vector as the cluster centroid and update it with momentum, RTMem keeps the features up-to-date for each cluster. Based on RTMem, we propose two contrastive losses, i.e., sample-to-instance and sample-to-cluster, to align the relationships between samples and each cluster and with the outliers that do not belong to any cluster. On the one hand, the sample-to-instance loss explores sample relationships across the whole dataset to enhance the capability of the density-based clustering algorithm, which relies on similarity measurements between instance-level images. On the other hand, with pseudo-labels generated by the density-based clustering algorithm, the sample-to-cluster loss enforces each sample to be close to its cluster proxy while being far from other proxies. With this simple RTMem contrastive learning strategy, the performance of the corresponding baseline improves by 9.3% on the Market-1501 dataset. Our method consistently outperforms state-of-the-art unsupervised person ReID methods on three benchmark datasets. Code is available at: https://github.com/PRIS-CV/RTMem.
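A minimal sketch of the momentum-free memory update and a sample-to-cluster loss as described above (class names, the temperature, and the random-sampling rule are illustrative assumptions, not the released RTMem code):

```python
import torch
import torch.nn.functional as F

class ClusterMemory:
    def __init__(self, num_clusters, feat_dim, temp=0.05):
        self.bank = F.normalize(torch.randn(num_clusters, feat_dim), dim=1)
        self.temp = temp

    @torch.no_grad()
    def update(self, feats, pseudo_labels):
        """RTMem-style update: overwrite each centroid with one randomly
        sampled instance feature from the current mini-batch (no momentum)."""
        for c in pseudo_labels.unique():
            idx = (pseudo_labels == c).nonzero(as_tuple=True)[0]
            pick = idx[torch.randint(len(idx), (1,))]
            self.bank[c] = F.normalize(feats[pick].squeeze(0), dim=0)

    def sample_to_cluster_loss(self, feats, pseudo_labels):
        """Pull each sample toward its cluster proxy, push it from the others."""
        logits = F.normalize(feats, dim=1) @ self.bank.t() / self.temp
        return F.cross_entropy(logits, pseudo_labels)

# toy usage with 10 pseudo-clusters and 128-d features
mem = ClusterMemory(num_clusters=10, feat_dim=128)
feats = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))
loss = mem.sample_to_cluster_loss(feats, labels)
mem.update(feats, labels)
```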

9.
IEEE Trans Neural Netw Learn Syst ; 34(4): 1823-1837, 2023 Apr.
Article in English | MEDLINE | ID: mdl-32248126

ABSTRACT

As a typical non-Gaussian vector variable, a neutral vector variable contains only nonnegative elements, and its l1-norm equals one. In addition, its neutral properties make it significantly different from commonly studied vector variables (e.g., Gaussian vector variables). Owing to these properties, the conventionally applied linear transformation approaches [e.g., principal component analysis (PCA) and independent component analysis (ICA)] are not suitable for neutral vector variables: PCA cannot transform a neutral vector variable, which is highly negatively correlated, into a set of mutually independent scalar variables, and ICA cannot preserve the bounded property after transformation. In recent work, we proposed an efficient nonlinear transformation approach, the parallel nonlinear transformation (PNT), for decorrelating neutral vector variables. In this article, we extensively compare PNT with PCA and ICA through both theoretical analysis and experimental evaluations. The results of our investigations demonstrate the superiority of PNT for decorrelating neutral vector variables.

10.
Article in English | MEDLINE | ID: mdl-36374886

ABSTRACT

The task of aspect-based sentiment analysis aims to identify the sentiment polarities of given aspects in a sentence. Recent advances have demonstrated the advantage of incorporating syntactic dependency structure with graph convolutional networks (GCNs). However, the performance of these GCN-based methods largely depends on the dependency parser, as different parsers produce diverse parsing results for the same sentence. In this article, we propose a dual GCN (DualGCN) that jointly considers syntactic structure and semantic correlations. Our DualGCN model mainly comprises four modules: 1) SynGCN: instead of explicitly encoding the syntactic structure, the SynGCN module uses the dependency probability matrix as a graph structure to implicitly integrate syntactic information; 2) SemGCN: we design the SemGCN module with multihead attention to enhance the syntactic structure with semantic information; 3) Regularizers: we propose orthogonal and differential regularizers to precisely capture semantic correlations between words by constraining the attention scores in the SemGCN module; and 4) Mutual BiAffine: we use the BiAffine module to bridge relevant information between the SynGCN and SemGCN modules. Extensive experiments are conducted with up-to-date pretrained language encoders on two groups of datasets, one including Restaurant14, Laptop14, and Twitter, and the other including Restaurant15 and Restaurant16. The experimental results demonstrate that the parsing results of different dependency parsers affect the performance of GCN-based models, and that our DualGCN achieves superior performance compared with state-of-the-art approaches. The source code and preprocessed datasets are publicly available on GitHub (see https://github.com/CCChenhao997/DualGCN-ABSA).
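A hedged sketch of the two attention regularizers described above, in their generic forms (the exact norms, the reciprocal formulation, and the weighting used in DualGCN may differ; this is only meant to show the shape of such constraints):

```python
import torch

def orthogonal_regularizer(attn):
    """Push the self-attention map toward row-orthogonality, ||A A^T - I||_F^2,
    so that different words attend to distinct semantic regions.
    attn: (batch, n, n) attention scores from the semantic branch."""
    n = attn.size(-1)
    gram = attn @ attn.transpose(1, 2)
    return ((gram - torch.eye(n, device=attn.device)) ** 2).mean()

def differential_regularizer(attn_sem, adj_syn, eps=1e-6):
    """Discourage the semantic attention from collapsing onto the syntactic
    graph (reciprocal-distance form), keeping the two branches complementary."""
    return 1.0 / (((attn_sem - adj_syn) ** 2).mean() + eps)

# toy batch: 3 sentences, 8 tokens each
attn_sem = torch.softmax(torch.randn(3, 8, 8), dim=-1)   # semantic-branch attention
adj_syn = torch.rand(3, 8, 8)                            # dependency probability matrix
reg = orthogonal_regularizer(attn_sem) + 0.1 * differential_regularizer(attn_sem, adj_syn)
```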

11.
IEEE Trans Image Process ; 31: 4474-4489, 2022.
Article in English | MEDLINE | ID: mdl-35763476

ABSTRACT

Text-based video segmentation aims to segment an actor in a video sequence by specifying the actor and the action it performs with a textual query. Previous methods fail to explicitly align the video content with the textual query in a fine-grained manner according to the actor and its action, due to the problem of semantic asymmetry. Semantic asymmetry means that the two modalities contain different amounts of semantic information during the multi-modal fusion process. To alleviate this problem, we propose a novel actor and action modular network that localizes the actor and its action in two separate modules. Specifically, we first learn the actor-/action-related content from the video and the textual query, and then match them in a symmetrical manner to localize the target tube. The target tube contains the desired actor and action and is then fed into a fully convolutional network to predict segmentation masks of the actor. Our method also establishes associations of objects across multiple frames with the proposed temporal proposal aggregation mechanism, which enables it to segment the video effectively and keep the temporal consistency of predictions. The whole model allows joint learning of actor-action matching and segmentation, and achieves state-of-the-art performance for both single-frame segmentation and full-video segmentation on the A2D Sentences and J-HMDB Sentences datasets.

12.
IEEE Trans Image Process ; 31: 4543-4555, 2022.
Article in English | MEDLINE | ID: mdl-35767479

ABSTRACT

Metric-based methods achieve promising performance on few-shot classification by learning clusters on support samples and generating shared decision boundaries for query samples. However, existing methods ignore the inaccurate class-center approximation introduced by the limited number of support samples, which consequently leads to biased inference. Therefore, in this paper, we propose to reduce the approximation error through class-center calibration. Specifically, we introduce the so-called Pair-wise Similarity Module (PSM), which generates calibrated class centers adapted to each query sample by capturing the semantic correlations between the support and the query samples, as well as enhancing the discriminative regions of the support representation. It is worth noting that the proposed PSM is a simple plug-and-play module and can be inserted into most metric-based few-shot learning models. Through extensive experiments with metric-based models, we demonstrate that the module significantly improves the performance of conventional few-shot classification methods on four few-shot image classification benchmark datasets. Code is available at: https://github.com/PRIS-CV/Pair-wise-Similarity-module.
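A minimal sketch of query-adaptive class-center calibration in the spirit of the pair-wise similarity idea above (shapes, the softmax temperature, and the nearest-center scoring are illustrative assumptions, not the released PSM module):

```python
import torch
import torch.nn.functional as F

def calibrated_prototypes(support, support_labels, query, num_classes, temp=10.0):
    """For each query, re-weight the support samples of every class by their
    similarity to that query, yielding query-adapted class centers.
    support: (n_s, d), query: (n_q, d) -> (n_q, num_classes, d)."""
    sims = F.normalize(query, dim=1) @ F.normalize(support, dim=1).t()   # (n_q, n_s)
    protos = []
    for c in range(num_classes):
        mask = support_labels == c
        w = torch.softmax(temp * sims[:, mask], dim=1)    # per-query weights over class-c shots
        protos.append(w @ support[mask])                  # (n_q, d) calibrated center
    return torch.stack(protos, dim=1)

# toy 5-way 5-shot episode with 64-d features
support = torch.randn(25, 64)
labels = torch.arange(5).repeat_interleave(5)
query = torch.randn(15, 64)
centers = calibrated_prototypes(support, labels, query, num_classes=5)
logits = -((query.unsqueeze(1) - centers) ** 2).sum(-1)   # nearest-center scores
```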

13.
IEEE Trans Image Process ; 31: 1447-1461, 2022.
Article in English | MEDLINE | ID: mdl-35044912

ABSTRACT

In this paper, we explore the fine-grained perception ability of deep models for the newly proposed scene sketch semantic segmentation task. Scene sketches are abstract drawings containing multiple related objects, and they play a vital role in daily communication and human-computer interaction. Research on this task has only recently started, mainly owing to the absence of large-scale datasets. The currently available dataset, SketchyScene, is composed of clip-art-style edge maps, which lack abstractness and diversity. To drive further research, we contribute two new large-scale datasets based on real hand-drawn object sketches, and develop a general automatic scene sketch synthesis process to assist with dataset composition. Furthermore, we propose enhancing local detail perception in deep models to achieve accurate stroke-oriented scene sketch segmentation. Owing to the inherent differences between hand-drawn sketches and natural images, extremely low-level local stroke features are incorporated to improve detail discrimination, and stroke masks are integrated into model training to guide the learning attention. Extensive experiments are conducted on three large-scale scene sketch datasets. Our method achieves state-of-the-art performance under four evaluation metrics and yields meaningful interpretability via visual analytics.


Subjects
Attention, Semantics, Humans, Perception
14.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 4605-4625, 2022 09.
Article in English | MEDLINE | ID: mdl-34029187

ABSTRACT

Due to a lack of data, overfitting is ubiquitous in real-world applications of deep neural networks (DNNs). We propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with a parametric prior, and adaptively adjusts the dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes in order to carry out end-to-end training. We evaluate the effectiveness of advanced dropout against nine dropout techniques on seven computer vision datasets (five small-scale and two large-scale datasets) with various base models. Advanced dropout outperforms all the compared techniques on all the datasets. We further compare the effectiveness ratios and find that advanced dropout achieves the highest ratio in most cases. Next, we conduct a set of analyses of dropout-rate characteristics, including the convergence of the adaptive dropout rate, the learned distributions of dropout masks, and a comparison with dropout-rate generation without an explicit distribution. In addition, the ability to prevent overfitting is evaluated and confirmed. Finally, we extend advanced dropout to uncertainty inference, network pruning, text classification, and regression, where it is also superior to the corresponding compared methods. Code is available at https://github.com/PRIS-CV/AdvancedDropout.


Subjects
Algorithms, Neural Networks (Computer), Bayes Theorem
15.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9521-9535, 2022 12.
Article in English | MEDLINE | ID: mdl-34752385

ABSTRACT

Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works are mainly part-driven (either explicitly or implicitly), on the assumption that fine-grained information naturally rests within the parts. In this paper, we take a different stance and show that part operations are not strictly necessary - the key lies in encouraging the network to learn at different granularities and progressively fusing multi-granularity features together. In particular, we propose: (i) a progressive training strategy that effectively fuses features from different granularities, and (ii) a consistent block convolution that encourages the network to learn category-consistent features at specific granularities. We evaluate on several standard FGVC benchmark datasets and demonstrate that the proposed method consistently outperforms existing alternatives or delivers competitive results. Code is available at https://github.com/PRIS-CV/PMG-V2.


Subjects
Algorithms, Neural Networks (Computer), Machine Learning
16.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8230-8248, 2022 11.
Article in English | MEDLINE | ID: mdl-34375278

ABSTRACT

Channel attention mechanisms have been widely applied in many visual tasks for effective performance improvement: they reinforce informative channels and suppress useless ones. Recently, different channel attention modules have been proposed and implemented in various ways, generally based on convolution and pooling operations. In this paper, we propose the Gaussian process embedded channel attention (GPCA) module and interpret channel attention schemes in a probabilistic way. The GPCA module models the correlations among the channels, which are assumed to be captured by beta-distributed variables. As the beta distribution cannot be integrated into the end-to-end training of convolutional neural networks (CNNs) with a mathematically tractable solution, we utilize an approximation of the beta distribution to solve this problem. Specifically, we adopt a Sigmoid-Gaussian approximation, in which Gaussian-distributed variables are mapped into the interval [0,1]. A Gaussian process is then utilized to model the correlations among the different channels, and a mathematically tractable solution is derived. The GPCA module can be efficiently implemented and integrated into the end-to-end training of CNNs. Experimental results demonstrate the promising performance of the proposed GPCA module. Code is available at https://github.com/PRIS-CV/GPCA.


Subjects
Algorithms, Neural Networks (Computer), Normal Distribution
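A rough sketch of the Sigmoid-Gaussian step described in entry 16 above: per-channel Gaussian variables are reparameterized and squashed into [0,1] to act as attention weights (the Gaussian-process modelling of inter-channel correlations is omitted; layer names and shapes are assumptions, not the released GPCA code):

```python
import torch
import torch.nn as nn

class SigmoidGaussianChannelAttention(nn.Module):
    """Per-channel attention a = sigmoid(mu + sigma * eps), eps ~ N(0, 1),
    so the weights lie in [0, 1] and behave approximately beta-like."""
    def __init__(self, channels):
        super().__init__()
        self.to_mu = nn.Linear(channels, channels)
        self.to_logvar = nn.Linear(channels, channels)

    def forward(self, x):                      # x: (B, C, H, W)
        desc = x.mean(dim=(2, 3))              # global average pooling -> (B, C)
        mu, logvar = self.to_mu(desc), self.to_logvar(desc)
        eps = torch.randn_like(mu)             # reparameterization trick
        attn = torch.sigmoid(mu + torch.exp(0.5 * logvar) * eps)   # (B, C) in [0, 1]
        return x * attn.unsqueeze(-1).unsqueeze(-1)

# toy usage
layer = SigmoidGaussianChannelAttention(channels=16)
out = layer(torch.randn(2, 16, 8, 8))
```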
17.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6089-6102, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34086578

ABSTRACT

We propose a Bayesian nonparametric approach for estimating a Dirichlet process (DP) mixture of generalized inverted Dirichlet distributions, i.e., an infinite generalized inverted Dirichlet mixture model (InGIDMM). The generalized inverted Dirichlet distribution has proven efficient in modeling vectors that contain only positive elements. Under the classical variational inference (VI) framework, the key challenge in the Bayesian estimation of the InGIDMM is that the expectation of the joint distribution of data and variables cannot be explicitly calculated, so numerical methods are usually applied to simulate the optimal posterior distributions. Using the recently proposed extended VI (EVI) framework, we introduce lower-bound approximations to the original variational objective function such that an analytically tractable solution can be derived, thereby overcoming the reliance on numerical simulation. By applying the DP mixture technique, the InGIDMM can automatically determine the number of mixture components from the observed data; the DP mixture model with an infinite number of mixture components also avoids the problems of underfitting and overfitting. The performance of the proposed approach is demonstrated with both synthetic data and real-life applications.

18.
IEEE Trans Image Process ; 30: 9208-9219, 2021.
Article in English | MEDLINE | ID: mdl-34739376

ABSTRACT

This paper proposes a dual-supervised uncertainty inference (DS-UI) framework for improving Bayesian-estimation-based UI in DNN-based image recognition. In the DS-UI, we combine the classifier of a DNN, i.e., the last fully-connected (FC) layer, with a mixture of Gaussian mixture models (MoGMM) to obtain an MoGMM-FC layer. Unlike existing UI methods for DNNs, which only calculate the means or modes of the DNN outputs' distributions, the proposed MoGMM-FC layer acts as a probabilistic interpreter for the features that are input to the classifier and directly calculates their probabilities for the DS-UI. In addition, we propose a dual-supervised stochastic gradient-based variational Bayes (DS-SGVB) algorithm for optimizing the MoGMM-FC layer. Unlike conventional SGVB and the optimization algorithms in other UI methods, DS-SGVB not only models the samples of the specific class for each Gaussian mixture model (GMM) in the MoGMM, but also considers negative samples from other classes, simultaneously reducing intra-class distances and enlarging inter-class margins to enhance the learning ability of the MoGMM-FC layer. Experimental results show that the DS-UI outperforms state-of-the-art UI methods in misclassification detection. We further evaluate the DS-UI in open-set out-of-domain/-distribution detection and find statistically significant improvements. Visualizations of the feature spaces demonstrate the superiority of the DS-UI. Code is available at https://github.com/PRIS-CV/DS-UI.

19.
IEEE Trans Image Process ; 30: 2826-2836, 2021.
Article in English | MEDLINE | ID: mdl-33556008

ABSTRACT

Classifying the sub-categories of an object within the same super-category (e.g., bird species and cars) in fine-grained visual classification (FGVC) relies heavily on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this article, by contrast, we show that by integrating low-level information (e.g., color, edge junctions, texture patterns), performance can be improved with enhanced feature representation and accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of 1) a dual-pathway hierarchy structure with a top-down feature pathway and a bottom-up attention pathway, hence learning both high-level semantic and low-level detailed feature representations, and 2) an ROI-guided refinement strategy with ROI-guided dropblock and ROI-guided zoom-in operations, which refines features by enhancing discriminative local regions and eliminating background noise. The proposed AP-CNN can be trained end-to-end, without the need for any additional bounding-box/part annotations. Extensive experiments on three widely tested FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach achieves state-of-the-art performance. Models and code are available at https://github.com/PRIS-CV/AP-CNN_Pytorch-master.

20.
IEEE Trans Image Process ; 30: 1318-1331, 2021.
Article in English | MEDLINE | ID: mdl-33315565

ABSTRACT

Few-shot learning for fine-grained image classification has gained recent attention in computer vision. Among few-shot learning approaches, metric-based methods are state-of-the-art on many tasks owing to their simplicity and effectiveness. Most metric-based methods assume a single similarity measure and thus obtain a single feature space. However, if samples can simultaneously be well classified via two distinct similarity measures, the samples within a class can distribute more compactly in a smaller feature space, producing more discriminative feature maps. Motivated by this, we propose the Bi-Similarity Network (BSNet), which consists of a single embedding module and a bi-similarity module with two similarity measures. After the support images and the query images pass through the convolution-based embedding module, the bi-similarity module learns feature maps according to two similarity measures of diverse characteristics. In this way, the model learns more discriminative and less similarity-biased features from few shots of fine-grained images, so that its generalization ability is significantly improved. Through extensive experiments that slightly modify established metric/similarity-based networks, we show that the proposed approach produces a substantial improvement on several fine-grained image benchmark datasets. Code is available at: https://github.com/PRIS-CV/BSNet.
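A minimal sketch of fusing two similarity measures over one shared embedding, in the spirit of the bi-similarity module above (cosine and negative Euclidean distance are stand-ins for the learned similarity branches; the fusion weight and scaling are assumptions, not the released BSNet code):

```python
import torch
import torch.nn.functional as F

def bi_similarity_logits(prototypes, queries, alpha=0.5):
    """Fuse two similarity measures between query features and class
    prototypes. prototypes: (n_way, d), queries: (n_q, d) -> (n_q, n_way)."""
    cos = F.normalize(queries, dim=1) @ F.normalize(prototypes, dim=1).t()
    euc = -torch.cdist(queries, prototypes)               # negative Euclidean distance
    return alpha * cos + (1 - alpha) * euc / queries.size(1)   # crude scale matching

# toy 5-way episode with 64-d features
protos = torch.randn(5, 64)
queries = torch.randn(15, 64)
logits = bi_similarity_logits(protos, queries)
loss = F.cross_entropy(logits, torch.randint(0, 5, (15,)))
```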
