Results 1 - 20 of 21
1.
Article in English | MEDLINE | ID: mdl-38478433

ABSTRACT

The main challenge for fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations, with a mere few labelled samples. Conventional few-shot learning methods however cannot be naively adopted for this fine-grained setting - a quick pilot study reveals that they in fact push for the opposite (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominantly use a support set to reconstruct the query image and then utilize metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we introduce a bi-reconstruction mechanism that can simultaneously accommodate inter-class and intra-class variations. In addition to using the support set to reconstruct the query set for increasing inter-class variations, we further use the query set to reconstruct the support set for reducing intra-class variations. This design effectively helps the model to explore more subtle and discriminative features, which is key for the fine-grained problem at hand. Furthermore, we construct a self-reconstruction module to work alongside the bi-directional module to make the features even more discriminative. We also introduce the snapshot ensemble method into the episodic learning strategy - a simple trick to further improve model performance without increasing training costs. Experimental results on three widely used fine-grained image classification datasets, as well as on general and cross-domain few-shot image datasets, consistently show considerable improvements over other methods. Codes are available at https://github.com/PRIS-CV/BiEN.
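The bi-reconstruction idea above can be sketched in a few lines. This is a toy illustration of our choosing, not the paper's implementation: it uses simple attention-weighted reconstruction in place of the actual reconstruction module, and all function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reconstruct(target, basis):
    """Reconstruct `target` as an attention-weighted combination of `basis` vectors."""
    weights = softmax([dot(target, b) for b in basis])
    dim = len(target)
    return [sum(w * b[i] for w, b in zip(weights, basis)) for i in range(dim)]

def bi_reconstruction_error(query, support):
    """Sum of both reconstruction directions: support->query (inter-class side)
    and query->support (intra-class side)."""
    q_hat = reconstruct(query, support)          # support set reconstructs the query
    err_q = sum((a - b) ** 2 for a, b in zip(query, q_hat))
    err_s = 0.0
    for s in support:                            # query reconstructs each support vector
        s_hat = reconstruct(s, [query])
        err_s += sum((a - b) ** 2 for a, b in zip(s, s_hat))
    return err_q + err_s
```

A metric-learning classifier would then rank categories by such bidirectional errors rather than by the one-way error alone.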

2.
IEEE Trans Image Process ; 33: 2266-2278, 2024.
Article in English | MEDLINE | ID: mdl-38470581

ABSTRACT

The problem of sketch semantic segmentation is far from being solved. Despite existing methods exhibiting near-saturating performance on simple sketches with high recognisability, they suffer serious setbacks when the target sketches are products of an imaginative process with a high degree of creativity. We hypothesise that human creativity, being highly individualistic, induces a significant shift in the distribution of sketches, leading to poor model generalisation. This hypothesis, backed by empirical evidence, opens the door for a solution that explicitly disentangles creativity while learning sketch representations. We materialise this by crafting a learnable creativity estimator that assigns a scalar creativity score to each sketch. We then introduce CreativeSeg, a learning-to-learn framework that leverages the estimator to learn a creativity-agnostic representation, and eventually the downstream semantic segmentation task. We empirically verify the superiority of CreativeSeg on the recent "Creative Birds" and "Creative Creatures" creative sketch datasets. Through a human study, we further strengthen the case that the learned creativity score does indeed correlate positively with the creativity subjectively perceived by humans. Codes are available at https://github.com/PRIS-CV/Sketch-CS.

3.
IEEE Trans Pattern Anal Mach Intell ; 46(5): 2658-2671, 2024 May.
Article in English | MEDLINE | ID: mdl-37801380

ABSTRACT

Despite great strides made on fine-grained visual classification (FGVC), current methods are still heavily reliant on fully-supervised paradigms where ample expert labels are called for. Semi-supervised learning (SSL) techniques, acquiring knowledge from unlabeled data, provide a promising way forward and have shown great promise for coarse-grained problems. However, existing SSL paradigms mostly assume in-category (i.e., category-aligned) unlabeled data, which hinders their effectiveness when re-purposed for FGVC. In this paper, we put forward a novel design specifically aimed at making out-of-category data work for semi-supervised FGVC. We work off an important assumption that all fine-grained categories naturally follow a hierarchical structure (e.g., the phylogenetic tree of "Aves" that covers all bird species). It follows that, instead of operating on individual samples, we can predict sample relations within this tree structure as the optimization goal of SSL. Beyond this, we further introduce two strategies uniquely enabled by these tree structures to achieve inter-sample consistency regularization and reliable pseudo-relations. Our experimental results reveal that (i) the proposed method yields good robustness against out-of-category data, and (ii) it can be combined with prior art, boosting performance and yielding state-of-the-art results.
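One simple way to turn a label hierarchy into a relation target, in the spirit of the tree-structured optimization goal described above, is to score pairs by the depth of their lowest common ancestor. This is an illustrative sketch, not the paper's actual relation definition; the paths and function names are our own.

```python
def lca_depth(path_a, path_b):
    """Depth of the lowest common ancestor, given root-to-leaf label paths."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth

def relation_label(path_a, path_b):
    """Coarse relation between two samples: deeper shared ancestry = closer relation.
    Predicting such pairwise targets replaces per-sample pseudo-labels."""
    return lca_depth(path_a, path_b)
```

For example, two ducks share a deeper ancestor (order level) than a duck and a sparrow (class level), so their relation label is larger.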

4.
IEEE Trans Image Process ; 32: 4595-4609, 2023.
Article in English | MEDLINE | ID: mdl-37561619

ABSTRACT

Sketch is a well-researched topic in the vision community by now. Sketch semantic segmentation, in particular, serves as a fundamental step towards finer-level sketch interpretation. Recent works use various means of extracting discriminative features from sketches and have achieved considerable improvements in segmentation accuracy. Common approaches for this include attending to the sketch-image as a whole, its stroke-level representation, or the sequence information embedded in it. However, they mostly focus on only a part of such multi-facet information. In this paper, we for the first time demonstrate that there is complementary information to be explored across all three of these facets of sketch data, and that segmentation performance benefits as a result of such exploration of sketch-specific information. Specifically, we propose the Sketch-Segformer, a transformer-based framework for sketch semantic segmentation that inherently treats sketches as stroke sequences rather than pixel-maps. In particular, Sketch-Segformer introduces two types of self-attention modules with similar structures that work with different receptive fields (i.e., the whole sketch or an individual stroke). The order embedding is then further fused with spatial embeddings learned from the entire sketch as well as localized stroke-level information. Extensive experiments show that our sketch-specific design is not only able to obtain state-of-the-art performance on traditional figurative sketches (such as the SPG and SketchSeg-150K datasets), but also performs well on creative sketches that do not conform to conventional object semantics (the CreativeSketch dataset), thanks to our use of multi-facet sketch information. Ablation studies, visualizations, and invariance tests further justify our design choices and the effectiveness of Sketch-Segformer. Codes are available at https://github.com/PRIS-CV/Sketch-SF.

5.
IEEE Trans Image Process ; 32: 4664-4676, 2023.
Article in English | MEDLINE | ID: mdl-37471189

ABSTRACT

Source-Free Domain Adaptation (SFDA) is becoming topical as a way to address the challenge of distribution shift between training and deployment data, while also relaxing the requirement of source data availability during target domain adaptation. In this paper, we focus on SFDA for semantic segmentation, in which pseudo-labeling-based target domain self-training is a common solution. However, pseudo labels generated by the source models are particularly unreliable on the target domain data due to the domain shift. We therefore propose to use Bayesian Neural Networks (BNNs) to improve target self-training by better estimating and exploiting pseudo-label uncertainty. With the uncertainty estimation of BNNs, we introduce two novel self-training based components: Uncertainty-aware Online Teacher-Student Learning (UOTSL) and Uncertainty-aware FeatureMix (UFM). Extensive experiments on two popular benchmarks, GTA 5 → Cityscapes and SYNTHIA → Cityscapes, show the superiority of our proposed method, with mIoU gains of 3.6% and 5.7% over the state of the art, respectively.
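The core idea of filtering pseudo labels by uncertainty can be sketched with predictive entropy over several stochastic forward passes (a generic stand-in for the BNN's uncertainty estimate; the components UOTSL and UFM are not modeled here, and all names are illustrative):

```python
import math

def predictive_entropy(prob_samples):
    """Entropy of the mean predictive distribution over stochastic forward passes."""
    k = len(prob_samples[0])
    n = len(prob_samples)
    mean = [sum(p[i] for p in prob_samples) / n for i in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0)

def select_pseudo_labels(per_pixel_samples, threshold):
    """Keep a pseudo-label only where predictive entropy falls below `threshold`;
    uncertain pixels get None and are excluded from self-training."""
    labels = []
    for samples in per_pixel_samples:
        k = len(samples[0])
        mean = [sum(p[i] for p in samples) / len(samples) for i in range(k)]
        if predictive_entropy(samples) < threshold:
            labels.append(max(range(k), key=mean.__getitem__))
        else:
            labels.append(None)
    return labels
```

A confident pixel (low entropy) contributes a pseudo label; a pixel whose sampled predictions disagree, or sit near uniform, is dropped.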

6.
IEEE Trans Image Process ; 32: 3311-3323, 2023.
Article in English | MEDLINE | ID: mdl-37279117

ABSTRACT

Generalized Few-shot Semantic Segmentation (GFSS) aims to segment each image pixel into either base classes with abundant training examples or novel classes with only a handful of (e.g., 1-5) training images per class. Compared to the widely studied Few-shot Semantic Segmentation (FSS), which is limited to segmenting novel classes only, GFSS is much under-studied despite being more practical. The existing approach to GFSS is based on classifier parameter fusion, whereby a newly trained novel class classifier and a pre-trained base class classifier are combined to form a new classifier. As the training data is dominated by base classes, this approach is inevitably biased towards the base classes. In this work, we propose a novel Prediction Calibration Network (PCN) to address this problem. Instead of fusing the classifier parameters, we fuse the scores produced separately by the base and novel classifiers. To ensure that the fused scores are not biased to either the base or novel classes, a new Transformer-based calibration module is introduced. Lower-level features are known to be more useful than higher-level features for detecting edge information in an input image. Thus, we build a cross-attention module that guides the classifier's final prediction using the fused multi-level features. However, transformers are computationally demanding. Crucially, to make the proposed cross-attention module training tractable at the pixel level, this module is designed based on feature-score cross-covariance and episodically trained to be generalizable at inference time. Extensive experiments on PASCAL-5i and COCO-20i show that our PCN outperforms the state-of-the-art alternatives by large margins.
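The score-fusion step (as opposed to parameter fusion) can be illustrated minimally: concatenate per-class scores from the two classifiers and normalize, with a scalar bias standing in for the learned Transformer-based calibration. This is a deliberately simplified sketch; `novel_bias` and the function name are our own, not the paper's.

```python
import math

def fuse_and_calibrate(base_scores, novel_scores, novel_bias=0.0):
    """Fuse per-class scores from the base and novel classifiers into one
    distribution over all classes; `novel_bias` is a toy stand-in for the
    learned calibration that counteracts the base-class bias."""
    logits = list(base_scores) + [s + novel_bias for s in novel_scores]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

With a suitable calibration offset, a novel class whose raw score lags the dominant base class can be brought back onto an even footing.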

7.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12068-12084, 2023 10.
Article in English | MEDLINE | ID: mdl-37159309

ABSTRACT

As powerful as fine-grained visual classification (FGVC) is, responding to your query with a bird name of "Whip-poor-will" or "Mallard" probably does not make much sense. This, however commonly accepted in the literature, underlines a fundamental question at the interface of AI and humans - what constitutes transferable knowledge for humans to learn from AI? This paper sets out to answer this very question using FGVC as a test bed. Specifically, we envisage a scenario where a trained FGVC model (the AI expert) functions as a knowledge provider in enabling average people (you and me) to become better domain experts ourselves. Assuming an AI expert trained using expert human labels, we anchor our focus on asking and providing solutions for two questions: (i) what is the best transferable knowledge we can extract from the AI, and (ii) what is the most practical means to measure the gains in expertise given that knowledge? We propose to represent knowledge as highly discriminative visual regions that are expert-exclusive, and instantiate it via a novel multi-stage learning framework. A human study of 15,000 trials shows that our method is able to consistently help people of divergent bird expertise recognise once-unrecognisable birds. We further propose a crude but benchmarkable metric, TEMI, thereby allowing future efforts in this direction to be compared with ours without the need for large-scale human studies.


Subjects
Algorithms , Birds , Animals , Humans
8.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 285-312, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35130149

ABSTRACT

Free-hand sketches are highly illustrative, and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation a much easier task than ever and consequently made sketch-oriented applications increasingly popular. The progress of deep learning has immensely benefited free-hand sketch research and applications. This paper presents a comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable. The main contents of this survey include: (i) A discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities, e.g., natural photos. (ii) A review of the developments of free-hand sketch research in the deep learning era, by surveying existing datasets, research topics, and the state-of-the-art methods through a detailed taxonomy and experimental evaluation. (iii) Promotion of future work via a discussion of bottlenecks, open problems, and potential research directions for the community.

9.
IEEE Trans Image Process ; 31: 4585-4597, 2022.
Article in English | MEDLINE | ID: mdl-35776810

ABSTRACT

Most existing studies on unsupervised domain adaptation (UDA) assume that each domain's training samples come with domain labels (e.g., painting, photo). Samples from each domain are assumed to follow the same distribution, and the domain labels are exploited to learn domain-invariant features via feature alignment. However, such an assumption often does not hold true - there often exist numerous finer-grained domains (e.g., dozens of modern painting styles have been developed, each differing dramatically from the classic styles). Therefore, forcing feature distribution alignment across each artificially-defined and coarse-grained domain can be ineffective. In this paper, we address both single-source and multi-source UDA from a completely different perspective, which is to view each instance as a fine domain. Feature alignment across domains is thus redundant. Instead, we propose to perform dynamic instance domain adaptation (DIDA). Concretely, a dynamic neural network with adaptive convolutional kernels is developed to generate instance-adaptive residuals that adapt domain-agnostic deep features to each individual instance. This enables a shared classifier to be applied to both source and target domain data without relying on any domain annotation. Further, instead of imposing intricate feature alignment losses, we adopt a simple semi-supervised learning paradigm using only a cross-entropy loss for both labeled source and pseudo-labeled target data. Our model, dubbed DIDA-Net, achieves state-of-the-art performance on several commonly used single-source and multi-source UDA datasets, including Digits, Office-Home, DomainNet, Digit-Five, and PACS.
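The instance-adaptive residual idea reduces, in caricature, to "shared feature plus an instance-conditioned correction". The sketch below is a toy: the real method generates dynamic convolutional kernels, whereas here a single gate derived from an instance statistic stands in for the generator, and `w`, `b`, and the function names are hypothetical.

```python
import math

def instance_residual(feature, w, b):
    """Toy residual generator: modulates the feature by an instance-level
    statistic. `w` and `b` stand in for the learned parameters of the
    dynamic kernel generator."""
    stat = sum(feature) / len(feature)      # instance descriptor (global average)
    gate = math.tanh(w * stat + b)          # instance-conditioned modulation
    return [gate * f for f in feature]

def dida_forward(feature, w=0.5, b=0.0):
    """Domain-agnostic feature plus its instance-adaptive residual."""
    return [f + r for f, r in zip(feature, instance_residual(feature, w, b))]
```

Because the correction depends only on the instance itself, the same forward pass serves source and target samples alike, with no domain label in sight.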


Subjects
Algorithms , Neural Networks, Computer
10.
IEEE Trans Image Process ; 31: 2673-2682, 2022.
Article in English | MEDLINE | ID: mdl-35316186

ABSTRACT

We present the first one-shot personalized sketch segmentation method. We aim to segment all sketches belonging to the same category, provisioned with a single sketch with a given part annotation, while (i) preserving the part semantics embedded in the exemplar, and (ii) being robust to input style and abstraction. We refer to this scenario as personalized. With that, we importantly enable a much-desired personalization capability for downstream fine-grained sketch analysis tasks. To train a robust segmentation module, we deform the exemplar sketch to each of the available sketches of the same category. Our method generalizes to sketches not observed during training. Our central contribution is a sketch-specific hierarchical deformation network. Given a multi-level sketch-stroke encoding obtained via a graph convolutional network, our method estimates a rigid-body transformation from the target to the exemplar at the upper level. Finer deformation from the exemplar to the globally warped target sketch is then obtained through stroke-wise deformations at the lower level. Both levels of deformation are guided by mean squared distances between keypoints learned without supervision, ensuring that the stroke semantics are preserved. We evaluate our method against state-of-the-art segmentation and perceptual grouping baselines re-purposed for the one-shot setting, and against two few-shot 3D shape segmentation methods. We show that our method outperforms all the alternatives by more than 10% on average. Ablation studies further demonstrate that our method is robust to personalization: changes in input part semantics and style differences.

11.
IEEE Trans Image Process ; 31: 1447-1461, 2022.
Article in English | MEDLINE | ID: mdl-35044912

ABSTRACT

In this paper, we aim to explore the fine-grained perception ability of deep models for the newly proposed scene sketch semantic segmentation task. Scene sketches are abstract drawings containing multiple related objects. They play a vital role in daily communication and human-computer interaction. Research on this task has only recently started, mainly because of the absence of large-scale datasets. The only currently available dataset, SketchyScene, is composed of clip art-style edge maps, which lack abstractness and diversity. To drive further research, we contribute two new large-scale datasets based on real hand-drawn object sketches. A general automatic scene sketch synthesis process is developed to assist with new dataset composition. Furthermore, we propose enhancing local detail perception in deep models to realize accurate stroke-oriented scene sketch segmentation. Due to the inherent differences between hand-drawn sketches and natural images, extremely low-level local features of strokes are incorporated to improve detail discrimination. Stroke masks are also integrated into model training to guide the learning attention. Extensive experiments are conducted on three large-scale scene sketch datasets. Our method achieves state-of-the-art performance under four evaluation metrics and yields meaningful interpretability via visual analytics.


Subjects
Attention , Semantics , Humans , Perception
12.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 8927-8948, 2022 12.
Article in English | MEDLINE | ID: mdl-34752384

ABSTRACT

Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas - fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.


Subjects
Deep Learning , Neural Networks, Computer , Animals , Algorithms , Image Processing, Computer-Assisted/methods , Birds
13.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9521-9535, 2022 12.
Article in English | MEDLINE | ID: mdl-34752385

ABSTRACT

Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works are mainly part-driven (either explicitly or implicitly), with the assumption that fine-grained information naturally rests within the parts. In this paper, we take a different stance, and show that part operations are not strictly necessary - the key lies in encouraging the network to learn at different granularities and progressively fusing multi-granularity features together. In particular, we propose: (i) a progressive training strategy that effectively fuses features from different granularities, and (ii) a consistent block convolution that encourages the network to learn category-consistent features at specific granularities. We evaluate on several standard FGVC benchmark datasets, and demonstrate that the proposed method consistently outperforms existing alternatives or delivers competitive results. Codes are available at https://github.com/PRIS-CV/PMG-V2.


Subjects
Algorithms , Neural Networks, Computer , Machine Learning
14.
IEEE Trans Image Process ; 30: 8595-8606, 2021.
Article in English | MEDLINE | ID: mdl-34648442

ABSTRACT

In this paper we study, for the first time, the problem of fine-grained sketch-based 3D shape retrieval. We advocate the use of sketches as a fine-grained input modality to retrieve 3D shapes at instance-level - e.g., given a sketch of a chair, we set out to retrieve a specific chair from a gallery of all chairs. Fine-grained sketch-based 3D shape retrieval (FG-SBSR) has not been possible until now due to a lack of datasets that exhibit one-to-one sketch-3D correspondences. The first key contribution of this paper is two new datasets, consisting of a total of 4,680 sketch-3D pairings from two object categories. Even with the datasets, FG-SBSR is still highly challenging because (i) the inherent domain gap between 2D sketch and 3D shape is large, and (ii) retrieval needs to be conducted at the instance level instead of the coarse category-level matching of traditional SBSR. Thus, the second contribution of the paper is the first cross-modal deep embedding model for FG-SBSR, which specifically tackles the unique challenges presented by this new problem. Core to the deep embedding model is a novel cross-modal view attention module which automatically computes the optimal combination of 2D projections of a 3D shape given a query sketch.
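The view-attention step can be pictured as a query-conditioned weighted average of view embeddings. This is a minimal sketch assuming dot-product similarity; the actual module and any trainable parameters are not reproduced here, and the function names are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def view_attention(sketch_emb, view_embs):
    """Combine a 3D shape's 2D-view embeddings, weighted by each view's
    dot-product similarity to the query sketch embedding."""
    weights = softmax([sum(s * v for s, v in zip(sketch_emb, view)) for view in view_embs])
    dim = len(sketch_emb)
    return [sum(w * view[i] for w, view in zip(weights, view_embs)) for i in range(dim)]
```

Views that resemble the query sketch dominate the combined shape embedding, which is then compared against the sketch for instance-level retrieval.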

15.
Article in English | MEDLINE | ID: mdl-33026989

ABSTRACT

Given pixel-level annotated data, traditional photo segmentation techniques have achieved promising results. However, these photo segmentation models can only identify objects in categories for which data annotation and training have been carried out. This limitation has inspired recent work on few-shot and zero-shot learning for image segmentation. In this paper, we show the value of sketch for photo segmentation, in particular as a transferable representation to describe a concept to be segmented. We show, for the first time, that it is possible to generate a photo-segmentation model of a novel category using just a single sketch and furthermore exploit the unique fine-grained characteristics of sketch to produce more detailed segmentation. More specifically, we propose a sketch-based photo segmentation method that takes sketch as input and synthesizes the weights required for a neural network to segment the corresponding region of a given photo. Our framework can be applied at both the category-level and the instance-level, and fine-grained input sketches provide more accurate segmentation in the latter. This framework generalizes across categories via sketch and thus provides an alternative to zero-shot learning when segmenting a photo from a category without annotated training data. To investigate the instance-level relationship across sketch and photo, we create the SketchySeg dataset which contains segmentation annotations for photos corresponding to paired sketches in the Sketchy Dataset.
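The weight-synthesis idea above - a network that maps a sketch to the weights of a segmentation head - is essentially a hypernetwork. Below is a heavily simplified, hypothetical sketch with a linear synthesis map and a per-location sigmoid score; none of the names or shapes come from the paper.

```python
import math

def synthesize_weights(sketch_emb, hyper_w):
    """Toy hypernetwork: a linear map (rows of `hyper_w`, standing in for the
    learned weight-synthesis branch) from the sketch embedding to the
    segmentation weights."""
    return [sum(w_i * e for w_i, e in zip(row, sketch_emb)) for row in hyper_w]

def segment(photo_feats, weights):
    """Per-location foreground score computed with the synthesized weights."""
    return [1.0 / (1.0 + math.exp(-sum(w * f for w, f in zip(weights, feat))))
            for feat in photo_feats]
```

A new category needs only a new sketch: the synthesized weights replace per-category training, which is what makes the scheme an alternative to zero-shot learning.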

16.
Article in English | MEDLINE | ID: mdl-32092002

ABSTRACT

The key to solving fine-grained image categorization is finding discriminative local regions that correspond to subtle visual traits. Great strides have been made, with complex networks designed specifically to learn part-level discriminative feature representations. In this paper, we show that it is possible to cultivate subtle details without the need for overly complicated network designs or training mechanisms - a single loss is all it takes. The main trick lies in how we delve into individual feature channels early on, as opposed to the convention of starting from a consolidated feature map. The proposed loss function, termed the mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component. The discriminality component forces all feature channels belonging to the same class to be discriminative, through a novel channel-wise attention mechanism. The diversity component additionally constrains channels so that they become mutually exclusive across the spatial dimension. The end result is a set of feature channels, each of which reflects a different locally discriminative region for a specific class. The MC-Loss can be trained end-to-end, without the need for any bounding-box/part annotations, and yields highly discriminative regions during inference. Experimental results show that our MC-Loss, when implemented on top of common base networks, can achieve state-of-the-art performance on all four fine-grained categorization datasets (CUB-Birds, FGVC-Aircraft, Flowers-102, and Stanford Cars). Ablation studies further demonstrate the superiority of the MC-Loss when compared with other recently proposed general-purpose losses for visual classification, on two different base networks.
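The two MC-Loss components can be caricatured on flattened feature maps. This is a simplified sketch, not the published loss: the channel-wise attention is omitted, channels are plain lists of spatial activations, and the function names are our own.

```python
import math

def spatial_softmax(channel):
    """Softmax over the flattened spatial positions of one feature channel."""
    m = max(channel)
    es = [math.exp(v - m) for v in channel]
    z = sum(es)
    return [e / z for e in es]

def diversity(channels):
    """Diversity side: sum, over spatial positions, of the max across a class's
    spatially-softmaxed channels - larger when channels peak at different places."""
    soft = [spatial_softmax(c) for c in channels]
    return sum(max(s[i] for s in soft) for i in range(len(channels[0])))

def class_logit(channels):
    """Discriminality side: the class logit is the channel-group average of the
    spatial max (the channel-wise attention of the full loss is omitted)."""
    return sum(max(c) for c in channels) / len(channels)
```

Maximizing the diversity term pushes a class's channels toward mutually exclusive spatial peaks, while cross-entropy on the pooled logits keeps every channel discriminative for its class.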

17.
IEEE Trans Image Process ; 28(7): 3219-3231, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30703021

ABSTRACT

Human free-hand sketches provide useful data for studying human perceptual grouping, where grouping principles such as the Gestalt laws of grouping are naturally in play during both the perception and sketching stages. In this paper, we make the first attempt to develop a universal sketch perceptual grouper, that is, a grouper that can be applied to sketches of any category created with any drawing style and ability, to group constituent strokes/segments into semantically meaningful object parts. The first obstacle to achieving this goal is the lack of large-scale datasets with grouping annotation. To overcome this, we contribute the largest sketch perceptual grouping dataset to date, consisting of 20,000 unique sketches evenly distributed over 25 object categories. Furthermore, we propose a novel deep perceptual grouping model learned with both generative and discriminative losses. The generative loss improves the generalization ability of the model, while the discriminative loss guarantees both local and global grouping consistency. Extensive experiments demonstrate that the proposed grouper significantly outperforms the state-of-the-art competitors. In addition, we show that our grouper is useful for a number of sketch analysis tasks, including sketch semantic segmentation, synthesis, and fine-grained sketch-based image retrieval.

18.
IEEE Trans Neural Netw Learn Syst ; 30(2): 449-463, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29994731

ABSTRACT

In this paper, we develop a novel variational Bayesian learning method for the Dirichlet process (DP) mixture of inverted Dirichlet distributions, which has been shown to be very flexible for modeling vectors with positive elements. The recently proposed extended variational inference (EVI) framework is adopted to derive an analytically tractable solution. The convergence of the proposed algorithm is theoretically guaranteed by introducing a single lower-bound approximation to the original objective function in the EVI framework. In principle, the proposed model can be viewed as an infinite inverted Dirichlet mixture model that allows the automatic determination of the number of mixture components from the data. Therefore, the problem of predetermining the optimal number of mixing components has been overcome. Moreover, the problems of overfitting and underfitting are avoided by the Bayesian estimation approach. Compared with several recently proposed DP-related methods and conventionally applied methods, the good performance and effectiveness of the proposed method have been demonstrated with both synthesized and real data evaluations.

19.
IEEE Trans Vis Comput Graph ; 19(8): 1252-63, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23744256

ABSTRACT

This paper shows that classifying shapes is a useful tool in nonphotorealistic rendering (NPR) from photographs. Our classifier takes as input regions from an image segmentation hierarchy and outputs the "best" fitting simple shape, such as a circle, square, or triangle. Other approaches to NPR have recognized the benefits of segmentation, but none have classified the shape of segments. By doing so, we can create artwork of a more abstract nature, emulating the style of modern artists such as Matisse and others who favored shape simplification in their artwork. The classifier chooses the shape that "best" represents the region. Since the classifier is trained by a user, the "best shape" has a subjective quality that can override measurements such as minimum error and, more importantly, captures user preferences. Once trained, the system is fully automatic, although simple user interaction is also possible to allow for differences in individual tastes. A gallery of results shows how this classifier contributes to NPR from images by producing abstract artwork.


Subjects
Art , Image Processing, Computer-Assisted/methods , Photography , Algorithms , Humans
20.
IEEE Trans Image Process ; 20(4): 935-47, 2011 Apr.
Article in English | MEDLINE | ID: mdl-20959268

ABSTRACT

Finding meaningful groupings of image primitives has been a long-standing problem in computer vision. This paper studies how salient groupings can be produced using established theories in the field of visual perception alone. The major contribution is a novel definition of the Gestalt principle of Prägnanz, based upon Koffka's definition that image descriptions should be both stable and simple. Our method is global in the sense that it operates over all primitives in an image at once. It works regardless of the type of image primitives and is generally independent of image properties such as intensity, color, and texture. A novel experiment is designed to quantitatively evaluate the groupings output by our method; it takes human disagreement into account and is generic to the outputs of any grouper. We also demonstrate the value of our method in an image segmentation application and quantitatively show that the resulting segmentations deliver promising results when benchmarked on the Berkeley Segmentation Dataset (BSDS).


Subjects
Algorithms , Artificial Intelligence , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Reproducibility of Results , Sensitivity and Specificity