ABSTRACT
Despite great strides in fine-grained visual classification (FGVC), current methods still rely heavily on fully supervised paradigms that require ample expert labels. Semi-supervised learning (SSL) techniques, which acquire knowledge from unlabeled data, offer a viable way forward and have shown great promise on coarse-grained problems. However, existing SSL paradigms mostly assume in-category (i.e., category-aligned) unlabeled data, which limits their effectiveness when repurposed for FGVC. In this paper, we put forward a novel design aimed specifically at making out-of-category data work for semi-supervised FGVC. We build on the key assumption that all fine-grained categories naturally follow a hierarchical structure (e.g., the phylogenetic tree of "Aves" that covers all bird species). It follows that, instead of operating on individual samples, we can predict sample relations within this tree structure as the optimization goal of SSL. Beyond this, we introduce two strategies uniquely enabled by these tree structures to achieve inter-sample consistency regularization and reliable pseudo-relations. Our experimental results reveal that (i) the proposed method is robust to out-of-category data, and (ii) it can be combined with prior art, boosting its performance and yielding state-of-the-art results.
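The idea of predicting sample relations within a taxonomy, rather than per-sample labels, can be illustrated with a small sketch. For illustration only, we assume that a relation is the depth of the lowest common ancestor of two samples' root-to-leaf paths; `lca_depth`, `relation_targets` and the example paths are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: the "relation" between two samples is assumed to be the
# depth of their lowest common ancestor in a taxonomy tree. Each label is a
# root-to-leaf path such as ("Aves", "Anseriformes", "Anatidae", "Anas platyrhynchos").
from typing import List, Sequence


def lca_depth(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Number of leading taxonomy levels shared by two root-to-leaf paths."""
    depth = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        depth += 1
    return depth


def relation_targets(paths: Sequence[Sequence[str]]) -> List[List[int]]:
    """Pairwise relation matrix used as the prediction target instead of class labels."""
    n = len(paths)
    return [[lca_depth(paths[i], paths[j]) for j in range(n)] for i in range(n)]


mallard = ("Aves", "Anseriformes", "Anatidae", "Anas platyrhynchos")
teal = ("Aves", "Anseriformes", "Anatidae", "Anas crecca")
nightjar = ("Aves", "Caprimulgiformes", "Caprimulgidae", "Antrostomus vociferus")

print(relation_targets([mallard, teal, nightjar]))
# [[4, 3, 1], [3, 4, 1], [1, 1, 4]]
```

Under this assumption, an image of a species absent from the labeled set still receives a meaningful target at the genus or order level, which gives some intuition for why relation targets may tolerate out-of-category data better than class labels.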
ABSTRACT
The problem of sketch semantic segmentation is far from solved. Although existing methods achieve near-saturated performance on simple, highly recognisable sketches, they suffer serious setbacks when the target sketches are products of an imaginative process with a high degree of creativity. We hypothesise that human creativity, being highly individualistic, induces a significant shift in the distribution of sketches, leading to poor model generalisation. This hypothesis, backed by empirical evidence, opens the door to a solution that explicitly disentangles creativity while learning sketch representations. We materialise this by crafting a learnable creativity estimator that assigns a scalar creativity score to each sketch. Building on it, we introduce CreativeSeg, a learning-to-learn framework that leverages the estimator to learn a creativity-agnostic representation and, eventually, to perform the downstream semantic segmentation task. We empirically verify the superiority of CreativeSeg on the recent "Creative Birds" and "Creative Creatures" creative sketch datasets. Through a human study, we further show that the learned creativity score correlates positively with human-perceived creativity. Code is available at https://github.com/PRIS-CV/Sketch-CS.
ABSTRACT
The main challenge for fine-grained few-shot image classification is to learn feature representations with higher inter-class and lower intra-class variations from only a few labelled samples. Conventional few-shot learning methods, however, cannot be naively adopted for this fine-grained setting: a quick pilot study reveals that they in fact push in the opposite direction (i.e., lower inter-class variations and higher intra-class variations). To alleviate this problem, prior works predominantly use a support set to reconstruct the query image and then use metric learning to determine its category. Upon careful inspection, we further reveal that such unidirectional reconstruction methods only help to increase inter-class variations and are not effective in tackling intra-class variations. In this paper, we introduce a bi-reconstruction mechanism that simultaneously accounts for inter-class and intra-class variations. In addition to using the support set to reconstruct the query set to increase inter-class variations, we further use the query set to reconstruct the support set to reduce intra-class variations. This design helps the model explore more subtle and discriminative features, which are key to the fine-grained problem at hand. Furthermore, we construct a self-reconstruction module that works alongside the bi-directional module to make the features even more discriminative. We also introduce the snapshot ensemble method into the episodic learning strategy, a simple trick that further improves model performance without increasing training costs. Experimental results on three widely used fine-grained image classification datasets, as well as general and cross-domain few-shot image datasets, consistently show considerable improvements over other methods.
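A minimal sketch of how a bi-directional reconstruction distance might be scored in one episode. The abstract does not specify the reconstruction operator; we assume closed-form ridge regression over feature vectors, and `recon_error`, `bi_distance` and `lam` are hypothetical names. The support-to-query term plays the inter-class role and the query-to-support term the intra-class role described above.

```python
# Hypothetical sketch of a bi-reconstruction distance.
# Assumption: reconstruction is closed-form ridge regression (our choice;
# the abstract does not say how reconstruction is computed).
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def recon_error(target: Vector, bases: List[Vector], lam: float = 1e-3) -> float:
    """Squared error of reconstructing `target` as a ridge-weighted sum of `bases`."""
    # Ridge solution: w = (B B^T + lam I)^-1 B t, with the bases as rows of B
    gram = [[dot(bi, bj) + (lam if i == j else 0.0) for j, bj in enumerate(bases)]
            for i, bi in enumerate(bases)]
    weights = solve(gram, [dot(bi, target) for bi in bases])
    recon = [sum(w * b[d] for w, b in zip(weights, bases)) for d in range(len(target))]
    return sum((t - r) ** 2 for t, r in zip(target, recon))


def bi_distance(query: Vector, support: List[Vector], lam: float = 1e-3) -> float:
    """Support->query error (inter-class) plus mean query->support error (intra-class)."""
    s_to_q = recon_error(query, support, lam)
    q_to_s = sum(recon_error(s, [query], lam) for s in support) / len(support)
    return s_to_q + q_to_s


class_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
class_b = [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
query = [1.0, 0.05, 0.0]
scores = {"A": bi_distance(query, class_a), "B": bi_distance(query, class_b)}
print(min(scores, key=scores.get))  # A
```

The query lies in the span of class A's support features and is also a good reconstruction basis for them, so both terms are small for A and large for B.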
ABSTRACT
As powerful as fine-grained visual classification (FGVC) is, responding to your query with a bird name such as "Whip-poor-will" or "Mallard" probably does not make much sense. This practice, however commonly accepted in the literature, underlines a fundamental question at the interface between AI and humans: what constitutes transferable knowledge for humans to learn from AI? This paper sets out to answer this very question using FGVC as a test bed. Specifically, we envisage a scenario where a trained FGVC model (the AI expert) acts as a knowledge provider that enables average people (you and me) to become better domain experts ourselves. Assuming an AI expert trained using expert human labels, we focus on asking and answering two questions: (i) what is the best transferable knowledge we can extract from AI, and (ii) what is the most practical means of measuring the gain in expertise given that knowledge? We propose to represent knowledge as highly discriminative visual regions that are expert-exclusive, and instantiate it via a novel multi-stage learning framework. A human study of 15,000 trials shows that our method consistently helps people with varying levels of bird expertise recognise birds they previously could not. We further propose a crude but benchmarkable metric, TEMI, which allows future efforts in this direction to be compared with ours without the need for large-scale human studies.
Subjects
Algorithms; Birds; Animals; Humans
ABSTRACT
Channel attention mechanisms have been widely applied in many visual tasks to improve performance. They reinforce informative channels while suppressing uninformative ones. Recently, various channel attention modules have been proposed and implemented in different ways; generally speaking, they are mainly based on convolution and pooling operations. In this paper, we propose the Gaussian process embedded channel attention (GPCA) module and further interpret channel attention schemes in a probabilistic way. The GPCA module models the correlations among channels, which are assumed to be captured by beta-distributed variables. Because the beta distribution cannot be integrated into the end-to-end training of convolutional neural networks (CNNs) with a mathematically tractable solution, we use an approximation of the beta distribution instead. Specifically, we adopt a Sigmoid-Gaussian approximation, in which Gaussian-distributed variables are mapped into the interval [0,1]. A Gaussian process is then used to model the correlations among different channels, which yields a mathematically tractable solution. The GPCA module can be efficiently implemented and integrated into the end-to-end training of CNNs. Experimental results demonstrate the promising performance of the proposed GPCA module. Code is available at https://github.com/PRIS-CV/GPCA.
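The Sigmoid-Gaussian idea can be sketched as follows: a Gaussian-distributed channel logit is squashed by a sigmoid into [0,1], and its expectation is approximated in closed form. We assume the standard probit-style approximation E[sigmoid(z)] ≈ sigmoid(m / sqrt(1 + π s²/8)) for z ~ N(m, s²), which may differ from the paper's derivation, and we omit the Gaussian process that would supply correlated per-channel means and variances. All function names are hypothetical.

```python
# Hypothetical sketch of Sigmoid-Gaussian channel attention.
# Assumption: expected attention uses the probit-style approximation
# E[sigmoid(z)] ~= sigmoid(m / sqrt(1 + pi * s^2 / 8)) for z ~ N(m, s^2).
# The Gaussian process producing (m, s^2) per channel is omitted here.
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expected_attention(mean: float, var: float) -> float:
    """Approximate expected sigmoid of a Gaussian logit; always in (0, 1)."""
    return sigmoid(mean / math.sqrt(1.0 + math.pi * var / 8.0))


def apply_channel_attention(features: List[List[float]],
                            means: List[float],
                            variances: List[float]) -> List[List[float]]:
    """Rescale each channel (a flat list of activations) by its attention weight."""
    weights = [expected_attention(m, v) for m, v in zip(means, variances)]
    return [[w * a for a in channel] for w, channel in zip(weights, features)]


print(apply_channel_attention([[1.0, 2.0], [3.0, 4.0]], means=[0.0, 0.0], variances=[1.0, 1.0]))
# [[0.5, 1.0], [1.5, 2.0]]
```

Note how uncertainty acts: a larger variance pulls the weight toward 0.5, so the module attends less decisively to channels whose relevance it is unsure about.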
Subjects
Algorithms; Neural Networks, Computer; Normal Distribution
ABSTRACT
Fine-grained visual classification (FGVC) is much more challenging than traditional classification tasks due to the inherently subtle intra-class object variations. Recent works are mainly part-driven (either explicitly or implicitly), with the assumption that fine-grained information naturally rests within the parts. In this paper, we take a different stance and show that part operations are not strictly necessary: the key lies in encouraging the network to learn at different granularities and progressively fusing multi-granularity features together. In particular, we propose (i) a progressive training strategy that effectively fuses features from different granularities, and (ii) a consistent block convolution that encourages the network to learn category-consistent features at specific granularities. We evaluate on several standard FGVC benchmark datasets and demonstrate that the proposed method consistently outperforms existing alternatives or delivers competitive results. Code is available at https://github.com/PRIS-CV/PMG-V2.
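One common way to force a network to learn at a given granularity, assumed here rather than stated in the abstract, is to cut the input into n × n patches and shuffle them, so that only cues at or below the patch size remain usable. The sketch below shows that step on a toy single-channel image; `jigsaw` and the schedule `GRANULARITIES` are hypothetical.

```python
# Hypothetical sketch: shuffle n x n patches so that only cues at or below
# the patch scale survive. Not the authors' released code.
import random
from typing import List

Image = List[List[int]]  # toy single-channel image as rows of pixel values


def jigsaw(image: Image, n: int, seed: int = 0) -> Image:
    """Split a square image into n x n patches and return them shuffled."""
    size = len(image)
    assert size % n == 0, "image side must be divisible by n"
    p = size // n
    patches = [[row[c * p:(c + 1) * p] for row in image[r * p:(r + 1) * p]]
               for r in range(n) for c in range(n)]
    random.Random(seed).shuffle(patches)
    out = [[0] * size for _ in range(size)]
    for idx, patch in enumerate(patches):
        r, c = divmod(idx, n)
        for i in range(p):
            out[r * p + i][c * p:(c + 1) * p] = patch[i]
    return out


# Hypothetical progressive schedule: finest granularity first, 1 = unshuffled image
GRANULARITIES = [4, 2, 1]
image = [[r * 4 + c for c in range(4)] for r in range(4)]
for n in GRANULARITIES:
    puzzle = jigsaw(image, n, seed=n)
    # one training stage would consume `puzzle` here before features are fused
```

Stepping from fine to coarse lets earlier stages specialise in small local details before later stages see larger structures, which is one reading of "progressive" training.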
Subjects
Algorithms; Neural Networks, Computer; Machine Learning
ABSTRACT
Classifying the sub-categories of an object from the same super-category (e.g., bird species or cars) in fine-grained visual classification (FGVC) relies heavily on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this article, by contrast, we show that integrating low-level information (e.g., color, edge junctions, texture patterns) improves performance through enhanced feature representation and accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of 1) a dual-pathway hierarchy structure with a top-down feature pathway and a bottom-up attention pathway, which learns both high-level semantic and low-level detailed feature representations, and 2) an ROI-guided refinement strategy with an ROI-guided dropblock and an ROI-guided zoom-in operation, which refines features by enhancing discriminative local regions and eliminating background noise. The proposed AP-CNN can be trained end-to-end without any additional bounding box/part annotations. Extensive experiments on three widely used FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach achieves state-of-the-art performance. Models and code are available at https://github.com/PRIS-CV/AP-CNN_Pytorch-master.
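The two ROI-guided operations can be sketched on a toy 2-D feature map. The abstract does not say how ROIs are chosen; we assume, hypothetically, a square box around the attention peak (`box_from_attention`). Dropping the box pushes the network toward other discriminative regions, while cropping it stands in for the zoom-in step (a real implementation would also resize the crop).

```python
# Hypothetical sketch of ROI-guided dropblock and zoom-in on a toy feature map.
from typing import List, Tuple

FeatureMap = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def box_from_attention(att: FeatureMap, half: int) -> Box:
    """Hypothetical ROI: square box of radius `half` around the attention peak."""
    h, w = len(att), len(att[0])
    i, j = max(((i, j) for i in range(h) for j in range(w)),
               key=lambda ij: att[ij[0]][ij[1]])
    return (max(i - half, 0), max(j - half, 0), min(i + half + 1, h), min(j + half + 1, w))


def roi_dropblock(fmap: FeatureMap, box: Box) -> FeatureMap:
    """Zero the ROI so training must rely on other discriminative regions."""
    top, left, bottom, right = box
    return [[0.0 if top <= i < bottom and left <= j < right else v
             for j, v in enumerate(row)] for i, row in enumerate(fmap)]


def roi_zoom_in(fmap: FeatureMap, box: Box) -> FeatureMap:
    """Crop the ROI (stand-in for crop-and-resize) to discard background."""
    top, left, bottom, right = box
    return [row[left:right] for row in fmap[top:bottom]]


att = [[0.0] * 4 for _ in range(4)]
att[1][1] = 1.0  # attention peak
box = box_from_attention(att, half=1)
print(box)  # (0, 0, 3, 3)
```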
ABSTRACT
The key to solving fine-grained image categorization is finding discriminative local regions that correspond to subtle visual traits. Great strides have been made with complex networks designed specifically to learn part-level discriminative feature representations. In this paper, we show that it is possible to cultivate subtle details without overly complicated network designs or training mechanisms: a single loss is all it takes. The main trick lies in how we delve into individual feature channels early on, as opposed to the convention of starting from a consolidated feature map. The proposed loss function, termed the mutual-channel loss (MC-Loss), consists of two channel-specific components: a discriminality component and a diversity component. The discriminality component forces all feature channels belonging to the same class to be discriminative, through a novel channel-wise attention mechanism. The diversity component additionally constrains channels so that they become mutually exclusive across the spatial dimension. The end result is a set of feature channels, each of which reflects a different locally discriminative region for a specific class. The MC-Loss can be trained end-to-end, without any bounding-box/part annotations, and yields highly discriminative regions during inference. Experimental results show that the MC-Loss, when implemented on top of common base networks, achieves state-of-the-art performance on all four fine-grained categorization datasets (CUB-Birds, FGVC-Aircraft, Flowers-102, and Stanford Cars). Ablation studies further demonstrate the superiority of the MC-Loss over other recently proposed general-purpose losses for visual classification, on two different base networks.
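The two MC-Loss components can be sketched for the channels assigned to one class. This is an illustration under our own assumptions, not the released code: each class owns a fixed group of channels, the channel-wise attention is a given 0/1 keep mask, and the discriminality path uses cross-channel max pooling followed by spatial averaging. The diversity term sums, over spatial locations, the largest spatial-softmax response among the group's channels.

```python
# Hypothetical sketch of MC-Loss-style components for one class's channel group.
# Each channel is a flat list of spatial activations.
import math
from typing import List

Channel = List[float]


def spatial_softmax(channel: Channel) -> Channel:
    m = max(channel)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in channel]
    total = sum(exps)
    return [e / total for e in exps]


def diversity(group: List[Channel]) -> float:
    """Sum over locations of the max spatial-softmax response in the group.

    Equals 1 when all channels peak at the same place and approaches the
    number of channels when each channel peaks at a distinct location.
    """
    probs = [spatial_softmax(ch) for ch in group]
    return sum(max(p[k] for p in probs) for k in range(len(group[0])))


def class_score(group: List[Channel], keep_mask: List[int]) -> float:
    """Discriminality path: mask channels, cross-channel max pool, spatial average."""
    kept = [[v * m for v in ch] for ch, m in zip(group, keep_mask)]
    pooled = [max(ch[k] for ch in kept) for k in range(len(group[0]))]
    return sum(pooled) / len(pooled)


same = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
spread = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
print(round(diversity(same), 6), diversity(spread) > 2.9)  # 1.0 True
```

A training objective in this spirit would apply cross-entropy to the class scores and subtract a weighted diversity term, so that channels are rewarded for peaking at different locations; the weighting is a hyperparameter not specified here.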
ABSTRACT
A deep neural network of multiple nonlinear layers forms a large function space, which can easily lead to overfitting on small-sample data. To mitigate overfitting in small-sample classification, learning more discriminative features from small-sample data is becoming a new trend. To this end, this paper aims to find a subspace of neural networks that facilitates a large decision margin. Specifically, we propose the Orthogonal Softmax Layer (OSL), which keeps the weight vectors in the classification layer orthogonal during both training and testing. The Rademacher complexity of a network using the OSL is only 1/K that of a network using a fully connected classification layer, where K is the number of classes, leading to a tighter generalization error bound. Experimental results demonstrate that the proposed OSL outperforms the compared methods on four small-sample benchmark datasets and is also applicable to large-sample datasets. Code is available at https://github.com/dongliangchang/OSLNet.
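One simple way to keep classification-layer weight vectors orthogonal during both training and testing, assumed here rather than confirmed by the abstract, is a fixed mask that connects each class only to its own disjoint slice of the feature vector. Masked weight rows then have disjoint supports, so their pairwise dot products are zero whatever values training assigns. `osl_mask` and `masked_logits` are hypothetical names.

```python
# Hypothetical sketch: orthogonal classifier weights via a fixed block mask.
from typing import List


def osl_mask(num_features: int, num_classes: int) -> List[List[float]]:
    """Class k connects only to features [k*width, (k+1)*width)."""
    assert num_features % num_classes == 0, "features must split evenly across classes"
    width = num_features // num_classes
    return [[1.0 if k * width <= d < (k + 1) * width else 0.0 for d in range(num_features)]
            for k in range(num_classes)]


def masked_logits(features: List[float],
                  weights: List[List[float]],
                  mask: List[List[float]]) -> List[float]:
    """Logits of a classification layer whose weight rows are masked to disjoint supports."""
    return [sum(f * w * m for f, w, m in zip(features, w_row, m_row))
            for w_row, m_row in zip(weights, mask)]


print(masked_logits([1, 2, 3, 4], [[1.0] * 4, [1.0] * 4], osl_mask(4, 2)))  # [3.0, 7.0]
```

Because the mask is fixed, gradient updates never reintroduce overlap between classes, which is what keeps the weight vectors orthogonal throughout training.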
ABSTRACT
Hydroxylation of N-substituted azetidines 11 and 12 and piperidines 15-19 with Sphingomonas sp. HXN-200 gave 91-98% of the corresponding 3-hydroxyazetidines 13 and 14 and 4-hydroxypiperidines 20-24, respectively, with high activity and excellent regioselectivity. High yields and high product concentrations (2 g/L) were achieved with frozen/thawed cells as biocatalyst. For the first time, rehydrated lyophilized cells were successfully used for the biohydroxylation.
Subjects
Azetidines/chemistry; Galactose Oxidase/chemistry; Piperidines/chemistry; Aldehydes/chemical synthesis; Carbohydrate Sequence; Chromatography, High Pressure Liquid; Hydroxylamines/chemistry; Magnetic Resonance Spectroscopy; Molecular Sequence Data; Molecular Weight; Oximes/chemical synthesis; Polymers/chemical synthesis
ABSTRACT
Hydrolysis of N-benzyloxycarbonyl-3,4-epoxypyrrolidine and of cyclohexene oxide with the epoxide hydrolase of Sphingomonas sp. HXN-200 gave the corresponding vicinal trans-diols in high ee and yield, representing the first example of enantioselective hydrolysis of a meso-epoxide with a bacterial epoxide hydrolase.
Subjects
Alcohols/chemical synthesis; Epoxide Hydrolases/chemistry; Epoxy Compounds/chemistry; Hydrocarbons, Alicyclic/chemistry; Sphingomonas/enzymology; Cyclohexanes/chemistry; Cyclohexenes; Hydrolysis; Pyrrolidines/chemistry; Stereoisomerism
ABSTRACT
New bacterial strains for the synthesis of optically active L-tert-butyl leucine were discovered through rapid identification in a miniaturized system. With tert-butyl leucine amide as the nitrogen source, one bacterial strain showing high conversion and high enantioselectivity was found among 120 microorganisms isolated from local soils and identified as Mycobacterium sp. JX009. Glucose and ammonium chloride were identified as suitable carbon and nitrogen sources, respectively, for cell growth. Cells grew best at 30 °C and pH 7.5, reaching the highest activity (2,650 U/l) among the conditions tested. Cell stability was improved by immobilization on synthetic resin 0730 without pretreatment. Immobilized cells successfully hydrolyzed tert-butyl leucine amide at 30 mM, the highest substrate concentration the cells could tolerate. After six reaction cycles, the immobilized cells retained 90% of their activity, producing L-tert-butyl leucine in 98% ee. This is the first report of a new bacterial strain applied to the efficient hydrolysis of tert-butyl leucine amide to produce optically active L-tert-butyl leucine, with the process investigated in detail.
Subjects
Leucine/analogs & derivatives; Leucine/metabolism; Chromatography, High Pressure Liquid; Chromatography, Thin Layer; Hydrogen-Ion Concentration; Leucine/chemistry; Mycobacterium/metabolism; Stereoisomerism; Temperature
ABSTRACT
The bacterial strain Sphingomonas sp. HXN-200 was used to catalyze the trans dihydroxylation of N-substituted 1,2,5,6-tetrahydropyridines 1 and 3-pyrrolines 4, giving the corresponding 3,4-dihydroxypiperidines 3 and 3,4-dihydroxypyrrolidines 6, respectively, with high enantioselectivity and high activity. The trans dihydroxylation was catalyzed sequentially by a monooxygenase and an epoxide hydrolase in the strain, with an epoxide as intermediate. While both the epoxidation and hydrolysis steps contributed to the overall enantioselectivity in the trans dihydroxylation of 1, the enantioselectivity in the trans dihydroxylation of the symmetric substrate 4 arose only in the hydrolysis of meso-epoxide 5. The absolute configuration of the bioproducts (+)-3 and (+)-6 was established as (3R,4R) by chemical correlation. Preparative trans dihydroxylation of 1a and 4b with frozen/thawed cells of Sphingomonas sp. HXN-200 afforded the corresponding (+)-(3R,4R)-3,4-dihydroxypiperidine 3a and (+)-(3R,4R)-3,4-dihydroxypyrrolidine 6b, both in 96% ee, in 60% and 80% yield, respectively. These results represent the first examples of enantioselective trans dihydroxylation with nonterpene substrates and with a bacterial catalyst, significantly extending this methodology for the practical synthesis of valuable trans diols. Enantioselective hydrolysis of racemic epoxide 2a with Sphingomonas sp. HXN-200 gave 34% of (-)-2a in >99% ee, a versatile chiral building block. Further hydrolysis of (-)-2a with the same strain afforded (-)-(3S,4S)-3a in 96% ee and 92% yield. Thus, both enantiomers of 3a can be prepared by biotransformation with Sphingomonas sp. HXN-200.
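The stereoselectivity results above are reported as enantiomeric excess (ee). As a reading aid, the standard definition ee = (major - minor) / (major + minor) converts directly to an enantiomer ratio; for example, 96% ee corresponds to a 98:2 mixture of enantiomers. The helper names below are hypothetical.

```python
# Hypothetical helpers converting between enantiomeric excess and enantiomer ratio.
from typing import Tuple


def enantiomeric_excess(major: float, minor: float) -> float:
    """ee (%) from the amounts of the major and minor enantiomers."""
    return 100.0 * (major - minor) / (major + minor)


def ratio_from_ee(ee_percent: float) -> Tuple[float, float]:
    """(major %, minor %) composition corresponding to a given ee (%)."""
    major = (100.0 + ee_percent) / 2.0
    return major, 100.0 - major


print(ratio_from_ee(96.0))  # (98.0, 2.0)
```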