Results 1 - 14 of 14
1.
IEEE Trans Image Process ; 33: 3456-3469, 2024.
Article in English | MEDLINE | ID: mdl-38787666

ABSTRACT

Our work focuses on tackling the problem of fine-grained recognition with incomplete multi-modal data, which has been overlooked by previous work in the literature. For such a practical problem, it is desirable not only to capture fine-grained patterns of objects but also to alleviate the challenges caused by missing modalities. In this paper, we propose to leverage a meta-learning strategy to learn both fast modal adaptation and, more importantly, missing-modality completion across a variety of incomplete multi-modality learning tasks. Based on that, we develop a meta-completion method, termed MECOM, to perform multi-modal fusion and explicit missing-modality completion through our proposed cross-modal attention and decoupling reconstruction. To further improve fine-grained recognition accuracy, we design an additional partial stream (as a counterpart of the holistic main stream of MECOM) and a part-level feature selection mechanism (corresponding to the parts of fine-grained objects), both tailored to the fine-grained nature of the task to capture discriminative but subtle part-level patterns. Comprehensive quantitative and qualitative experiments, together with various ablation studies, on two fine-grained multi-modal datasets and one generic multi-modal dataset show our superiority over competing methods. Our code is open-source and available at https://github.com/SEU-VIPGroup/MECOM.
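The abstract mentions multimodal fusion via cross-modal attention. Below is a minimal, hypothetical PyTorch sketch of cross-modal attention fusion between two modalities; the class name, dimensions, and the idea of reusing the attended output as a rough stand-in for a missing modality are illustrative assumptions, not the released MECOM code at the URL above.

```python
# Minimal sketch of cross-modal attention fusion (illustrative, not the authors' MECOM code).
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Let one modality attend to another; the attended features can also
    serve as a rough reconstruction when the target modality is missing."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_mod, context_mod):
        # query_mod, context_mod: (batch, tokens, dim)
        fused, _ = self.attn(query_mod, context_mod, context_mod)
        return self.norm(query_mod + fused)

if __name__ == "__main__":
    img = torch.randn(2, 16, 256)   # image tokens
    txt = torch.randn(2, 8, 256)    # text tokens
    fusion = CrossModalFusion()
    print(fusion(img, txt).shape)   # torch.Size([2, 16, 256])
```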

2.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 2091-2103, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37971914

ABSTRACT

Semi-Supervised Few-Shot Learning (SSFSL) aims to train a classifier that can adapt to new tasks using limited labeled data and a fixed amount of unlabeled data. Various sophisticated methods have been proposed to tackle the challenges associated with this problem. In this paper, we present a simple but quite effective approach to predict accurate negative pseudo-labels of unlabeled data from an indirect learning perspective. We leverage these pseudo-labels to augment the support set, which is typically limited in few-shot tasks, e.g., 1-shot classification. In such label-constrained scenarios, our approach can offer highly accurate negative pseudo-labels. By iteratively excluding negative pseudo-labels one by one, we ultimately derive a positive pseudo-label for each unlabeled sample. The integration of negative and positive pseudo-labels complements the limited support set, resulting in significant accuracy improvements for SSFSL. Our approach can be implemented in just a few lines of code using only off-the-shelf operations, yet it outperforms state-of-the-art methods on four benchmark datasets. Furthermore, it exhibits good adaptability and generalization capabilities when used as a plug-and-play counterpart alongside existing SSFSL methods and when extended to generalized linear models.
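Since the abstract emphasizes that the approach can be written in a few lines with off-the-shelf operations, here is a hedged NumPy sketch of the iterative-exclusion idea: repeatedly mark the least-likely remaining class of each unlabeled sample as a negative pseudo-label, and take the last surviving class as the positive pseudo-label. The function name and the use of raw classifier probabilities are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of negative pseudo-labelling by iterative exclusion
# (illustrative; not the authors' released code).
import numpy as np

def iterative_exclusion(probs, rounds=None):
    """probs: (n_unlabeled, n_classes) class probabilities from a base classifier.
    Repeatedly mark the least-likely remaining class as a negative pseudo-label;
    the last surviving class becomes the positive pseudo-label."""
    n, c = probs.shape
    rounds = c - 1 if rounds is None else rounds
    alive = np.ones((n, c), dtype=bool)
    negatives = []
    for _ in range(rounds):
        p_masked = np.where(alive, probs, np.inf)
        neg = p_masked.argmin(axis=1)          # most confident negative label
        negatives.append(neg)
        alive[np.arange(n), neg] = False
    positive = np.where(alive, probs, -np.inf).argmax(axis=1)
    return negatives, positive

probs = np.array([[0.05, 0.7, 0.25], [0.6, 0.3, 0.1]])
negs, pos = iterative_exclusion(probs)
print(pos)   # [1 0]
```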

3.
ACS Nano ; 17(18): 18055-18061, 2023 09 26.
Article in English | MEDLINE | ID: mdl-37498772

ABSTRACT

This study demonstrates the implementation of the Hamming code using DNA-based nanostructures for error detection and correction in communication systems. The designed DNA nanostructures perform logical operations to compute check codes and to identify and correct erroneous data based on fluorescence signals. Executing such intricate DNA logic operations requires individuals with specialized training. By interpreting the fluorescence signals generated by the DNA nanostructures, the encoded binary message can be extracted, effectively protecting data security. The findings highlight the potential of DNA as a versatile platform for reliable data transmission.


Subjects
Molecular Computers, Nanostructures, Humans, DNA/chemistry, Nanostructures/chemistry, Logic, Communication
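For readers unfamiliar with the Hamming code that the DNA nanostructures implement, the following self-contained Python sketch shows Hamming(7,4) encoding and single-bit error correction in software; it mirrors the check-code logic only and says nothing about the wet-lab DNA design.

```python
# Hamming(7,4) encode / single-error correction in software, mirroring the
# check-code logic the DNA nanostructures implement (illustrative sketch only).
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])      # generator matrix [I | P]
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])      # parity-check matrix [P^T | I]

def encode(data4):
    return (np.array(data4) @ G) % 2

def correct(codeword7):
    s = (H @ np.array(codeword7)) % 2               # syndrome
    if s.any():                                     # non-zero -> flip the bit whose H column matches
        err = next(i for i in range(7) if np.array_equal(H[:, i], s))
        codeword7 = codeword7.copy()
        codeword7[err] ^= 1
    return codeword7

cw = encode([1, 0, 1, 1])
noisy = cw.copy(); noisy[2] ^= 1                    # inject a single-bit error
print(np.array_equal(correct(noisy), cw))           # True
```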
4.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13904-13920, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37505998

ABSTRACT

Our work focuses on tackling large-scale fine-grained image retrieval, i.e., ranking highest the images that depict the concept of interest (the same sub-category label as the query) based on the fine-grained details in the query. For such a practical task, it is desirable to alleviate two challenges: the fine-grained nature of the data, with small inter-class variations but large intra-class variations, and the explosive growth of fine-grained data. In this paper, we propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes, which not only make the retrieval process efficient but also establish explicit correspondences between hash codes and visual attributes. Specifically, based on visual representations captured by attention, we develop an encoder-decoder network trained on a reconstruction task to distill high-level attribute-specific vectors from the appearance-specific visual representations in an unsupervised manner, without attribute annotations. Our models are also equipped with a feature decorrelation constraint on these attribute vectors to strengthen their representative abilities. Then, driven by preserving the similarity of the original entities, the required hash codes can be generated from these attribute-specific vectors and thus become attribute-aware. Furthermore, to combat simplicity bias in deep hashing, we consider the model design from the perspective of the self-consistency principle and propose to further enhance the models' self-consistency by adding an image reconstruction path. Comprehensive quantitative experiments under diverse empirical settings on six fine-grained retrieval datasets and two generic retrieval datasets show the superiority of our models over competing methods. Moreover, qualitative results demonstrate not only that the obtained hash codes correspond strongly to certain crucial properties of fine-grained objects, but also that our self-consistency designs effectively overcome simplicity bias in fine-grained hashing.
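As one concrete illustration of the feature decorrelation constraint mentioned above, here is a hedged PyTorch sketch that penalizes off-diagonal entries of the correlation matrix between attribute-specific vectors; the exact form of the authors' constraint may differ, so treat this as an assumption-laden example.

```python
# Rough sketch of a feature-decorrelation penalty on attribute-specific vectors
# (an assumption about the form of the constraint, not the authors' exact loss).
import torch

def decorrelation_loss(attr_vectors):
    """attr_vectors: (batch, n_attributes, dim). Penalise off-diagonal entries of
    the attribute-wise correlation matrix so each vector encodes a distinct attribute."""
    v = torch.nn.functional.normalize(attr_vectors, dim=-1)
    corr = torch.matmul(v, v.transpose(1, 2))          # (batch, n_attr, n_attr)
    eye = torch.eye(corr.size(-1), device=corr.device)
    off_diag = corr * (1.0 - eye)
    return off_diag.pow(2).mean()

print(decorrelation_loss(torch.randn(4, 6, 128)))
```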

5.
IEEE Trans Neural Netw Learn Syst ; 34(6): 3058-3070, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34570711

ABSTRACT

Object detection requires plentiful data annotated with bounding boxes for model training. However, in many applications it is difficult or even impossible to acquire a large set of labeled examples for the target task, due to privacy concerns or a lack of reliable annotators. On the other hand, thanks to high-quality image search engines such as Flickr and Google, it is relatively easy to obtain resource-rich unlabeled datasets whose categories are a superset of those of the target data. In this article, to improve the target model with cost-effective supervision from source data, we propose a partial transfer learning approach, QBox, that actively queries labels for bounding boxes of source images. Specifically, we design two criteria, informativeness and transferability, to measure the potential utility of a bounding box for improving the target model. Based on these criteria, QBox actively queries the labels of the most useful boxes from the source domain and thus requires fewer training examples, saving labeling cost. Furthermore, the proposed query strategy allows annotators to label only a specific region instead of the whole image, which significantly reduces the labeling difficulty. Extensive experiments are performed on various partial transfer benchmarks and a real COVID-19 detection task. The results validate that QBox improves detection accuracy at lower labeling cost compared with state-of-the-art query strategies for object detection.
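A rough sketch of how informativeness and transferability scores might be combined to rank candidate source boxes is given below; the entropy-based informativeness, the nearest-target cosine similarity used for transferability, and the multiplicative combination are all illustrative assumptions rather than QBox's actual criteria.

```python
# Hedged sketch of combining informativeness and transferability to rank
# candidate source boxes for labelling (names and scoring are illustrative).
import numpy as np

def rank_boxes(class_probs, box_feats, target_feats, top_k=5):
    """class_probs: (n_boxes, n_classes) detector outputs for candidate boxes.
    box_feats / target_feats: L2-normalised features of boxes / target images."""
    eps = 1e-12
    informativeness = -(class_probs * np.log(class_probs + eps)).sum(axis=1)  # entropy
    transferability = (box_feats @ target_feats.T).max(axis=1)                # nearest target similarity
    scores = informativeness * transferability
    return np.argsort(-scores)[:top_k]

probs = np.random.dirichlet(np.ones(4), size=20)
boxes = np.random.randn(20, 64); boxes /= np.linalg.norm(boxes, axis=1, keepdims=True)
targets = np.random.randn(50, 64); targets /= np.linalg.norm(targets, axis=1, keepdims=True)
print(rank_boxes(probs, boxes, targets))
```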

6.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6969-6983, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33656987

ABSTRACT

The task of multi-label image recognition is to predict the set of object labels that are present in an image. As objects normally co-occur in an image, it is desirable to model the label dependencies to improve recognition performance. To capture and exploit this important information, we propose graph convolutional network (GCN) based models for multi-label image recognition, in which directed graphs are constructed over classes and information is propagated between classes to learn inter-dependent class-level representations. Following this idea, we design two particular models that approach multi-label classification from different views. In our first model, prior knowledge about class dependencies is integrated into classifier learning. Specifically, we propose the Classifier Learning GCN (C-GCN) to map class-level semantic representations (e.g., word embeddings) into classifiers that maintain the inter-class topology. In our second model, we decompose the visual representation of an image into a set of label-aware features and propose the Prediction Learning GCN (P-GCN) to encode such features into inter-dependent image-level prediction scores. Furthermore, we present an effective correlation matrix construction approach to capture inter-class relationships and consequently guide information propagation among classes. Empirical results on generic multi-label image recognition demonstrate that both proposed models clearly outperform existing state-of-the-art methods. Moreover, the proposed methods also show advantages in other applications related to multi-label classification.
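The correlation matrix construction described above can be illustrated with a short NumPy sketch: estimate conditional label co-occurrence probabilities, binarize them with a threshold, and re-weight neighbors against self-connections. The threshold tau and re-weighting factor p are illustrative hyper-parameters, and the exact recipe in the paper may differ.

```python
# Sketch of a label correlation matrix used to propagate information between
# classes (binarised conditional co-occurrence with re-weighting).
import numpy as np

def build_correlation(label_matrix, tau=0.4, p=0.2):
    """label_matrix: (n_images, n_classes) binary multi-label annotations."""
    counts = label_matrix.sum(axis=0)                          # N_i
    cooc = label_matrix.T @ label_matrix                       # M_ij
    cond = cooc / np.maximum(counts[:, None], 1)               # P(L_j | L_i)
    adj = (cond >= tau).astype(float)
    np.fill_diagonal(adj, 0)
    row_sum = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    adj = p * adj / row_sum                                    # re-weight neighbours
    np.fill_diagonal(adj, 1 - p)                               # keep self-information
    return adj

labels = (np.random.rand(1000, 8) > 0.7).astype(float)
print(build_correlation(labels).round(2))
```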

7.
IEEE Trans Image Process ; 31: 3004-3016, 2022.
Article in English | MEDLINE | ID: mdl-35380962

ABSTRACT

The practical task of Automatic Check-Out (ACO) is to accurately predict the presence and count of each product in an arbitrary product combination. Beyond its main challenges of the large-scale and fine-grained nature of product categories, products are continuously updated in realistic check-out scenarios, which an ACO system must also handle. Previous work in this line of research largely depends on labor-intensive bounding-box supervision of products within a detection paradigm. In this paper, by contrast, we propose a Self-Supervised Multi-Category Counting (S2MC2) network that leverages point-level supervision of products in check-out images, both to lower the labeling cost and to return ACO predictions in a class-incremental setting. Specifically, the backbone of S2MC2 is built upon a counting module operating in a class-agnostic fashion. It also consists of several crucial components, including an attention module for capturing fine-grained patterns and a domain adaptation module for reducing the domain gap between single-product images used for training and check-out images used for testing. Furthermore, a self-supervised approach is utilized in S2MC2 to initialize the parameters of its backbone for better performance. By conducting comprehensive experiments on the large-scale automatic check-out dataset RPC, we demonstrate that our proposed S2MC2 achieves superior accuracy over the competing baselines in both the traditional and incremental settings of ACO tasks.
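To make the point-level supervision concrete, the sketch below builds a density map from annotated product centers by placing a unit Gaussian at each point, so that the map integrates to the ground-truth count; this is a generic point-supervised counting construction, not the S2MC2 code.

```python
# Illustrative sketch of point-level supervision for counting: place a Gaussian
# at each annotated product point; the integral of the map is the ground-truth count.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(points, shape, sigma=4.0):
    """points: list of (row, col) product centres; shape: (H, W)."""
    canvas = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        canvas[int(r), int(c)] += 1.0
    return gaussian_filter(canvas, sigma)

dm = density_map([(20, 30), (50, 80), (52, 82)], shape=(100, 120))
print(round(dm.sum(), 2))   # ~3.0 (one unit of mass per annotated product)
```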

8.
Anal Chim Acta ; 1192: 339343, 2022 Feb 01.
Article in English | MEDLINE | ID: mdl-35057934

ABSTRACT

The fluorescence properties of conjugated microporous polyphenylenes (CMPs) were tuned over a wide range by including a small amount of a comonomer chromophore in the network. The multi-color CMPs were used for explosives sensing and demonstrated broad sensitivities (ranging from -0.01888 µM-1 to -0.00467 µM-1) and LODs (ranging from 31.0 nM to 125.3 nM) against thirteen explosive compounds, including nitroaromatics (NACs), nitramines (NAMs) and nitrogen-rich heterocycles (NRHCs). The CMPs were also developed into a sensor array for discriminating thirteen explosives, specifically NT, p-DNB, DNT, TNT, TNP, TNR, RDX, HMX, CL-20, FOX-7, NTO, DABT and DHT. Using the classical statistical method linear discriminant analysis (LDA), the thirteen explosives at a fixed concentration were completely discriminated, and unknown test samples were identified with 88% classification accuracy. Moreover, explosives at different concentrations and mixtures of explosives were also successfully classified. Compared with LDA, machine learning algorithms have significant advantages in analyzing array-based sensing data. Different machine learning models for pattern recognition were also implemented and discussed here, and much higher accuracy (96% for a neural network) can be achieved in predicting unknown test samples after training.


Subjects
Explosive Agents, Coloring Agents, Limit of Detection, Machine Learning, Polymers
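The LDA step of the sensor array can be reproduced in a few lines with scikit-learn; the sketch below uses synthetic sensor responses (13 analytes, 6 array channels) purely to illustrate the workflow, not the paper's measured fluorescence data.

```python
# Minimal sketch of the LDA step: classify explosives from sensor-array
# fluorescence responses (synthetic data here, purely for illustration).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_analytes, n_sensors, reps = 13, 6, 10
centers = rng.normal(size=(n_analytes, n_sensors))
X = np.vstack([c + 0.1 * rng.normal(size=(reps, n_sensors)) for c in centers])
y = np.repeat(np.arange(n_analytes), reps)

lda = LinearDiscriminantAnalysis()
print(cross_val_score(lda, X, y, cv=5).mean())   # close to 1.0 on well-separated responses
```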
9.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 8927-8948, 2022 12.
Article in English | MEDLINE | ID: mdl-34752384

ABSTRACT

Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer vision and pattern recognition, and underpins a diverse set of real-world applications. The task of FGIA targets analyzing visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class and large intra-class variation inherent to fine-grained image analysis makes it a challenging problem. Capitalizing on advances in deep learning, in recent years we have witnessed remarkable progress in deep learning powered FGIA. In this paper we present a systematic survey of these advances, where we attempt to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas - fine-grained image recognition and fine-grained image retrieval. In addition, we also review other key issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. We conclude by highlighting several research directions and open problems which need further exploration from the community.


Subjects
Deep Learning, Neural Networks (Computer), Animals, Algorithms, Computer-Assisted Image Processing/methods, Birds
10.
IEEE Trans Neural Netw Learn Syst ; 33(2): 866-878, 2022 02.
Article in English | MEDLINE | ID: mdl-33180736

ABSTRACT

In this article, we present a novel lightweight path for deep residual neural networks. The proposed method integrates a simple plug-and-play module, a convolutional encoder-decoder (ED), as an augmented path to the original residual building block. Owing to the abstraction ability of the encoding stage, the decoder part tends to generate feature maps in which highly semantically relevant responses are activated while irrelevant responses are restrained. Through a simple elementwise addition, the representations derived from the identity shortcut and the original transformation branch are enhanced by our ED path. Furthermore, we exploit lightweight counterparts by removing a portion of the channels in the original transformation branch. This lightweight design does not cause an obvious performance drop but brings computational savings. By conducting comprehensive experiments on ImageNet, MS-COCO, CUB200-2011, and CIFAR, we demonstrate the consistent accuracy gain obtained by our ED path for various residual architectures, with comparable or even lower model complexity. Concretely, it decreases the top-1 error of ResNet-50 and ResNet-101 by 1.22% and 0.91% on ImageNet classification and increases the mmAP of Faster R-CNN with ResNet-101 by 2.5% on MS-COCO object detection. The code is available at https://github.com/Megvii-Nanjing/ED-Net.
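A hedged PyTorch sketch of the ED path idea is shown below: a small convolutional encoder-decoder runs in parallel with a residual block's transformation branch, and the three signals (identity, transformation, ED) are merged by elementwise addition. Layer sizes and the exact block layout are illustrative; the released code at the URL above is authoritative.

```python
# Hedged PyTorch sketch of adding a convolutional encoder-decoder (ED) path to a
# residual block and merging it by elementwise addition (layer sizes are illustrative).
import torch
import torch.nn as nn

class EDResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.transform = nn.Sequential(              # original transformation branch
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels))
        self.ed = nn.Sequential(                      # augmented encoder-decoder path
            nn.Conv2d(channels, channels // 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(channels // 4, channels, 4, stride=2, padding=1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.transform(x) + self.ed(x))

x = torch.randn(1, 64, 56, 56)
print(EDResidualBlock()(x).shape)   # torch.Size([1, 64, 56, 56])
```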

11.
IEEE Trans Pattern Anal Mach Intell ; 42(7): 1654-1669, 2020 07.
Article in English | MEDLINE | ID: mdl-30835211

ABSTRACT

Landmark/pose estimation in single monocular images has received much attention in computer vision due to its important applications. It remains a challenging task when input images come with severe occlusions caused by, e.g., adverse camera views. Under such circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of landmark inter-connectivity. To address the problem, by incorporating priors about the structure of pose components, we propose a novel structure-aware fully convolutional network that implicitly takes such priors into account during training of the deep network. Explicit learning of such constraints is typically challenging. Instead, inspired by how humans identify implausible poses, we design discriminators to distinguish real poses from fake (e.g., biologically implausible) ones. If the pose generator G produces results that the discriminator fails to distinguish from real ones, the network has successfully learned the priors. Training of the network follows the strategy of conditional Generative Adversarial Networks (GANs). The effectiveness of the proposed network is evaluated on three pose-related tasks: 2D human pose estimation, 2D facial landmark estimation, and 3D human pose estimation. The proposed approach significantly outperforms several state-of-the-art methods and almost always generates plausible pose predictions, demonstrating the usefulness of implicit structure learning with GANs.
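The adversarial prior can be illustrated with a minimal sketch: a discriminator scores flattened keypoint configurations, and the generator receives an extra loss for poses the discriminator flags as fake. The tiny MLP discriminator and the 16-keypoint layout are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of the adversarial prior on pose plausibility: a discriminator
# scores keypoint configurations, and the pose generator is trained to make its
# predictions indistinguishable from real poses (architecture is illustrative).
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(16 * 2, 128), nn.ReLU(), nn.Linear(128, 1))  # 16 2-D keypoints
bce = nn.BCEWithLogitsLoss()

real_pose = torch.rand(8, 32)                       # ground-truth keypoints (flattened)
fake_pose = torch.rand(8, 32, requires_grad=True)   # stand-in for generator output

d_loss = bce(disc(real_pose), torch.ones(8, 1)) + bce(disc(fake_pose.detach()), torch.zeros(8, 1))
g_adv = bce(disc(fake_pose), torch.ones(8, 1))      # generator tries to fool the discriminator
print(d_loss.item(), g_adv.item())
```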

12.
IEEE Trans Image Process ; 28(12): 6116-6125, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31265400

ABSTRACT

Humans are capable of learning a new fine-grained concept with very little supervision, e.g., a few exemplar images of a bird species, yet our best deep learning systems need hundreds or thousands of labeled examples. In this paper, we try to reduce this gap by studying the fine-grained image recognition problem in a challenging few-shot learning setting, termed few-shot fine-grained recognition (FSFG). The FSFG task requires learning systems to build classifiers for novel fine-grained categories from only a few examples (one, or fewer than five). To solve this problem, we propose an end-to-end trainable deep network that is inspired by a state-of-the-art fine-grained recognition model and tailored for the FSFG task. Specifically, our network consists of a bilinear feature learning module and a classifier mapping module: the former encodes the discriminative information of an exemplar image into a feature vector, while the latter maps this intermediate feature into the decision boundary of the novel category. The key novelty of our model is a "piecewise mappings" function in the classifier mapping module, which generates the decision boundary by learning a set of more attainable sub-classifiers in a parameter-economical way. We learn the exemplar-to-classifier mapping on an auxiliary dataset in a meta-learning fashion, which is expected to generalize to novel categories. By conducting comprehensive experiments on three fine-grained datasets, we demonstrate that the proposed method achieves superior performance over the competing baselines.
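Below is a hypothetical sketch of a "piecewise mappings" style exemplar-to-classifier module: the exemplar feature is split into pieces and each small head predicts the corresponding slice of the classifier weights, which is more parameter-economical than one large mapping. The split scheme and dimensions are assumptions for illustration only.

```python
# Hedged sketch of an exemplar-to-classifier mapping built from piecewise
# sub-mappings (dimensions and split scheme are illustrative assumptions).
import torch
import torch.nn as nn

class PiecewiseMapping(nn.Module):
    def __init__(self, feat_dim=512, n_pieces=8):
        super().__init__()
        self.piece = feat_dim // n_pieces
        self.heads = nn.ModuleList(nn.Linear(self.piece, self.piece) for _ in range(n_pieces))

    def forward(self, exemplar_feat):                 # (batch, feat_dim)
        chunks = exemplar_feat.split(self.piece, dim=-1)
        return torch.cat([h(c) for h, c in zip(self.heads, chunks)], dim=-1)

mapper = PiecewiseMapping()
exemplar = torch.randn(5, 512)                        # one exemplar feature per novel class
weights = mapper(exemplar)                            # (5, 512): one linear classifier per class
query = torch.randn(3, 512)
print((query @ weights.t()).shape)                    # torch.Size([3, 5]) class scores
```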

13.
IEEE Trans Image Process ; 26(6): 2868-2881, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28368819

ABSTRACT

Deep convolutional neural network models pre-trained on the ImageNet classification task have been successfully adopted for tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the purely unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, let alone in the unsupervised retrieval setting. We propose the Selective Convolutional Descriptor Aggregation (SCDA) method. SCDA first localizes the main object in a fine-grained image, a step that discards the noisy background and keeps useful deep descriptors. The selected descriptors are then aggregated and reduced in dimensionality into a short feature vector using the best practices we found. SCDA is unsupervised, using no image label or bounding box annotation. Experiments on six fine-grained datasets confirm the effectiveness of SCDA for fine-grained image retrieval. Moreover, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high mean average precision in fine-grained retrieval. On general image retrieval datasets, SCDA achieves retrieval results comparable with state-of-the-art general image retrieval approaches.
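The SCDA selection-and-aggregation step can be sketched directly from the description above: sum the convolutional feature map over channels, keep positions whose activation exceeds the mean (the likely object region), then average- and max-pool the surviving descriptors into a short L2-normalized vector. Details such as connected-component post-processing in the original method are omitted in this rough sketch.

```python
# Sketch of SCDA-style descriptor selection and aggregation from a convolutional
# feature map (unsupervised: no labels or boxes), following the abstract's description.
import torch

def scda_feature(fmap):
    """fmap: (C, H, W) deep descriptors from the last conv layer of a pre-trained CNN."""
    activation = fmap.sum(dim=0)                       # aggregate over channels
    mask = activation > activation.mean()              # keep likely-object positions
    selected = fmap[:, mask]                           # (C, n_selected) descriptors
    feat = torch.cat([selected.mean(dim=1), selected.max(dim=1).values])  # avg + max pooling
    return torch.nn.functional.normalize(feat, dim=0)  # short, L2-normalised vector

print(scda_feature(torch.rand(512, 14, 14)).shape)     # torch.Size([1024])
```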

14.
IEEE Trans Neural Netw Learn Syst ; 28(4): 975-987, 2017 04.
Article in English | MEDLINE | ID: mdl-26863679

ABSTRACT

Multi-instance learning (MIL) has been widely applied to diverse applications involving complicated data objects, such as images and genes. However, most existing MIL algorithms can only handle small- or moderate-sized data. To deal with large-scale MIL problems, we propose two efficient and scalable MIL algorithms: miVLAD, based on the Vector of Locally Aggregated Descriptors (VLAD) representation, and miFV, based on the Fisher Vector (FV) representation. They map the original MIL bags into new vector representations using their corresponding mapping functions. The new feature representations keep essential bag-level information and, at the same time, lead to excellent MIL performance even when linear classifiers are used. Thanks to the low computational cost of the mapping step and the scalability of linear classifiers, miVLAD and miFV can handle large-scale MIL data efficiently and effectively. Experiments show that miVLAD and miFV not only achieve accuracy comparable with state-of-the-art MIL algorithms but are also hundreds of times faster. Moreover, the new miVLAD and miFV representations can be regarded as multi-view data, which improves accuracy in most cases. In addition, our algorithms perform well even without parameter tuning (i.e., with default parameters), which is convenient for practical MIL applications.
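A hedged end-to-end toy example of the miVLAD idea follows: instances from all bags are clustered with k-means, each bag is encoded as the concatenated residuals to its nearest codewords (a VLAD vector), and a linear SVM is trained on the bag vectors. The toy data, codebook size, and normalization choices are illustrative, not the authors' settings.

```python
# Hedged sketch of the miVLAD idea: encode each bag of instances as a VLAD vector
# over a small k-means codebook, then train a linear classifier on the bag vectors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def vlad(bag, kmeans):
    """bag: (n_instances, dim). Accumulate residuals to the nearest codeword."""
    k, dim = kmeans.cluster_centers_.shape
    assign = kmeans.predict(bag)
    v = np.zeros((k, dim))
    for i, c in enumerate(assign):
        v[c] += bag[i] - kmeans.cluster_centers_[c]
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(0)
bags = [rng.normal(loc=y, size=(rng.integers(3, 9), 16)) for y in (0, 1) * 50]
labels = np.array([0, 1] * 50)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(np.vstack(bags))
X = np.array([vlad(b, kmeans) for b in bags])
print(LinearSVC().fit(X, labels).score(X, labels))     # separable toy bags -> ~1.0
```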
