Results 1-12 of 12
1.
IEEE Trans Image Process ; 32: 2215-2227, 2023.
Article in English | MEDLINE | ID: mdl-37040248

ABSTRACT

Semi-supervised learning is well established in image classification but remains underexplored in video-based action recognition. FixMatch is a state-of-the-art semi-supervised method for image classification, but it does not work well when transferred directly to the video domain because it uses only the RGB modality, which contains insufficient motion information. Moreover, it leverages only highly confident pseudo-labels to enforce consistency between strongly augmented and weakly augmented samples, resulting in limited supervision signals, long training time, and insufficient feature discriminability. To address these issues, we propose neighbor-guided consistent and contrastive learning (NCCL), which takes both RGB and temporal gradient (TG) as input and is based on the teacher-student framework. Because labeled samples are limited, we first incorporate neighbor information as a self-supervised signal to explore the consistency property, which compensates for FixMatch's lack of supervision signals and long training time. To learn more discriminative feature representations, we further propose a novel neighbor-guided category-level contrastive learning term that minimizes the intra-class distance and enlarges the inter-class distance. We conduct extensive experiments on four datasets to validate its effectiveness. Compared with state-of-the-art methods, NCCL achieves superior performance at a much lower computational cost.
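The FixMatch behavior the abstract criticizes, where only highly confident predictions on weakly augmented samples become pseudo-labels, can be illustrated with a minimal sketch. This is plain FixMatch-style confidence filtering, not NCCL; the function names and the 0.95 threshold are illustrative assumptions, and the logits stand in for a model's outputs on weakly augmented clips.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(weak_logits: List[float], threshold: float = 0.95) -> Optional[int]:
    """Return the predicted class if its probability reaches the threshold, else None.

    A non-None result would serve as the target for the strongly augmented view;
    None means the sample contributes nothing to the consistency loss.
    """
    probs = softmax(weak_logits)
    best = max(range(len(probs)), key=lambda c: probs[c])
    return best if probs[best] >= threshold else None


def kept_fraction(batch: List[List[float]], threshold: float = 0.95) -> float:
    """Fraction of unlabeled samples in a batch that receive a pseudo-label."""
    labels = [pseudo_label(logits, threshold) for logits in batch]
    return sum(label is not None for label in labels) / len(batch)


batch = [[8.0, 0.0, 0.0], [1.0, 0.9, 0.8]]
print([pseudo_label(logits) for logits in batch])  # [0, None]
print(kept_fraction(batch))  # 0.5
```

Only half of this toy batch yields a pseudo-label, which illustrates the sparse supervision that, according to the abstract, NCCL's neighbor-guided terms are designed to compensate for.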

2.
Article in English | MEDLINE | ID: mdl-37022401

ABSTRACT

Contrastive learning has been successfully applied to unsupervised representation learning. However, the generalization ability of the learned representations is limited because the losses of downstream tasks (e.g., classification) are rarely taken into account when designing contrastive methods. In this article, we propose a new contrastive-based unsupervised graph representation learning (UGRL) framework that 1) maximizes the mutual information (MI) between the semantic information and the structural information of the data and 2) imposes three constraints that consider the downstream tasks and representation learning simultaneously. As a result, the proposed method outputs robust low-dimensional representations. Experimental results on 11 public datasets demonstrate that it outperforms recent state-of-the-art methods on different downstream tasks. Our code is available at https://github.com/LarryUESTC/GRLC.
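The abstract does not spell out GRLC's objective. As a generic point of reference, the sketch below implements the standard InfoNCE loss, a widely used contrastive objective whose minimization maximizes a lower bound on mutual information. It is a swapped-in textbook technique, not the authors' loss or their three downstream-task constraints; the vectors and the temperature of 0.5 are made-up assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.5) -> float:
    """InfoNCE loss for a single anchor.

    loss = -log( exp(s(a, p)/T) / (exp(s(a, p)/T) + sum_n exp(s(a, n)/T)) )
    Minimizing it pulls the positive toward the anchor and pushes negatives away.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])  # positive aligned with anchor
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])   # negative aligned with anchor
print(round(good, 4), round(bad, 4))  # 0.1269 2.1269
```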

3.
IEEE Trans Image Process ; 31: 4803-4816, 2022.
Article in English | MEDLINE | ID: mdl-35830405

ABSTRACT

Person re-identification (re-ID) is of great importance to video surveillance systems, where it estimates the similarity between a pair of cross-camera person shots. Current methods for estimating such similarity require a large number of labeled samples for supervised training. In this paper, we present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human annotations. Unlike conventional unsupervised re-ID methods that use pseudo labels based on global clustering, we construct patch surrogate classes as initial supervision and assign pseudo labels to images through pairwise gradient-guided similarity separation. This clusters images into pseudo pairs, which can be updated during training. Based on these pseudo pairs, we improve the generalization of the similarity function via a novel self-similarity learning scheme: it learns local discriminative features from individual images via intra-similarity and discovers patch correspondences across images via inter-similarity. The intra-similarity learning uses channel attention to detect diverse local features within an image. The inter-similarity learning employs a deformable convolution with a non-local block to align patches for cross-image similarity. Experimental results on several re-ID benchmark datasets demonstrate the superiority of the proposed method over state-of-the-art methods.


Subjects
Biometric Identification; Algorithms; Benchmarking; Biometric Identification/methods; Cluster Analysis; Humans
4.
J Opt Soc Am A Opt Image Sci Vis ; 38(6): 827-839, 2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34143152

ABSTRACT

Images of natural scenes captured under ill lighting conditions (e.g., low light, back-lit, over-exposed front-lit, and any combination of these) suffer from over- and under-exposure at the same time, and processing such images often results in over- and under-enhancement. A single small image sensor with ordinary optical lenses can hardly provide satisfactory quality under ill lighting conditions. The challenge is to maintain visual smoothness between these regions while preserving color and contrast. The problem has been approached by various methods, including multiple sensors and handcrafted parameters, but existing models are limited to specific scenes (i.e., lighting conditions). Motivated by these challenges, we propose a deep image enhancement method for color images captured under ill lighting conditions. Input images are first decomposed into reflection and illumination maps by the proposed layer distribution loss net, so that the illumination blindness and structure degradation problems can subsequently be solved via these two components, respectively. The hidden degradation in reflection and illumination is tuned with a knowledge-based adaptive enhancement constraint designed for ill-illuminated images. The model maintains a balance of smoothness and helps suppress noise in addition to over- and under-enhancement. Local consistency in illumination is achieved via a repairing operation performed by the proposed Repair-Net: the total variation operator is optimized to acquire local consistency, and the image gradient is guided by the proposed enhancement constraint. Finally, the product of the updated reflection and illumination maps reconstructs the enhanced image. Experiments are conducted under both very low exposure and ill illumination conditions, for which a new dataset is also proposed. Results in both settings show that our method outperforms other state-of-the-art methods in preserving structural and textural details, which suggests that it is more practical for future visual applications.
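The final reconstruction step, where the enhanced image is the product of the reflection and illumination maps, and the total variation quantity used for local consistency can be sketched on small grayscale arrays. This is a minimal illustration, assuming single-channel images stored as nested lists and the anisotropic (L1) form of total variation; the abstract does not state which TV variant Repair-Net optimizes, and no learned networks are involved.

```python
from typing import List

Image = List[List[float]]  # single-channel image, rows of pixel values


def recompose(reflection: Image, illumination: Image) -> Image:
    """Retinex-style reconstruction: pixelwise product of reflection and illumination."""
    return [[r * l for r, l in zip(r_row, l_row)]
            for r_row, l_row in zip(reflection, illumination)]


def total_variation(img: Image) -> float:
    """Anisotropic TV: sum of absolute differences between horizontal and vertical neighbors."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
    return tv


R = [[0.5, 1.0], [0.25, 0.5]]
L = [[0.8, 0.8], [0.4, 0.4]]  # illumination varies only vertically
print(recompose(R, L))
print(total_variation(L), total_variation([[0.0, 1.0], [0.0, 1.0]]))  # 0.8 2.0
```

A smoother illumination map has lower TV, which is why penalizing TV encourages the local consistency the abstract describes.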

5.
IEEE Trans Cybern ; 50(7): 3330-3342, 2020 Jul.
Article in English | MEDLINE | ID: mdl-30892258

ABSTRACT

This paper introduces a convolutional neural network (CNN) semantic re-ranking system to enhance the performance of sketch-based image retrieval (SBIR). Unlike existing approaches, the proposed system leverages category information provided by CNNs to support effective similarity measurement between images. To achieve effective classification of query sketches and high-quality initial retrieval results, one CNN model is trained to classify sketches and another to classify natural images. By training these dual CNN models, the semantic information of both sketches and natural images is captured through deep learning. To measure the category similarity between images, a category similarity measurement method is proposed. Category information is then used for re-ranking: the re-ranking operation first infers the retrieval category of the query sketch, then uses the category similarity measurement to measure the category similarity between the query sketch and each initial retrieval result, and finally re-ranks the initial retrieval results. Experiments on different types of SBIR datasets demonstrate the effectiveness of the proposed re-ranking method, and comparisons with other re-ranking algorithms show its superiority. Furthermore, compared with the baseline systems, the proposed re-ranking approach achieves significantly higher top-ten precision across different SBIR methods and datasets.

6.
IEEE Trans Cybern ; 47(11): 3941-3954, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28113794

ABSTRACT

Hashing compresses high-dimensional features into compact binary codes. It is a promising technique for efficient mobile image retrieval because of its low data transmission cost and fast retrieval response. However, most existing hashing strategies rely solely on low-level features and may therefore generate hashing codes with limited discriminative capability. Moreover, many of them fail to exploit the complex, high-order semantic correlations that inherently exist among images. Motivated by these observations, we propose a novel unsupervised hashing scheme, called topic hypergraph hashing (THH), to address these limitations. THH mitigates the semantic shortage of hashing codes by exploiting auxiliary texts around images. In our method, relations between images and semantic topics are first discovered via robust collective non-negative matrix factorization. Afterwards, a unified topic hypergraph, in which images and topics are represented by independent vertices and hyperedges, respectively, is constructed to model the inherent high-order semantic correlations of images. Finally, hashing codes and functions are learned by simultaneously enforcing semantic consistency and preserving the discovered semantic relations. Experiments on publicly available datasets demonstrate that THH achieves superior performance compared with several state-of-the-art methods and is more suitable for mobile image retrieval.
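THH learns its hash functions from a topic hypergraph, which is not reproduced here. The sketch only illustrates why compact binary codes make retrieval cheap: each real-valued feature dimension becomes one bit, and database items are ranked by Hamming distance using an XOR and a bit count. Thresholding every dimension at zero is a placeholder assumption standing in for learned hash functions.

```python
from typing import List, Sequence


def binarize(features: Sequence[float], thresholds: Sequence[float]) -> int:
    """Pack one bit per dimension (1 if feature > threshold) into an integer code."""
    code = 0
    for f, t in zip(features, thresholds):
        code = (code << 1) | (1 if f > t else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes (XOR, then count ones)."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[int]:
    """Database indices ordered from nearest to farthest in Hamming distance."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


zeros = [0.0] * 4
q = binarize([0.9, -0.2, 0.3, -0.7], zeros)           # 0b1010
db = [binarize(v, zeros) for v in ([-1, 1, -1, 1],    # 0b0101
                                   [1, -1, 1, 1],     # 0b1011
                                   [2, -3, 1, -1])]   # 0b1010
print(rank_by_hamming(q, db))  # [2, 1, 0]
```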

7.
IEEE Trans Cybern ; 47(1): 14-26, 2017 Jan.
Article in English | MEDLINE | ID: mdl-26595936

ABSTRACT

How can we find a general way to choose the most suitable samples for training a classifier, even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role in constructing a refined training set to improve classification performance in a variety of applications, such as text analysis, image recognition, and social network modeling. Although combining the representativeness and informativeness of samples has proven promising for active sampling, state-of-the-art methods perform well only under certain data structures. Can we then find a way to fuse the two active sampling criteria without any assumption on the data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertainty measure, which can be implemented in different ways, is leveraged to generate the informativeness criterion. Within this framework, a practical active learning algorithm is proposed that uses a radial basis function together with the estimated probabilities to construct the triple measures, and a modified best-versus-second-best strategy to construct the uncertainty measure. Experimental results on benchmark datasets demonstrate that our algorithm consistently outperforms state-of-the-art active learning algorithms.
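The informativeness side of the framework relies on a modified best-versus-second-best (BvSB) strategy. The abstract does not describe the modification, so the sketch below shows only the standard BvSB margin computed from estimated class probabilities; the representativeness/diversity triple measures are omitted, and the probability vectors are invented for illustration.

```python
from typing import List


def bvsb_margin(probs: List[float]) -> float:
    """Best-versus-second-best margin: p(top class) - p(runner-up).

    A small margin means the classifier can barely separate its two most likely
    classes, so the sample is considered informative.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def select_queries(pool: List[List[float]], k: int) -> List[int]:
    """Indices of the k unlabeled samples with the smallest BvSB margins."""
    return sorted(range(len(pool)), key=lambda i: bvsb_margin(pool[i]))[:k]


pool = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # top two classes nearly tied
    [0.60, 0.30, 0.10],
]
print(select_queries(pool, 2))  # [1, 2]
```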

8.
IEEE Trans Image Process ; 24(11): 4556-69, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26285148

ABSTRACT

Image denoising is a fundamental problem in computer vision and image processing with considerable practical importance for real-world applications. Traditional patch-based and sparse coding-driven image denoising methods convert 2D image patches into 1D vectors for further processing, and thus inevitably break down the inherent 2D geometric structure of natural images. To overcome this limitation, we propose a 2D image denoising model, the dictionary pair learning (DPL) model, and design a corresponding algorithm called DPL on the Grassmann manifold (DPLG). The DPLG algorithm first learns an initial dictionary pair (i.e., the left and right dictionaries) by employing a subspace partition technique on the Grassmann manifold, and the refined dictionary pair is then obtained through sub-dictionary pair merging. DPLG obtains a sparse representation by encoding each image patch only with the selected sub-dictionary pair. The non-zero elements of the sparse representation are further smoothed by the graph Laplacian operator to remove noise. Consequently, the DPLG algorithm not only preserves the inherent 2D geometric structure of natural images but also performs manifold smoothing in the 2D sparse coding space. Experimental evaluations on benchmark images and the Berkeley segmentation data sets demonstrate that DPLG also improves the structural similarity (SSIM) values, a measure of perceptual visual quality, of denoised images. Moreover, DPLG produces peak signal-to-noise ratio values competitive with those of popular image denoising algorithms.

9.
IEEE Trans Cybern ; 45(12): 2756-69, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25576590

ABSTRACT

While content-based landmark image search has recently received a lot of attention and become a very active domain, it remains a challenging problem. Among the various reasons, highly diverse visual content is the most significant: images of the same landmark from different sources can span a wide range of visual appearances, and different landmarks may share very similar sets of images. As a consequence, it is very hard to accurately estimate the similarities between landmarks based purely on a single type of visual feature. Moreover, the relationships between landmark images can be very complex, and how to develop an effective modeling scheme to characterize these associations remains an open question. Motivated by these concerns, we propose a multimodal hypergraph (MMHG) to characterize the complex associations between landmark images. In MMHG, images are modeled as independent vertices, and hyperedges contain several vertices corresponding to particular views. Multiple hypergraphs are first constructed independently from different visual modalities to describe the hidden high-order relations from different aspects. They are then integrated to incorporate discriminative information from heterogeneous sources. We also propose a novel content-based visual landmark search system based on MMHG to facilitate effective search. Unlike existing approaches, we design a unified computational module to support query-specific combination weight learning. An extensive experimental study on a large-scale test collection demonstrates the effectiveness of our scheme over state-of-the-art approaches.

10.
IEEE Trans Cybern ; 45(8): 1561-74, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25248210

ABSTRACT

This paper introduces a novel approach to facilitating image search based on a compact semantic embedding. A novel method is developed to explicitly map concepts and image contents into a unified latent semantic space for the representation of semantic concept prototypes. Then, a linear embedding matrix is learned that maps images into the semantic space such that each image is closer to its relevant concept prototype than to other prototypes. In our approach, semantic concepts are equated with query keywords, and the images mapped into the vicinity of the corresponding prototype are retrieved. In addition, a computationally efficient method is introduced to incorporate new semantic concept prototypes into the semantic space by updating the embedding matrix. This improves the scalability of the method and allows it to be applied to dynamic image repositories. Therefore, the proposed approach not only narrows the semantic gap but also supports an efficient image search process. We have carried out extensive experiments on various cross-modality image search tasks over three widely used benchmark image datasets. The results demonstrate the effectiveness, efficiency, and scalability of the proposed approach.
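Once images are embedded, keyword search reduces to a nearest-neighbor lookup around the matching concept prototype. The sketch assumes the embedding matrix W and the prototype have already been learned (here they are hand-written toy values, and the "beach" keyword is hypothetical) and uses squared Euclidean distance; how the paper learns W and the prototypes is not reproduced.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def embed(W: Matrix, x: Vector) -> Vector:
    """Map an image feature x into the semantic space with a linear matrix W (z = W x)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def search(W: Matrix, images: List[Vector], prototype: Vector, k: int) -> List[int]:
    """Return the indices of the k images whose embeddings lie closest to a concept prototype."""
    dists = [sq_dist(embed(W, x), prototype) for x in images]
    return sorted(range(len(images)), key=lambda i: dists[i])[:k]


# Hypothetical 3-D image features projected into a 2-D semantic space.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
images = [[0.9, 0.1, 5.0], [0.0, 1.0, 0.0], [1.0, 0.0, -2.0]]
beach_prototype = [1.0, 0.0]  # hypothetical prototype for the keyword "beach"
print(search(W, images, beach_prototype, 2))  # [2, 0]
```

Adding a new concept in this picture only means adding a new prototype vector, which loosely mirrors the scalability argument in the abstract; the paper's actual update of the embedding matrix is not shown.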

11.
IEEE Trans Image Process ; 22(1): 363-76, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22692911

ABSTRACT

Due to the popularity of social media websites, extensive research efforts have been dedicated to tag-based social image search. Both visual information and tags have been investigated in this field. However, most existing methods use tags and visual characteristics either separately or sequentially to estimate the relevance of images. In this paper, we propose an approach that simultaneously utilizes both visual and textual information to estimate the relevance of user-tagged images. Relevance is estimated with a hypergraph learning approach: a social image hypergraph is constructed in which vertices represent images and hyperedges represent visual or textual terms. Learning is achieved with a set of pseudo-positive images, and the weights of the hyperedges are updated throughout the learning process. In this way, the impact of different tags and visual words is modulated automatically. Comparative results of experiments conducted on a dataset of 370+ images are presented, demonstrating the effectiveness of the proposed approach.
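A social image hypergraph of this kind is commonly stored as a vertex-by-hyperedge incidence matrix, from which the vertex and hyperedge degrees used in standard hypergraph learning are computed. The sketch builds these quantities for a toy graph; the membership lists and weights are made up, and the paper's pseudo-positive weight update is not shown.

```python
from typing import List


def incidence(num_vertices: int, hyperedges: List[List[int]]) -> List[List[int]]:
    """H[v][e] = 1 if vertex v (an image) belongs to hyperedge e (a tag or visual word)."""
    H = [[0] * len(hyperedges) for _ in range(num_vertices)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def vertex_degrees(H: List[List[int]], weights: List[float]) -> List[float]:
    """d(v) = sum over hyperedges e of w(e) * H[v][e]."""
    return [sum(w * h for w, h in zip(weights, row)) for row in H]


def edge_degrees(H: List[List[int]]) -> List[int]:
    """delta(e) = number of vertices contained in hyperedge e."""
    return [sum(row[e] for row in H) for e in range(len(H[0]))]


# Four images; one textual hyperedge (a shared tag) and one visual hyperedge.
hyperedges = [[0, 1, 2], [1, 3]]
H = incidence(4, hyperedges)
weights = [1.0, 0.5]  # hypothetical hyperedge weights; the method learns and updates these
print(H)                           # [[1, 0], [1, 1], [1, 0], [0, 1]]
print(vertex_degrees(H, weights))  # [1.0, 1.5, 1.0, 0.5]
print(edge_degrees(H))             # [3, 2]
```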


Subjects
Algorithms; Image Processing, Computer-Assisted/methods; Photography/methods; Social Media; Animals; Databases, Factual; Humans
12.
IEEE Trans Image Process ; 22(3): 860-71, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23014746

ABSTRACT

Recent techniques based on sparse representation (SR) have demonstrated promising performance in high-level visual recognition, exemplified by highly accurate face recognition under occlusion and other sparse corruptions. Most research in this area has focused on classification algorithms using raw image pixels, and very few methods have been proposed to utilize quantized visual features, such as the popular bag-of-words feature abstraction. In such cases, besides the inherent quantization errors, ambiguity in visual word assignment and misdetection of feature points, caused by factors such as visual occlusions and noise, are the major causes of dense corruptions of the quantized representation. Dense corruptions can jeopardize the decision process by distorting the patterns of the sparse reconstruction coefficients. In this paper, we aim to eliminate these corruptions and achieve robust image analysis with SR. Toward this goal, we introduce two transfer processes (ambiguity transfer and mis-detection transfer) to account for the two major sources of corruption discussed above. Reasonably assuming that both kinds of distortion are rare, we augment the original SR-based reconstruction objective with ℓ0-norm regularization on the transfer terms to encourage sparsity and, hence, discourage dense distortion/transfer. Computationally, we relax the nonconvex ℓ0-norm optimization into a convex ℓ1-norm optimization problem and employ the accelerated proximal gradient method, whose updating procedure provably converges. Extensive experiments on four benchmark datasets (Caltech-101, Caltech-256, Corel-5k, and CMU Pose, Illumination, and Expression) demonstrate the necessity of removing the quantization corruptions and the various advantages of the proposed framework.
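The convex ℓ1 relaxation is typically handled with soft-thresholding, the proximal operator of the ℓ1 norm. The sketch below is a generic FISTA-style accelerated proximal gradient solver for the lasso objective 0.5 * ||Ax - b||^2 + λ||x||_1; it assumes a plain dictionary matrix A and does not model the paper's separate ambiguity and mis-detection transfer terms. The Frobenius-norm step size is a conservative choice made for simplicity.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_threshold(v: List[float], tau: float) -> List[float]:
    """Proximal operator of tau * ||x||_1: shrink every entry toward zero by tau."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def fista_lasso(A: Matrix, b: List[float], lam: float, iters: int = 200) -> List[float]:
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with accelerated proximal gradient (FISTA)."""
    n = len(A[0])
    # ||A||_F^2 upper-bounds ||A||_2^2, the Lipschitz constant of the smooth term's
    # gradient, so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    cols = list(zip(*A))  # columns of A, used to form A^T r
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iters):
        # Gradient of the smooth term at y: A^T (A y - b).
        residual = [sum(a * yi for a, yi in zip(row, y)) - bi for row, bi in zip(A, b)]
        grad = [sum(c * r for c, r in zip(col, residual)) for col in cols]
        # Proximal (soft-thresholding) step on the l1 term.
        x_new = soft_threshold([yi - step * g for yi, g in zip(y, grad)], step * lam)
        # Nesterov momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


x = fista_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
print([round(v, 6) for v in x])  # [2.0, 0.0]
```

With A equal to the identity, the exact lasso solution is soft_threshold(b, λ) = [2, 0], which the iterative solver reproduces; the small second coefficient is driven exactly to zero, which is the sparsity-inducing behavior the ℓ1 relaxation relies on.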


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Reproducibility of Results; Sensitivity and Specificity