Results 1 - 18 of 18

1.
IEEE Trans Syst Man Cybern B Cybern; 41(4): 1088-96, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21296709

ABSTRACT

Dimension reduction has been widely used in real-world applications such as image retrieval and document classification. In many scenarios, different features (or multiview data) can be obtained, and how to utilize them properly is a challenge. The conventional strategy of concatenating features from different views into one long vector is inappropriate, because each view has its own statistical properties and physical interpretation. Worse, the performance of the concatenating strategy deteriorates if some views are corrupted by noise. In this paper, we propose a multiview stochastic neighbor embedding (m-SNE) that systematically integrates heterogeneous features into a unified representation for subsequent processing based on a probabilistic framework. Compared with conventional strategies, our approach automatically learns a combination coefficient for each view, adapted to its contribution to the data embedding. This combination coefficient plays an important role in exploiting the complementary information in multiview data. Moreover, our algorithm for learning the combination coefficients converges at a rate of O(1/k²), which is the optimal rate for smooth problems. Experiments on synthetic and real data sets suggest the effectiveness and robustness of m-SNE for data visualization, image retrieval, object categorization, and scene recognition.
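
A rough sketch of the view-combination idea (not the authors' implementation; the function names and the fixed weights are illustrative): per-view Gaussian affinities are blended with nonnegative weights that sum to one, and the blended matrix can be fed to any SNE/t-SNE optimizer. In m-SNE the weights are learned; here they are set by hand.

    import numpy as np

    def view_affinities(X, sigma=1.0):
        # Symmetric Gaussian affinity matrix for a single view.
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        P = np.exp(-d2 / (2.0 * sigma ** 2))
        np.fill_diagonal(P, 0.0)
        return P / P.sum()

    def combined_affinity(views, alpha):
        # Convex combination of per-view affinities (alpha >= 0, sums to 1).
        return sum(a * view_affinities(X) for a, X in zip(alpha, views))

    # toy usage: two views of the same 100 samples
    rng = np.random.default_rng(0)
    views = [rng.normal(size=(100, 20)), rng.normal(size=(100, 50))]
    alpha = np.array([0.6, 0.4])         # m-SNE learns these; fixed here
    P = combined_affinity(views, alpha)  # feed P to an SNE/t-SNE optimizer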

2.
IEEE Trans Syst Man Cybern B Cybern; 41(6): 1471-82, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21690016

ABSTRACT

Prior to pattern recognition, feature selection is often used to identify relevant features and discard irrelevant ones, thereby improving analysis results. In this paper, we develop an unsupervised feature ranking algorithm that evaluates features using discovered local coherent patterns, known as biclusters. The biclusters (viewed as submatrices) are discovered from a data matrix and used to score relevant features from two aspects: the interdependence of features and the separability of instances. The features are then ranked with respect to their scores accumulated over all discovered biclusters before pattern classification. Experimental results on three UCI data sets show that the proposed method yields performance comparable to or better than the well-known Fisher, Laplacian, and variance scores; it also clearly improves the results of gene expression data analysis as evaluated by gene ontology annotation, demonstrating the advantage of unsupervised feature ranking for high-dimensional data.
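
A hedged sketch of the accumulation step, assuming the biclusters have already been discovered; the coherence score below is an illustrative stand-in for the paper's interdependence and separability criteria.

    import numpy as np

    def rank_features_by_biclusters(X, biclusters):
        # Accumulate a per-feature score over all discovered biclusters.
        # The inverse-variance score rewards features that are coherent
        # inside a bicluster (a placeholder criterion).
        scores = np.zeros(X.shape[1])
        for rows, cols in biclusters:
            sub = X[np.ix_(rows, cols)]
            scores[cols] += 1.0 / (1.0 + sub.var(axis=0))
        return np.argsort(-scores), scores

    # toy usage with two hand-made biclusters (row indices, column indices)
    X = np.random.default_rng(1).normal(size=(30, 8))
    bics = [(np.arange(0, 10), np.array([0, 1, 2])),
            (np.arange(10, 25), np.array([2, 5, 6]))]
    order, scores = rank_features_by_biclusters(X, bics)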

3.
IEEE Trans Syst Man Cybern B Cybern; 41(6): 1668-80, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21768049

ABSTRACT

The biologically inspired model (BIM) proposed by Serre presents a promising solution to object categorization. It emulates the process of object recognition in the primate visual cortex by constructing a set of scale- and position-tolerant features whose properties are similar to those of the cells along the ventral stream of the visual cortex. However, BIM can be further improved in two respects: mismatches caused by dense input, and random feature selection due to the feedforward framework. To alleviate these limitations, we develop an enhanced BIM (EBIM) that 1) removes uninformative inputs by imposing sparsity constraints and 2) applies a feedback loop to mid-level feature selection. Each aspect is motivated by relevant psychophysical research findings. To show the effectiveness of the EBIM, we apply it to object categorization and conduct empirical studies on four computer vision data sets. Experimental results demonstrate that the EBIM outperforms the BIM and is comparable to state-of-the-art approaches in terms of accuracy. Moreover, the new system is about 20 times faster than the BIM.

4.
IEEE Trans Syst Man Cybern B Cybern; 41(4): 921-30, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21216714

ABSTRACT

Distribution calibration plays an important role in cross-domain learning. However, existing distribution distance metrics are not geodesic and therefore cannot measure the intrinsic distance between two distributions. In this paper, we calibrate two distributions by using the geodesic distance in a Riemannian symmetric space. Our method learns a latent subspace in the reproducing kernel Hilbert space in which the geodesic distance between the distributions of the source and target domains is minimized. This geodesic distance is equivalent to the geodesic distance between two symmetric positive definite (SPD) matrices defined in the Riemannian symmetric space, where the two SPD matrices parameterize the marginal distributions of the source and target domains in the latent subspace. We carefully design an evolutionary algorithm to find a locally optimal solution that minimizes this geodesic distance. Empirical studies on face recognition, text categorization, and web image annotation suggest the effectiveness of the proposed scheme.
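
The affine-invariant geodesic distance between two SPD matrices has a standard closed form, sketched below; the covariance-based usage is illustrative and does not reproduce the paper's kernelized subspace learning.

    import numpy as np
    from scipy.linalg import eigh

    def spd_geodesic(A, B):
        # Affine-invariant Riemannian distance between SPD matrices:
        # d(A, B) = ||log(A^(-1/2) B A^(-1/2))||_F
        #         = sqrt(sum_i log^2(lambda_i)),
        # where lambda_i are the generalized eigenvalues of (B, A).
        lam = eigh(B, A, eigvals_only=True)
        return np.sqrt(np.sum(np.log(lam) ** 2))

    # toy usage: compare the covariance structure of two domains
    rng = np.random.default_rng(0)
    Xs = rng.normal(size=(200, 5))
    Xt = 1.5 * rng.normal(size=(300, 5))
    d = spd_geodesic(np.cov(Xs.T), np.cov(Xt.T))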

5.
IEEE Trans Image Process; 27(2): 791-805, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29757732

ABSTRACT

Despite the promising progress made in recent years, person re-identification remains challenging due to complex variations in human appearance across different camera views. This paper presents a logistic discriminant metric learning method for this problem. Unlike most existing metric learning algorithms, it exploits both original data and auxiliary data during training, motivated by the machine learning paradigm of learning using privileged information. Privileged information is auxiliary knowledge that is available only during training. Our goal is to learn an optimal distance function by constructing a locally adaptive decision rule with the help of privileged information. We jointly learn two distance metrics by minimizing an empirical loss that penalizes the difference between the distance in the original space and that in the privileged space. In our setting, the distance in the privileged space functions as a local decision threshold, which guides decision making in the original space like a teacher. The metric learned from the original space is used to compute the distance between a probe image and a gallery image during testing. In addition, we extend the proposed approach to a multi-view setting that exploits the complementarity of multiple feature representations. In the multi-view setting, multiple metrics corresponding to different original features are jointly learned, guided by the same privileged information, and an effective iterative optimization scheme is introduced to simultaneously optimize the metrics and their assigned weights. Experimental results on several widely used data sets demonstrate that the proposed approach is superior to global decision threshold-based methods and outperforms most state-of-the-art methods.
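
A minimal sketch of the locally adaptive decision rule (an assumed simplification, not the paper's exact objective): the privileged-space distance serves as a per-pair threshold inside a logistic loss.

    import numpy as np

    def mahalanobis(M, x, y):
        d = x - y
        return d @ M @ d

    def privileged_logistic_loss(M, P, pairs, labels):
        # Logistic loss in which the privileged-space distance acts as a
        # per-pair decision threshold; y = +1 for matched pairs, -1 otherwise.
        loss = 0.0
        for ((xi, xj), (zi, zj)), y in zip(pairs, labels):
            margin = mahalanobis(M, xi, xj) - mahalanobis(P, zi, zj)
            loss += np.logaddexp(0.0, y * margin)  # log(1 + exp(y * margin))
        return loss / len(labels)

    # toy usage: one matched and one mismatched pair
    rng = np.random.default_rng(0)
    xs, zs = rng.normal(size=(4, 3)), rng.normal(size=(4, 2))
    pairs = [((xs[0], xs[1]), (zs[0], zs[1])),
             ((xs[2], xs[3]), (zs[2], zs[3]))]
    loss = privileged_logistic_loss(np.eye(3), np.eye(2), pairs, [+1, -1])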

6.
IEEE Trans Image Process; 27(3): 1323-1335, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29990221

ABSTRACT

Feature selection improves the performance of general machine learning tasks by extracting an informative subset from high-dimensional features. Conventional feature selection methods usually ignore the class imbalance problem, so the selected features are biased toward the majority class. Considering that the F-measure is a more reasonable performance measure than accuracy for imbalanced data, this paper presents an effective feature selection algorithm that addresses the class imbalance issue by optimizing F-measures. Since F-measure optimization can be decomposed into a series of cost-sensitive classification problems, we investigate cost-sensitive feature selection by generating and assigning different costs to each class with rigorous theoretical guidance. After solving a series of cost-sensitive feature selection problems, the features corresponding to the best F-measure are selected. In this way, the selected features fully represent the properties of all classes. Experimental results on popular benchmarks and challenging real-world data sets demonstrate the significance of cost-sensitive feature selection for the imbalanced data setting and validate the effectiveness of the proposed method.
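
A hedged illustration of the cost-sensitive reduction, using an L1-penalized logistic regression as the selector and a small grid of candidate class costs; the synthetic data, cost grid, and training-set evaluation are assumptions made for brevity (real use would cross-validate).

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score

    # toy imbalanced data standing in for a real benchmark
    X, y = make_classification(n_samples=1000, n_features=50,
                               weights=[0.9], random_state=0)

    best_f1, best_feats = -1.0, None
    for minority_cost in (1, 2, 5, 10, 20):      # candidate class costs
        clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.1,
                                 class_weight={0: 1, 1: minority_cost})
        clf.fit(X, y)
        feats = np.flatnonzero(clf.coef_.ravel())  # features kept by L1
        f1 = f1_score(y, clf.predict(X))           # training F1, for brevity
        if f1 > best_f1:
            best_f1, best_feats = f1, feats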

7.
IEEE Trans Image Process; 26(11): 5324-5336, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28749350

ABSTRACT

Hashing has proven to be an attractive technique for fast nearest neighbor search over big data. Compared with projection-based hashing methods, prototype-based methods are better able to generate discriminative binary codes for data with complex intrinsic structure. However, existing prototype-based methods, such as spherical hashing and K-means hashing, still suffer from ineffective coding that exhausts the complete set of binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization (ABQ) method that learns a discriminative hash function whose prototypes are associated with small, unique binary codes. Our alternating optimization adaptively discovers the prototype set and a code set of varying size in an efficient way; together they robustly approximate the data relations. Our method naturally generalizes to the product space for long hash codes and enjoys fast training that is linear in the number of training samples. We further devise a distributed framework for large-scale learning, which significantly speeds up the training of ABQ in the distributed environments now widely deployed. Extensive experiments on four large-scale (up to 80 million) data sets demonstrate that our method significantly outperforms state-of-the-art hashing methods, with relative performance gains of up to 58.84%.
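
A simplified prototype-coding sketch in the spirit of prototype-based hashing; the adaptive joint optimization of prototypes and codes that defines ABQ is omitted, and the index-to-code assignment here is arbitrary.

    import numpy as np
    from sklearn.cluster import KMeans

    def prototype_hash(X, n_bits=4, random_state=0):
        # Encode each point by the binary code of its nearest prototype.
        k = 2 ** n_bits
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        # assign each prototype the binary expansion of its index
        codes = ((np.arange(k)[:, None] >> np.arange(n_bits)) & 1).astype(np.uint8)
        return codes[km.labels_], km

    X = np.random.default_rng(0).normal(size=(500, 32))
    B, km = prototype_hash(X, n_bits=4)   # B: (500, 4) binary codes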

8.
IEEE Trans Image Process; 26(8): 3951-3964, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28574353

ABSTRACT

Objective assessment of image quality is fundamentally important in many image processing tasks. In this paper, we focus on learning blind image quality assessment (BIQA) models, which predict the quality of a digital image with no access to its original pristine-quality counterpart as reference. One of the biggest challenges in learning BIQA models is the conflict between the gigantic image space (whose dimension equals the number of image pixels) and the extremely limited reliable ground-truth data for training. Such data are typically collected via subjective testing, which is cumbersome, slow, and expensive. Here, we first show that a vast amount of reliable training data in the form of quality-discriminable image pairs (DIPs) can be obtained automatically at low cost by exploiting large-scale databases with diverse image content. We then learn an opinion-unaware BIQA (OU-BIQA, meaning that no subjective opinions are used for training) model using RankNet, a pairwise learning-to-rank (L2R) algorithm, from millions of DIPs, each associated with a perceptual uncertainty level, leading to a DIP-inferred quality (dipIQ) index. Extensive experiments on four benchmark IQA databases demonstrate that dipIQ outperforms state-of-the-art OU-BIQA models. The robustness of dipIQ is also significantly improved, as confirmed by the group MAximum Differentiation competition method. Furthermore, we extend the proposed framework by learning models with ListNet (a listwise L2R algorithm) on quality-discriminable image lists (DILs). The resulting DIL-inferred quality index achieves an additional performance gain.
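
The RankNet building block is the pairwise cross-entropy below; a minimal sketch in which the target probability would encode each DIP's perceptual uncertainty (names and values are illustrative).

    import numpy as np

    def ranknet_loss(s_i, s_j, p_target):
        # Cross-entropy between the modeled probability that image i beats
        # image j and the target probability supplied with the pair.
        p = 1.0 / (1.0 + np.exp(-(s_i - s_j)))
        return -(p_target * np.log(p) + (1.0 - p_target) * np.log(1.0 - p))

    # e.g., a confident DIP where image i should clearly rank higher:
    loss = ranknet_loss(s_i=2.0, s_j=1.0, p_target=0.95)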

9.
IEEE Trans Image Process; 26(1): 452-463, 2017 Jan.
Article in English | MEDLINE | ID: mdl-28113763

ABSTRACT

Graph models are emerging as very effective tools for learning the complex structures and relationships hidden in data. In general, the critical purpose of graph-oriented learning algorithms is to construct an informative graph for image clustering and classification tasks. In addition to the classical K-nearest-neighbor and r-neighborhood methods for graph construction, the l1-graph and its variants are emerging methods for finding the neighboring samples of a center datum, where the corresponding ingoing edge weights are simultaneously derived from the sparse reconstruction coefficients of the remaining samples. However, the pairwise links of the l1-graph cannot capture the high-order relationships between the center datum and the samples prominent in its sparse reconstruction. Meanwhile, from the perspective of variable selection, the l1-norm sparse constraint, regarded as a LASSO model, tends to select only one datum from a group of highly correlated data and ignore the others. To cope with both drawbacks simultaneously, we propose a new elastic net hypergraph learning model, which consists of two steps. In the first step, a robust matrix elastic net model is constructed to find the canonically related samples in a somewhat greedy way, achieving the grouping effect by adding an l2 penalty to the l1 constraint. In the second step, a hypergraph is used to represent the high-order relationships between each datum and its prominent samples by regarding them as a hyperedge, and the hypergraph Laplacian matrix is then constructed for further analysis. New hypergraph learning algorithms, including unsupervised clustering and multi-class semi-supervised classification, are then derived. Extensive experiments on face and handwriting databases demonstrate the effectiveness of the proposed method.
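
A sketch of the first step under assumed hyperparameters: each sample is reconstructed from the remaining samples by an off-the-shelf elastic net, and the support of the coefficients forms its hyperedge (the robust matrix formulation and the Laplacian construction are not reproduced).

    import numpy as np
    from sklearn.linear_model import ElasticNet

    def elastic_net_hyperedges(X, alpha=0.05, l1_ratio=0.7):
        # One hyperedge per sample: the other samples that receive
        # nonzero elastic net reconstruction coefficients.
        n = X.shape[0]
        hyperedges = []
        for i in range(n):
            others = np.delete(np.arange(n), i)
            en = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=5000)
            en.fit(X[others].T, X[i])        # columns = candidate neighbors
            hyperedges.append(others[np.flatnonzero(en.coef_)])
        return hyperedges

    X = np.random.default_rng(0).normal(size=(40, 10))
    edges = elastic_net_hyperedges(X)        # one hyperedge per sample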

10.
IEEE Trans Image Process; 26(2): 797-807, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27913349

ABSTRACT

Cascade regression is a popular face alignment approach that has achieved good performance on in-the-wild databases. However, it depends heavily on local features for estimating reliable landmark locations and therefore suffers on corrupted images, such as images with occlusion, which is common in real-world face photographs. In this paper, we present a new adaptive cascade regression model for robust face alignment. In each iteration, the shape-indexed appearance is used to estimate the occlusion level of each landmark, and these occlusion levels then serve as adaptive weights on the shape-indexed features, suppressing the noise that occlusion introduces. At the same time, an exemplar-based shape prior is designed to suppress the influence of local image corruption. Extensive experiments are conducted on challenging benchmarks, and the results demonstrate that the proposed method outperforms state-of-the-art methods for facial landmark localization and occlusion detection.

11.
IEEE Trans Neural Netw Learn Syst; 28(7): 1490-1507, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28287983

ABSTRACT

Feature selection (FS) is an important component of many pattern recognition tasks, in which one is often confronted with very high-dimensional data. FS algorithms are designed to identify the relevant feature subset from the original features, which facilitates subsequent analysis such as clustering and classification. Structured sparsity-inducing feature selection (SSFS) methods have been widely studied in the last few years, and a number of algorithms have been proposed. However, there is no comprehensive study of the connections between different SSFS methods or of how they have evolved. In this paper, we provide a survey of SSFS methods, including their motivations and mathematical formulations. We then explore the relationships among different formulations and propose a taxonomy to elucidate their evolution. We group existing SSFS methods into two categories: vector-based feature selection (based on the lasso) and matrix-based feature selection (based on the l_{r,p}-norm). Furthermore, FS has been combined with other machine learning algorithms for specific applications, such as multitask learning, multilabel learning, multiview learning, classification, and clustering. This paper not only compares the differences and commonalities of these methods in terms of their regression and regularization strategies, but also provides practitioners in related fields with useful guidelines on how to perform feature selection.
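
Within the l_{r,p}-norm family, the widely used l_{2,1} norm admits the closed-form row-wise proximal operator below, the workhorse of many matrix-based SSFS solvers; this is a general illustration, not an excerpt from the survey.

    import numpy as np

    def prox_l21(W, t):
        # Row-wise shrinkage: the proximal operator of t * ||W||_{2,1}.
        # Rows (features) whose l2 norm falls below t are zeroed entirely.
        norms = np.linalg.norm(W, axis=1, keepdims=True)
        return np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0) * W

    # features whose rows shrink to zero are discarded
    W = np.random.default_rng(0).normal(size=(50, 3))
    selected = np.flatnonzero(np.linalg.norm(prox_l21(W, 1.0), axis=1))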

12.
IEEE Trans Image Process; 26(9): 4331-4346, 2017 Sep.
Article in English | MEDLINE | ID: mdl-27723591

ABSTRACT

We investigate the scalable image classification problem with a large number of categories. Hierarchical visual data structures can improve the efficiency and performance of large-scale multi-class classification. We propose a novel image classification method based on learning hierarchical inter-class structures. Specifically, we first design a fast algorithm to compute the similarity metric between categories, based on which a visual tree is constructed by hierarchical spectral clustering. Using the learned visual tree, a test sample's label is efficiently predicted by searching for the best path through the tree. The proposed method is extensively evaluated on the ILSVRC2010 and Caltech 256 benchmark datasets. The experimental results show that our method obtains significantly better category hierarchies than other state-of-the-art visual tree-based methods and therefore achieves much more accurate classification.

13.
IEEE Trans Image Process; 25(11): 5345-5357, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27552753

ABSTRACT

Hyperspectral images offer great potential for target detection; however, they also introduce new challenges, so hyperspectral target detection should be treated as a distinct problem and modeled differently. Many classical detectors have been proposed based on the linear mixing model and the sparsity model. However, the former cannot deal well with spectral variability given limited endmembers, and the latter usually treats target detection as a simple classification problem and pays little attention to the low probability of target occurrence. Can we, then, find an efficient way to exploit both the high-dimensional features behind hyperspectral images and the limited target information to extract small targets? This paper proposes a novel sparsity-based detector, the hybrid sparsity and statistics detector (HSSD), for target detection in hyperspectral imagery, which effectively deals with both of these problems. The proposed algorithm designs a hypothesis-specific dictionary based on the prior hypotheses for the test pixel, which avoids an imbalanced number of training samples for a class-specific dictionary. A purification process is then applied to the background training samples in order to construct an effective competition between the two hypotheses. Next, a sparse representation-based binary hypothesis model with additive Gaussian noise is proposed to represent the image. Finally, a generalized likelihood ratio test is performed to obtain a detection decision that is more robust than those of reconstruction residual-based methods. Extensive experimental results on three hyperspectral data sets confirm that the proposed HSSD algorithm clearly outperforms state-of-the-art target detectors.

14.
IEEE Trans Image Process; 25(12): 5814-5827, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28114066

ABSTRACT

The growing number of 3D objects in various applications has increased the demand for effective and efficient 3D object retrieval methods, which have attracted extensive research effort in recent years. Existing work mainly focuses on how to extract features and conduct object matching, but as applications multiply, 3D objects increasingly come from different domains, and how to conduct retrieval across them becomes more important. To address this issue, we propose a multi-view object retrieval method using multi-scale topic models. In our method, multiple views are first extracted from each object, and dense visual features are extracted to represent each view. To represent a 3D object, multi-scale topic models are employed to extract the hidden relationships among these features with respect to varying numbers of topics; in this way, each object is represented by a set of bags of topics. To compare objects, we first cluster the basic topics from the two data sets and generate a common topic dictionary for a new representation, so that two objects can be aligned to the same common feature space for comparison. To evaluate the performance of the proposed method, experiments are conducted on two data sets. The 3D object retrieval results and comparisons with existing methods demonstrate the effectiveness of the proposed method.
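
A hedged sketch of the multi-scale idea using off-the-shelf LDA: learn topic proportions at several topic numbers and concatenate them. The feature extraction, topic clustering, and common dictionary steps are not reproduced, and the topic scales are assumptions.

    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    def bag_of_topics(counts, topic_scales=(8, 16, 32)):
        # One multi-scale representation per object: topic proportions
        # learned at several topic numbers, concatenated.
        reps = [LatentDirichletAllocation(n_components=k, random_state=0)
                .fit_transform(counts) for k in topic_scales]
        return np.hstack(reps)

    # toy usage: visual-word counts for 60 objects over a 200-word vocabulary
    counts = np.random.default_rng(0).poisson(1.0, size=(60, 200))
    F = bag_of_topics(counts)   # shape (60, 8 + 16 + 32)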

15.
IEEE Trans Image Process; 25(1): 414-27, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26529763

ABSTRACT

The features used in many image analysis applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though images are usually represented by features from multiple modalities. We therefore propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features but also utilizes the complementarity of different modalities to further reduce feature redundancy within each modality. The large margin principle employed also helps to extract strongly predictive features, making them more suitable for prediction tasks such as classification. An alternating algorithm is developed for the optimization problem, and each subproblem can be solved efficiently. Experiments on two challenging real-world image data sets demonstrate the effectiveness and superiority of the proposed method.

16.
IEEE Trans Image Process; 25(11): 5187-5198, 2016 Nov.
Article in English | MEDLINE | ID: mdl-28873058

ABSTRACT

Single image haze removal is a challenging ill-posed problem. Existing methods use various constraints and priors to obtain plausible dehazing solutions; the key to haze removal is to estimate a medium transmission map for the input hazy image. In this paper, we propose a trainable end-to-end system called DehazeNet for medium transmission estimation. DehazeNet takes a hazy image as input and outputs its medium transmission map, which is subsequently used to recover a haze-free image via the atmospheric scattering model. DehazeNet adopts a convolutional neural network-based deep architecture whose layers are specially designed to embody established assumptions and priors in image dehazing. Specifically, layers of Maxout units are used for feature extraction and can generate almost all haze-relevant features. We also propose a novel nonlinear activation function, the bilateral rectified linear unit, which improves the quality of the recovered haze-free image. We establish connections between the components of the proposed DehazeNet and those used in existing methods. Experiments on benchmark images show that DehazeNet achieves superior performance over existing methods while remaining efficient and easy to use.
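
Given the transmission map, the recovery step inverts the standard atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)); a minimal sketch, with the clipping threshold t0 as an assumed safeguard.

    import numpy as np

    def recover_scene(I, t, A, t0=0.1):
        # Invert I = J * t + A * (1 - t):  J = (I - A) / max(t, t0) + A.
        # t0 keeps the division from amplifying noise in dense haze.
        t = np.clip(t, t0, 1.0)[..., None]
        return np.clip((I - A) / t + A, 0.0, 1.0)

    # toy usage: I is an (H, W, 3) hazy image in [0, 1], t an (H, W) map
    rng = np.random.default_rng(0)
    I, t = rng.random((4, 4, 3)), rng.random((4, 4))
    J = recover_scene(I, t, A=np.array([0.8, 0.8, 0.8]))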

17.
IEEE Trans Image Process; 25(12): 5610-5621, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28113975

ABSTRACT

Hashing, or binary code learning, is recognized as a way to accomplish efficient nearest neighbor search and has thus attracted broad interest in recent retrieval, vision, and learning studies. One main challenge of learning to hash arises from the involvement of discrete variables in binary code optimization. While the widely used continuous relaxation may achieve high learning efficiency, the resulting codes are typically less effective due to accumulated quantization error. In this paper, we propose a novel binary code optimization method, dubbed discrete proximal linearized minimization (DPLM), which directly handles the discrete constraints during learning. Specifically, the discrete (thus nonsmooth, nonconvex) problem is reformulated as minimizing the sum of a smooth loss term and a nonsmooth indicator function. The resulting problem is then solved efficiently by an iterative procedure in which each iteration admits an analytical discrete solution, and which is shown to converge very fast. In addition, the proposed method supports a large family of empirical loss functions, instantiated in this paper by both supervised and unsupervised hashing losses, together with bit-uncorrelation and bit-balance constraints. In particular, the proposed DPLM with a supervised ℓ2 loss encodes the whole NUS-WIDE database into 64-bit binary codes within 10 seconds on a standard desktop computer. The proposed approach is extensively evaluated on several large-scale data sets, and the generated binary codes achieve very promising results on both retrieval and classification tasks.
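
A toy sketch of the iteration's shape (the actual hashing losses and the bit-uncorrelation and balance terms are omitted): linearize the smooth loss at the current codes, take a gradient step, and project back onto the binary set by taking signs.

    import numpy as np

    def dplm_step(B, grad_f, rho):
        # One proximal linearized iteration: the entrywise sign is the
        # analytical discrete solution of each subproblem.
        Z = B - grad_f(B) / rho
        return np.where(Z >= 0.0, 1.0, -1.0)

    # toy usage: fit codes to a real-valued target Y, f(B) = 0.5||B - Y||^2
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(100, 16))
    B = np.sign(rng.normal(size=Y.shape))
    for _ in range(10):
        B = dplm_step(B, grad_f=lambda B: B - Y, rho=1.0)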

18.
IEEE Trans Image Process; 24(8): 2355-68, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25872208

ABSTRACT

There is growing interest in multilabel image classification due to its critical role in web-based image analytics applications, such as large-scale image retrieval and browsing. Matrix completion (MC) has recently been introduced as a method for transductive (semisupervised) multilabel classification and has several distinct advantages, including robustness to missing data and background noise in both feature and label space. However, it is limited by considering only data represented by a single-view feature, which cannot precisely characterize images containing several semantic concepts. To utilize multiple features taken from different views, one would have to concatenate the features into a long vector, but this concatenation is prone to over-fitting and often leads to very high time complexity in MC-based image classification. We therefore propose a weighted combination of the MC outputs of different views, and present the multiview MC (MVMC) framework for transductive multilabel image classification. To learn the view combination weights effectively, we apply a cross-validation strategy on the labeled set: MVMC splits the labeled set into two parts and predicts the labels of one part using the known labels of the other; the predicted labels are then used to learn the view combination coefficients. In the learning process, we adopt the average precision (AP) loss, which is particularly suitable for multilabel image classification, since ranking-based criteria are critical for evaluating a multilabel classification system. A least squares loss formulation is also presented for the sake of efficiency, and the robustness of the AP-loss-based algorithm relative to the other losses is investigated. Experimental evaluations on two real-world data sets (PASCAL VOC'07 and MIR Flickr) demonstrate the effectiveness of MVMC for transductive (semisupervised) multilabel image classification, and show that MVMC can exploit the complementary properties of different features and output consistent labels for improved multilabel image classification.
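
A hedged sketch of the weight-learning step for two views, substituting a least squares criterion for the AP loss and assuming the per-view MC label predictions on the held-out part are already available.

    import numpy as np

    def learn_view_weights(preds, Y, grid=np.linspace(0.0, 1.0, 21)):
        # Pick two-view combination weights on a held-out labeled part;
        # a least squares stand-in for the paper's AP-loss learning.
        best_w, best_err = 0.5, np.inf
        for w in grid:
            err = np.mean((w * preds[0] + (1.0 - w) * preds[1] - Y) ** 2)
            if err < best_err:
                best_w, best_err = w, err
        return np.array([best_w, 1.0 - best_w])

    # toy usage: the first view's predictions are less noisy
    rng = np.random.default_rng(0)
    Y = (rng.random((50, 5)) > 0.5).astype(float)
    preds = [Y + 0.2 * rng.normal(size=Y.shape),
             Y + 0.5 * rng.normal(size=Y.shape)]
    w = learn_view_weights(preds, Y)   # should favor the first view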
