Results 1-13 of 13
1.
Article in English | MEDLINE | ID: mdl-38157460

ABSTRACT

Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning. However, due to limited satellite coverage or communication disruptions, UAVs may lose the signals needed for positioning. In such situations, vision-based techniques can serve as an alternative, preserving the self-positioning capability of UAVs. However, most existing datasets are developed for geo-localization of objects captured by UAVs rather than for UAV self-positioning. Furthermore, existing UAV datasets apply discrete sampling to synthetic data, such as Google Maps, neglecting the dense sampling and the uncertainties commonly encountered in practical scenarios. To address these issues, this paper presents DenseUAV, the first publicly available dataset tailored to the UAV self-positioning task. DenseUAV adopts dense sampling on UAV images obtained in low-altitude urban areas. In total, over 27K UAV- and satellite-view images of 14 university campuses are collected and annotated. In terms of methodology, we first verify the superiority of Transformers over CNNs for the proposed task. We then incorporate metric learning into representation learning to enhance the model's discriminative capacity and to reduce the modality discrepancy. In addition, to facilitate joint learning from both the satellite and UAV views, we introduce a mutually supervised learning approach. Finally, we enhance the Recall@K metric and introduce a new measurement, SDM@K, to evaluate both the retrieval and localization performance for the proposed task. The proposed baseline method achieves a remarkable Recall@1 score of 83.01% and an SDM@1 score of 86.50% on DenseUAV. The dataset and code are publicly available at https://github.com/Dmmm1997/DenseUAV.
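The Recall@K retrieval metric reported above can be sketched as follows. This is a minimal illustration, not the paper's code; the SDM@K measurement is specific to the paper and is not reproduced here, and all function and variable names are assumptions.

```python
def recall_at_k(ranked_ids, true_id, k):
    """1 if the ground-truth satellite match appears in the top-k results, else 0."""
    return 1 if true_id in ranked_ids[:k] else 0

def mean_recall_at_k(rankings, k):
    """Average Recall@K over all queries: the fraction of queries whose
    correct match is ranked within the top k retrieved candidates."""
    hits = [recall_at_k(ranked, true, k) for ranked, true in rankings]
    return sum(hits) / len(hits)

# Toy example: 3 UAV queries, each with a ranked candidate list and true match.
rankings = [(["s3", "s1", "s7"], "s3"),  # hit at rank 1
            (["s2", "s5", "s1"], "s5"),  # hit at rank 2
            (["s9", "s4", "s8"], "s6")]  # miss
```

With this toy data, `mean_recall_at_k(rankings, 1)` is 1/3 and `mean_recall_at_k(rankings, 2)` is 2/3, matching the usual top-k hit-rate interpretation of Recall@K in retrieval.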

2.
IEEE Trans Neural Netw Learn Syst ; 33(1): 130-144, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33180734

ABSTRACT

Recently, many works on discriminant analysis have promoted model robustness against outliers by using the L1- or L2,1-norm as the distance metric. However, both their robustness and their discriminant power are limited. In this article, we present a new robust discriminant subspace (RDS) learning method for feature extraction, with an objective function formulated in a different form. To guarantee that the subspace is robust and discriminative, we measure the within-class distances based on the [Formula: see text]-norm and use the [Formula: see text]-norm to measure the between-class distances. This also endows our method with rotational invariance. Since the proposed model involves both [Formula: see text]-norm maximization and [Formula: see text]-norm minimization, it is very challenging to solve. To address this problem, we present an efficient nongreedy iterative algorithm. In addition, motivated by the trace ratio criterion, we derive a mechanism that automatically balances the contributions of the different terms in our objective. RDS is very flexible, as it can be extended to other existing feature extraction techniques. An in-depth theoretical analysis of the algorithm's convergence is presented in this article. Experiments are conducted on several typical databases for image classification, and the promising results indicate the effectiveness of RDS.

3.
IEEE Trans Cybern ; 52(12): 12745-12758, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34546934

ABSTRACT

Multiview learning (MVL), which enhances learners' performance by coordinating complementarity and consistency among different views, has attracted much attention. The multiview generalized eigenvalue proximal support vector machine (MvGSVM) is a recently proposed, effective binary classification method that introduces the concept of MVL into the classical generalized eigenvalue proximal support vector machine (GEPSVM). However, this approach cannot yet guarantee good classification performance and robustness. In this article, we develop the multiview robust double-sided twin SVM (MvRDTSVM) with SVM-type problems, which introduces a set of double-sided constraints into the proposed model to promote classification performance. To improve the robustness of MvRDTSVM against outliers, we take the L1-norm as the distance metric. A fast version of MvRDTSVM (called MvFRDTSVM) is also presented. The reformulated problems are complex, and solving them is very challenging. As one of the main contributions of this article, we design two effective iterative algorithms to optimize the proposed nonconvex problems and then conduct a theoretical analysis of the algorithms. The experimental results verify the effectiveness of our proposed methods.

4.
IEEE Trans Cybern ; 51(9): 4373-4385, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32511098

ABSTRACT

Eyeglasses removal is challenging because it must remove different kinds of eyeglasses, e.g., rimless glasses, full-rim glasses, and sunglasses, and recover plausible eyes. Due to the significant visual variation, conventional methods lack scalability. Most existing works focus on frontal face images captured in controlled environments, such as the laboratory, and need to design specific systems for different eyeglass types. To address this limitation, we propose a unified eyeglass removal model called the eyeglasses removal generative adversarial network (ERGAN), which can handle different types of glasses in the wild. The proposed method does not depend on dense annotation of the eyeglasses' location but benefits from large-scale face images with weak annotations. Specifically, we study two relevant tasks simultaneously, that is, removing eyeglasses and wearing eyeglasses. Given two face images, with and without eyeglasses, the proposed model learns to swap the eye area between the two faces. The generation mechanism focuses on the eye area and avoids the difficulty of generating an entirely new face. In the experiments, we show that the proposed method achieves competitive removal quality in terms of realism and diversity. Furthermore, we evaluate ERGAN on several subsequent tasks, such as face verification and facial expression recognition, and show that our method can serve as a preprocessing step for these tasks.


Subjects
Eyeglasses
5.
Article in English | MEDLINE | ID: mdl-31940531

ABSTRACT

Conventional multi-view re-ranking methods usually perform asymmetrical matching between the region of interest (ROI) in the query image and the whole target image for similarity computation. Due to the inconsistency in visual appearance, this practice tends to degrade retrieval accuracy, particularly when the image ROI, usually interpreted as the image objectness, accounts for a smaller region of the image. Since Privileged Information (PI), which can be viewed as an image prior, characterizes the image objectness well, in this paper we aim to leverage PI to further improve the performance of multi-view re-ranking. Towards this end, we propose a discriminative multi-view re-ranking approach in which both the original global image visual contents and the local auxiliary PI features are simultaneously integrated into a unified training framework for generating latent subspaces with sufficient discriminating power. For on-the-fly re-ranking, since the multi-view PI features are unavailable, we only project the original multi-view image representations onto the latent subspace, and re-ranking is then achieved by computing and sorting the distances from the multi-view embeddings to the separating hyperplane. Extensive experimental evaluations on two public benchmarks, Oxford5k and Paris6k, reveal that our approach provides a further performance boost for accurate image re-ranking, whilst the comparative study demonstrates the advantage of our method over other multi-view re-ranking methods.
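The re-ranking step described above, sorting candidates by their distance to a separating hyperplane in the latent subspace, can be sketched as follows. This is a hedged illustration only: the linear hyperplane form, the sign convention, and all names are assumptions, not details taken from the paper.

```python
import math

def hyperplane_rerank(ids, embeddings, w, b):
    """Sort candidate images by signed distance to the hyperplane w.x + b = 0
    in the latent subspace; a larger signed distance is treated as more relevant."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))

    def signed_dist(x):
        return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

    return sorted(ids, key=lambda i: signed_dist(embeddings[i]), reverse=True)

# Toy 2-D latent embeddings for three candidate images.
emb = {"a": [2.0, 0.0], "b": [5.0, 1.0], "c": [-1.0, 0.5]}
order = hyperplane_rerank(["a", "b", "c"], emb, w=[1.0, 0.0], b=0.0)
```

With this toy hyperplane the signed distances are 2, 5, and -1, so `order` comes back as `["b", "a", "c"]`: the candidate farthest on the relevant side of the hyperplane is ranked first.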

6.
IEEE Trans Neural Netw Learn Syst ; 31(3): 813-826, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31059455

ABSTRACT

Learning long-term dependencies (LTDs) with recurrent neural networks (RNNs) is challenging due to their limited internal memories. In this paper, we propose a new external memory architecture for RNNs, called an external addressable long-term and working memory (EALWM)-augmented RNN. This architecture has two distinct advantages over existing neural external memory architectures: the division of the external memory into two parts, long-term memory and working memory, both of which are addressable; and the capability, under the necessary assumptions, to learn LTDs without suffering from vanishing gradients. The experimental results on algorithm learning, language modeling, and question answering demonstrate that the proposed neural memory architecture is promising for practical applications.


Subjects
Databases, Factual , Memory, Long-Term , Memory, Short-Term , Neural Networks, Computer , Memory, Long-Term/physiology , Memory, Short-Term/physiology
7.
IEEE Trans Pattern Anal Mach Intell ; 42(2): 460-474, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30418897

ABSTRACT

In recent years, visual question answering (VQA) has become topical. The premise of VQA's significance as a benchmark in AI is that both the image and the textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps 'understand' less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution [1]. In this paper we propose the inverse problem of VQA (iVQA). The iVQA task is to generate a question that corresponds to a given image and answer pair. We propose a variational iVQA model that can generate diverse, grammatically correct, and content-correlated questions that match the given answer. Based on this model, we show that iVQA is an interesting benchmark for visuo-linguistic understanding and a more challenging alternative to VQA, because an iVQA model needs to understand the image better to be successful. As a second contribution, we show how to use iVQA in a novel reinforcement learning framework to diagnose any existing VQA model by exposing its belief set: the set of question-answer pairs that the VQA model would predict as true for a given image. This provides a completely new window into what VQA models 'believe' about images. We show that existing VQA models have more erroneous beliefs than previously thought, revealing their intrinsic weaknesses. Suggestions are then made on how to address these weaknesses going forward.

8.
IEEE Trans Neural Netw Learn Syst ; 31(7): 2361-2375, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31870994

ABSTRACT

Zero-shot learning (ZSL), a type of structured multioutput learning, has attracted much attention due to its requirement of no training data for target classes. Conventional ZSL methods usually project visual features into a semantic space and assign labels by finding their nearest prototypes. However, this type of nearest neighbor search (NNS)-based method often suffers from severe performance degradation because of the nonuniform variances between different categories. In this article, we propose a probabilistic framework that takes covariance into account to deal with this problem. In this framework, we define a new latent space with two characteristics. The first is that the features in this space should gather within classes and scatter between classes, which is implemented by triplet learning; the second is that the prototypes of unseen classes are synthesized with nonnegative coefficients, which are generated by nonnegative matrix factorization (NMF) of the relations between the seen and unseen classes in attribute space. During training, the learned parameters are the projection model for the triplet network and the nonnegative coefficients between the unseen and seen classes. In the testing phase, visual features are projected into the latent space and assigned the label with the maximum probability among the unseen classes for classic ZSL, or among all classes for generalized ZSL. Extensive experiments are conducted on four popular data sets, and the results show that the proposed method outperforms the state-of-the-art methods in most circumstances.
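The prototype-synthesis idea above, expressing each unseen class as a nonnegative mixture of seen classes learned in attribute space and reusing that mixture in the latent space, can be sketched as follows. This is a simplification under stated assumptions: the paper uses NMF, whereas this sketch fits the nonnegative coefficients by plain projected gradient descent, and all names are illustrative.

```python
def nonneg_coeffs(target_attr, seen_attrs, iters=2000, lr=0.01):
    """Nonnegative coefficients c >= 0 approximately minimizing
    ||target_attr - sum_i c[i] * seen_attrs[i]||^2, via projected
    gradient descent (a simple stand-in for the paper's NMF step)."""
    n, d = len(seen_attrs), len(target_attr)
    c = [1.0 / n] * n
    for _ in range(iters):
        resid = [sum(c[i] * seen_attrs[i][j] for i in range(n)) - target_attr[j]
                 for j in range(d)]
        for i in range(n):
            grad = sum(resid[j] * seen_attrs[i][j] for j in range(d))
            c[i] = max(0.0, c[i] - lr * grad)  # gradient step, clipped to c >= 0
    return c

def synthesize_prototype(coeffs, seen_protos):
    """Unseen-class prototype: the same nonnegative mixture applied to the
    seen-class prototypes, carrying the attribute-space relation into the latent space."""
    d = len(seen_protos[0])
    return [sum(ci * p[j] for ci, p in zip(coeffs, seen_protos)) for j in range(d)]

# Toy example: two seen classes with orthogonal attribute vectors.
c = nonneg_coeffs([0.3, 0.7], [[1.0, 0.0], [0.0, 1.0]])
proto = synthesize_prototype(c, [[2.0, 0.0], [0.0, 2.0]])
```

For this orthogonal toy case the coefficients converge to roughly (0.3, 0.7), so the synthesized latent prototype lands near (0.6, 1.4), the corresponding mixture of the two seen-class prototypes.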

9.
IEEE Trans Neural Netw Learn Syst ; 30(12): 3818-3832, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31725389

ABSTRACT

Of late, there have been many studies on robust discriminant analysis that adopt the L1-norm as the distance metric, but their results are not robust enough to gain universal acceptance. To overcome this problem, the authors of this article present a nonpeaked discriminant analysis (NPDA) technique, in which the cutting L1-norm is adopted as the distance metric. As this kind of norm can better eliminate heavy outliers in learning models, the proposed algorithm is expected to be stronger at feature extraction for data representation than the existing robust discriminant analysis techniques based on the L1-norm distance metric. The authors also present a comprehensive analysis showing that the cutting L1-norm distance can be computed equally well using the difference between two special convex functions. Against this background, an efficient iterative algorithm is designed for the optimization of the proposed objective. Theoretical proofs of the algorithm's convergence are also presented. Theoretical insights and the effectiveness of the proposed method are validated by experimental tests on several real data sets.
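The outlier-limiting behavior of a truncated L1 distance can be illustrated with the following sketch. This is an assumption-laden stand-in: the paper's exact cutting L1-norm definition is not reproduced here, and the capped per-coordinate form below is just one common truncation scheme used to show why heavy outliers stop dominating the metric.

```python
def capped_l1(x, y, eps=1.0):
    """Capped L1 distance: each coordinate's absolute difference is truncated
    at eps, so a heavy outlier coordinate contributes at most eps rather than
    its full magnitude. (One common truncation; the paper's rule may differ.)"""
    return sum(min(abs(a - b), eps) for a, b in zip(x, y))

# An outlier in the second coordinate dominates the plain L1 distance
# but is capped in the truncated version.
plain_l1 = sum(abs(a - b) for a, b in zip([0, 0], [1, 100]))  # 1 + 100
capped = capped_l1([0, 0], [1, 100], eps=1.0)                 # 1 + 1
```

Here the plain L1 distance is 101, almost entirely driven by the single outlier coordinate, while the capped version is 2.0, which is the robustness property that motivates norms of this kind.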

10.
Article in English | MEDLINE | ID: mdl-31765313

ABSTRACT

Representation learning is a fundamental but challenging problem, especially when the distribution of the data is unknown. In this paper, we propose a new representation learning method, named the Structure Transfer Machine (STM), which enables the feature learning process to converge to the representation expectation in a probabilistic way. We theoretically show that such an expected value of the representation (the mean) is achievable if the manifold structure can be transferred from the data space to the feature space. The resulting structure regularization term, named the manifold loss, is incorporated into the loss function of the typical deep learning pipeline. The STM architecture is constructed to enforce that the learned deep representation satisfies the intrinsic manifold structure of the data, which results in robust features suited to various application scenarios, such as digit recognition, image classification, and object tracking. Compared with state-of-the-art CNN architectures, we achieve better results on several commonly used public benchmarks.

11.
IEEE Trans Neural Netw Learn Syst ; 30(10): 2898-2915, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30176609

ABSTRACT

Fisher's criterion is one of the most popular discriminant criteria for feature extraction. It is defined as the generalized Rayleigh quotient of the between-class scatter distance to the within-class scatter distance. Consequently, Fisher's criterion does not take advantage of the discriminant information in the class covariance differences, and hence its discriminant ability largely depends on the class mean differences. If the class mean distances are relatively large compared with the within-class scatter distance, Fisher's criterion-based discriminant analysis methods may achieve good discriminant performance; otherwise, they may not deliver good results. Moreover, we observe that the between-class distance of Fisher's criterion is based on the l2-norm, which is disadvantageous for separating classes with smaller class mean distances. To overcome this drawback of Fisher's criterion, in this paper we first derive a new discriminant criterion, expressed as a mixture of absolute generalized Rayleigh quotients, based on a Bayes error upper-bound estimation, where a mixture of Gaussians is adopted to approximate the real distribution of the data samples. The criterion is then further modified by replacing the l2-norm with the l1-norm to better describe the between-class scatter distance, such that it is more effective at separating the different classes. Moreover, we propose a novel l1-norm heteroscedastic discriminant analysis method based on the new discriminant criterion (L1-HDA/GM) for heteroscedastic feature extraction, whose optimization problem can be efficiently solved using the eigenvalue decomposition approach. Finally, we conduct extensive experiments on four real data sets and demonstrate that the proposed method achieves highly competitive results compared with the state-of-the-art methods.

12.
IEEE Trans Image Process ; 26(7): 3113-3127, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28092544

ABSTRACT

Given unreliable visual patterns and insufficient query information, content-based image retrieval is often suboptimal and requires image re-ranking using auxiliary information. In this paper, we propose a discriminative multi-view interactive image re-ranking method (DMINTIR), which integrates user relevance feedback capturing users' intentions and multiple features that sufficiently describe the images. In DMINTIR, heterogeneous property features are incorporated into the multi-view learning scheme to exploit their complementarity. In addition, a discriminatively learned weight vector is obtained to reassign updated scores and target images for re-ranking. Compared with other multi-view learning techniques, our scheme not only generates a compact representation in the latent space from the redundant multi-view features but also maximally preserves the discriminative information in feature encoding via the large-margin principle. Furthermore, the generalization error bound of the proposed algorithm is theoretically analyzed and shown to be improved by the interactions between the latent space and discriminant function learning. Experimental results on two benchmark data sets demonstrate that our approach boosts baseline retrieval quality and is competitive with other state-of-the-art re-ranking strategies.

13.
IEEE Trans Image Process ; 23(2): 570-581, 2014 Feb.
Article in English | MEDLINE | ID: mdl-26270909

ABSTRACT

In this paper, we propose using high-level action units to represent human actions in videos and, based on such units, we develop a novel sparse model for human action recognition. There are three interconnected components in our approach. First, we propose a new context-aware spatial-temporal descriptor, named locally weighted word context, to improve the discriminability of the traditionally used local spatial-temporal descriptors. Second, from the statistics of the context-aware descriptors, we learn action units using graph-regularized nonnegative matrix factorization, which leads to a part-based representation and encodes the geometrical information. These units effectively bridge the semantic gap in action recognition. Third, we propose a sparse model based on a joint l2,1-norm to preserve the representative items and suppress noise in the action units. Intuitively, when learning the dictionary for action representation, the sparse model captures the fact that actions from the same class share similar units. The proposed approach is evaluated on several publicly available data sets. The experimental results and analysis clearly demonstrate the effectiveness of the proposed approach.
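The joint l2,1-norm used in the sparse model above has a standard definition that can be sketched directly; the matrix-orientation choice (rows as action-unit coefficient vectors) is an assumption made for illustration.

```python
import math

def l21_norm(M):
    """Joint l2,1-norm of a matrix: the sum over rows of each row's l2 norm.
    Used as a regularizer, it drives entire rows toward zero, which is how
    the sparse model keeps representative action units and suppresses noisy ones."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)

# Row norms: sqrt(9 + 16) = 5, 0 for the all-zero row, and 1.
val = l21_norm([[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]])
```

Unlike an entrywise l1 penalty, which zeroes individual entries, minimizing the l2,1-norm zeroes whole rows at once, yielding the row-structured sparsity that selects or discards entire action units.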


Subjects
Image Interpretation, Computer-Assisted/methods , Motor Activity/physiology , Movement/physiology , Pattern Recognition, Automated/methods , Subtraction Technique , Whole Body Imaging/methods , Algorithms , Humans , Image Enhancement/methods , Imaging, Three-Dimensional/methods , Photography/methods , Reproducibility of Results , Sensitivity and Specificity