Results 1 - 20 of 47
1.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6896-6908, 2023 Jun.
Article in English | MEDLINE | ID: mdl-32750802

ABSTRACT

Contextual information is vital in visual understanding problems such as semantic segmentation and object detection. We propose a criss-cross network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies. In addition, a category consistent loss is proposed to enforce the criss-cross attention module to produce more discriminative features. Overall, CCNet has the following merits: 1) GPU memory friendly: compared with the non-local block, the proposed recurrent criss-cross attention module requires 11× less GPU memory. 2) High computational efficiency: the recurrent criss-cross attention reduces the FLOPs of the non-local block by about 85 percent. 3) State-of-the-art performance: we conduct extensive experiments on the semantic segmentation benchmarks Cityscapes and ADE20K, the human parsing benchmark LIP, the instance segmentation benchmark COCO, and the video segmentation benchmark CamVid. In particular, our CCNet achieves mIoU scores of 81.9, 45.76, and 55.47 percent on the Cityscapes test set, the ADE20K validation set, and the LIP validation set respectively, which are new state-of-the-art results. The source code is available at https://github.com/speedinghzl/CCNet.
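The criss-cross gathering step can be sketched in a few lines. This toy version uses scalar features and uniform averaging in place of learned attention weights, so the function names and the weighting scheme are illustrative assumptions, not the paper's implementation:

```python
def criss_cross_aggregate(feat):
    """One criss-cross pass: each pixel averages the features on its own
    row and column (uniform weights stand in for learned attention)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # The criss-cross path: all pixels on row i plus column j.
            path = [feat[i][k] for k in range(w)] + \
                   [feat[k][j] for k in range(h) if k != i]
            out[i][j] = sum(path) / len(path)
    return out

def recurrent_criss_cross(feat):
    """Applying the pass twice (the recurrent operation) lets information
    from any pixel reach any other: first along rows/columns, then across."""
    return criss_cross_aggregate(criss_cross_aggregate(feat))
```

After the first pass, pixel (i, j) has seen its row and column; after the second, every pixel in the image has contributed to it via some intermediate row/column intersection.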

2.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 550-557, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33646946

ABSTRACT

Aggregating features across different convolutional blocks or contextual embeddings has proven to be an effective way to strengthen feature representations for semantic segmentation. However, most current popular network architectures tend to ignore the misalignment issues that arise during feature aggregation, caused by step-by-step downsampling operations and indiscriminate contextual information fusion. In this paper, we explore the principles for addressing such feature misalignment issues and propose Feature-Aligned Segmentation Networks (AlignSeg). AlignSeg consists of two primary modules, i.e., the Aligned Feature Aggregation (AlignFA) module and the Aligned Context Modeling (AlignCM) module. First, AlignFA adopts a simple learnable interpolation strategy to learn transformation offsets of pixels, which can effectively relieve the feature misalignment caused by multi-resolution feature aggregation. Second, with the contextual embeddings in hand, AlignCM enables each pixel to choose its own custom contextual information adaptively, making the contextual embeddings better aligned. We validate the effectiveness of our AlignSeg network with extensive experiments on Cityscapes and ADE20K, achieving new state-of-the-art mIoU scores of 82.6 and 45.95 percent, respectively. Our source code is available at https://github.com/speedinghzl/AlignSeg.
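The learnable-interpolation idea rests on resampling a feature map at fractional, offset positions. A minimal bilinear sampler with scalar features and externally supplied offsets (in AlignFA the offsets would be predicted by the network; both simplifications are ours) might look like:

```python
def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map at fractional coordinates (y, x) by
    bilinear interpolation; coordinates are clamped at the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = feat[y0][x0] * (1 - fx) + feat[y0][x1] * fx
    bot = feat[y1][x0] * (1 - fx) + feat[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def warp_with_offsets(feat, offsets):
    """Resample the map at (i + dy, j + dx) for a per-pixel offset field,
    realigning features before they are aggregated."""
    return [[bilinear_sample(feat, i + dy, j + dx)
             for j, (dy, dx) in enumerate(row)]
            for i, row in enumerate(offsets)]
```

With all-zero offsets the warp is an identity; learned non-zero offsets shift each pixel's sampling position to compensate for downsampling-induced misalignment.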

3.
IEEE Trans Cybern ; 50(9): 3855-3865, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32497014

ABSTRACT

One-shot image semantic segmentation poses the challenging task of recognizing object regions from unseen categories with only one annotated example as supervision. In this article, we propose a simple yet effective similarity guidance network (SG-One) to tackle the one-shot segmentation problem. We aim at predicting the segmentation mask of a query image with reference to one densely labeled support image of the same category. To obtain a robust representative feature of the support image, we first adopt a masked average pooling strategy, producing the guidance features by taking only the pixels inside the support mask into account. We then leverage the cosine similarity to build the relationship between the guidance features and the features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adopted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework that can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves an mIoU score of 46.3%, surpassing the baseline methods.
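The two building blocks named above, masked average pooling and cosine similarity, are simple enough to sketch directly. This version works on plain nested lists with low-dimensional toy features; the function names are ours:

```python
import math

def masked_average_pool(feat, mask):
    """Average the feature vectors of support-image pixels whose mask
    value is 1, yielding one guidance vector for the support category."""
    dim = len(feat[0][0])
    total, count = [0.0] * dim, 0
    for frow, mrow in zip(feat, mask):
        for vec, m in zip(frow, mrow):
            if m:
                count += 1
                for c in range(dim):
                    total[c] += vec[c]
    return [t / count for t in total]

def cosine(u, v):
    """Cosine similarity between the guidance vector and a query-pixel feature."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))
```

Scoring every query pixel's feature against the pooled guidance vector with `cosine` produces the similarity map that guides segmentation.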

4.
Article in English | MEDLINE | ID: mdl-31944972

ABSTRACT

Image denoising and high-level vision tasks are usually handled independently in conventional computer vision practice, and their connection is fragile. In this paper, we treat the two jointly and explore the mutual influence between them, focusing on two questions: (1) how image denoising can help improve high-level vision tasks, and (2) how semantic information from high-level vision tasks can be used to guide image denoising. First, for image denoising, we propose a convolutional neural network in which convolutions are conducted at various spatial resolutions via downsampling and upsampling operations, in order to fuse and exploit contextual information at different scales. Second, we propose a deep neural network solution that cascades two modules for image denoising and various high-level tasks, respectively, and uses the joint loss to update only the denoising network via backpropagation. We experimentally show that, on the one hand, the proposed denoiser has the generality to overcome the performance degradation of different high-level vision tasks. On the other hand, with the guidance of high-level vision information, the denoising network produces more visually appealing results. Extensive experiments demonstrate the benefit of exploiting image semantics simultaneously for image denoising and high-level vision tasks via deep learning. The code is available online: https://github.com/Ding-Liu/DeepDenoising.
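The training rule of the cascade — a joint loss whose gradient updates only the denoising module while the high-level module stays frozen — can be illustrated with a one-parameter toy "denoiser" and a fixed identity "classifier". Every function below is an illustrative stand-in (scalar data, finite-difference gradient), not the paper's networks:

```python
def denoise(x, theta):
    # Toy one-parameter denoiser: scale the noisy input (stand-in for a CNN).
    return theta * x

def joint_loss(theta, noisy, clean, target, lam=1.0):
    """Reconstruction loss plus a high-level loss computed through a
    frozen downstream module (here: the identity)."""
    den = denoise(noisy, theta)
    recon = (den - clean) ** 2
    high = (den - target) ** 2   # frozen "classifier" sees the denoised output
    return recon + lam * high

def train_denoiser_step(theta, noisy, clean, target, lr=0.1, eps=1e-6):
    """One gradient step on the denoiser parameter only, via central
    finite differences; the high-level module receives no update."""
    g = (joint_loss(theta + eps, noisy, clean, target)
         - joint_loss(theta - eps, noisy, clean, target)) / (2 * eps)
    return theta - lr * g
```

A single step on the toy problem already lowers the joint loss, which is the mechanism the paper relies on at scale: semantic gradients flow back into the denoiser without disturbing the recognition module.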

5.
Article in English | MEDLINE | ID: mdl-32365031

ABSTRACT

In this paper, we propose a deep CNN to tackle the image restoration problem by learning formatted information. Previous deep-learning-based methods directly learn the mapping from corrupted images to clean images and may suffer from the gradient exploding/vanishing problems of deep neural networks. We propose to address the image restoration problem by learning the structured details and recovering the latent clean image together, from the information shared between the corrupted image and the latent image. In addition, instead of learning the pure difference (corruption), we propose to add a residual formatting layer and an adversarial block that format the information into a structured form, which allows the network to converge faster and boosts the performance. Furthermore, we propose a cross-level loss net to ensure both pixel-level accuracy and semantic-level visual quality. Evaluations on public datasets show that the proposed method performs favorably against existing approaches, quantitatively and qualitatively.

6.
IEEE Trans Pattern Anal Mach Intell ; 31(1): 39-58, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19029545

ABSTRACT

Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multi-cue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches to solving the problem of machine understanding of human affective behavior, and discuss important issues such as the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.


Subjects
Affect/physiology; Algorithms; Artificial Intelligence; Emotions/physiology; Facial Expression; Monitoring, Physiological/methods; Pattern Recognition, Automated/methods; Sound Spectrography/methods
7.
IEEE Trans Pattern Anal Mach Intell ; 31(7): 1210-24, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19443920

ABSTRACT

In this paper, we present a fusion approach to the nonrigid shape recovery problem that takes advantage of both appearance information and local features. We make two major contributions. First, we propose a novel progressive finite Newton optimization scheme for the feature-based nonrigid surface detection problem, which reduces it to solving a set of linear equations. The key is to formulate nonrigid surface detection as an unconstrained quadratic optimization problem that has a closed-form solution for a given set of observations. Second, we propose a deformable Lucas-Kanade algorithm that triangulates the template image into small patches and constrains the deformation through the second-order derivatives of the mesh vertices. We formulate it as a sparse regularized least squares problem, which reduces both the computational cost and the memory requirement. The inverse compositional algorithm is applied to solve the optimization problem efficiently. We have conducted extensive experiments for performance evaluation in various environments, and the promising results show that the proposed algorithm is both efficient and effective.
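The closed-form property behind the first contribution is general: minimizing an unconstrained quadratic ½xᵀAx + bᵀx (A symmetric positive definite) reduces to the linear system Ax = −b. A 2×2 sketch via Cramer's rule, with a helper name of our choosing, makes this concrete:

```python
def solve_quadratic_2d(A, b):
    """Minimize f(x) = 0.5 * x^T A x + b^T x for a 2x2 SPD matrix A:
    the gradient A x + b = 0 gives the linear system A x = -b,
    solved here in closed form by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (-b[0] * A[1][1] + b[1] * A[0][1]) / det
    x1 = (-b[1] * A[0][0] + b[0] * A[1][0]) / det
    return [x0, x1]
```

The paper's detection step solves such a system over many observations at once; the point of the sketch is only that no iterative optimizer is needed once the problem is quadratic and unconstrained.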


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Information Storage and Retrieval/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Models, Biological; Reproducibility of Results; Sensitivity and Specificity
8.
IEEE Trans Pattern Anal Mach Intell ; 31(10): 1913-20, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19696459

ABSTRACT

The success of bilinear subspace learning heavily depends on reducing correlations among features along rows and columns of the data matrices. In this work, we study the problem of rearranging elements within a matrix in order to maximize these correlations so that information redundancy in matrix data can be more extensively removed by existing bilinear subspace learning algorithms. An efficient iterative algorithm is proposed to tackle this essentially integer programming problem. In each step, the matrix structure is refined with a constrained Earth Mover's Distance procedure that incrementally rearranges matrices to become more similar to their low-rank approximations, which have high correlation among features along rows and columns. In addition, we present two extensions of the algorithm for conducting supervised bilinear subspace learning. Experiments in both unsupervised and supervised bilinear subspace learning demonstrate the effectiveness of our proposed algorithms in improving data compression performance and classification accuracy.


Subjects
Algorithms; Artificial Intelligence; Data Compression/methods; Face/physiology; Normal Distribution
9.
IEEE Trans Image Process ; 18(2): 241-9, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19126470

ABSTRACT

In this paper, our contributions to the subspace learning problem are two-fold. We first show that most popular subspace learning algorithms, unsupervised or supervised, can be explained in a unified way as instances of a ubiquitously supervised prototype: they all essentially minimize the intraclass scatter and at the same time maximize the interclass separability, yet with specialized labeling approaches such as ground truth, self-labeling, neighborhood propagation, and local subspace approximation. Then, guided by this ubiquitously supervised philosophy, we present two categories of novel algorithms for subspace learning, namely misalignment-robust and semi-supervised subspace learning. The first category is tailored to computer vision applications, improving algorithmic robustness to image misalignments, including image translation, rotation, and scaling. The second category naturally integrates label information from both ground truth and other approaches for unsupervised algorithms. Extensive face recognition experiments on the CMU PIE and FRGC ver1.0 databases demonstrate that the misalignment-robust algorithms consistently bring encouraging accuracy improvements over their counterparts that do not consider image misalignments, and also show the advantages of semi-supervised subspace learning over purely supervised or unsupervised schemes.


Subjects
Algorithms; Artificial Intelligence; Biometry/methods; Face/anatomy & histology; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Humans; Image Enhancement/methods; Information Storage and Retrieval/methods; Reproducibility of Results; Sensitivity and Specificity
10.
IEEE Trans Image Process ; 18(1): 202-10, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19068429

ABSTRACT

Precise 3-D head pose estimation plays a significant role in developing human-computer interfaces and practical face recognition systems. This task is challenging due to the particular appearance variations caused by pose changes for a certain subject. In this paper, the pose data space is considered as a union of submanifolds which characterize different subjects, instead of a single continuous manifold as conventionally regarded. A novel manifold embedding algorithm dually supervised by both identity and pose information, called synchronized submanifold embedding (SSE), is proposed for person-independent precise 3-D pose estimation, which means that the testing subject may not appear in the model training stage. First, the submanifold of a certain subject is approximated as a set of simplexes constructed using neighboring samples. Then, these simplexized submanifolds from different subjects are embedded by synchronizing the locally propagated poses within the simplexes and at the same time maximizing the intrasubmanifold variances. Finally, the pose of a new datum is estimated as the propagated pose of the nearest point within the simplex constructed by its nearest neighbors in the dimensionality reduced feature space. The experiments on the 3-D pose estimation database, CHIL data for CLEAR07 evaluation, and the extended application for age estimation on FG-NET aging database, demonstrate the superiority of SSE over conventional regression algorithms as well as unsupervised manifold learning algorithms.


Subjects
Artificial Intelligence; Biometry/methods; Face/anatomy & histology; Head/anatomy & histology; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Algorithms; Humans; Posture; Reproducibility of Results; Sensitivity and Specificity; Subtraction Technique
11.
IEEE Trans Image Process ; 18(1): 140-50, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19095525

ABSTRACT

For the problem of image registration, the top few reliable correspondences are often relatively easy to obtain, while the overall matching accuracy may fall drastically as the desired correspondence number increases. In this paper, we present an efficient feature matching algorithm to employ sparse reliable correspondence priors for piloting the feature matching process. First, the feature geometric relationship within individual image is encoded as a spatial graph, and the pairwise feature similarity is expressed as a bipartite similarity graph between two feature sets; then the geometric neighborhood of the pairwise assignment is represented by a categorical product graph, along which the reliable correspondences are propagated; and finally a closed-form solution for feature matching is deduced by ensuring the feature geometric coherency as well as pairwise feature agreements. Furthermore, our algorithm is naturally applicable for incorporating manual correspondence priors for semi-supervised feature matching. Extensive experiments on both toy examples and real-world applications demonstrate the superiority of our algorithm over the state-of-the-art feature matching techniques.


Subjects
Algorithms; Artificial Intelligence; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Reproducibility of Results; Sensitivity and Specificity
12.
IEEE Trans Image Process ; 18(3): 670-6, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19179253

ABSTRACT

In this correspondence, we study the extra-factor estimation problem under the assumption that the training image ensemble is expressed as an nth-order tensor, with the nth dimension characterizing all features of an image and the other dimensions corresponding to different extra factors, such as illumination, pose, and identity. To overcome the local-minimum issue of conventional algorithms designed for this problem, we present a novel statistical learning framework, called mode-kn factor analysis, for obtaining a closed-form solution to estimating the extra factors of any test image. In the learning stage, for the kth [see formula in text] dimension of the data tensor, the mode-kn patterns are constructed by concatenating the feature dimension and the kth extra-factor dimension, and a mode-kn factor analysis model is then learned from the mode-kn patterns unfolded from the original data tensor. In the inference stage, for a test image, the mode classification of the kth dimension is performed within a probabilistic framework. The advantages of mode-kn factor analysis over conventional tensor analysis algorithms are twofold: 1) a closed-form solution, instead of the conventional iterative suboptimal solution, is derived for estimating the extra-factor mode of any test image; and 2) the classification capability is enhanced by interacting with the process of synthesizing data of all other modes in the kth dimension. Experiments on the Pointing'04 and CMU PIE databases for pose and illumination estimation both validate the superiority of the proposed algorithm over conventional algorithms for extra-factor estimation.


Subjects
Algorithms; Artificial Intelligence; Biometry/methods; Face/anatomy & histology; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Subtraction Technique; Humans; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity
13.
Article in English | MEDLINE | ID: mdl-30946668

ABSTRACT

Visual recognition under adverse conditions is a very important and challenging problem of high practical value, due to the ubiquitous presence of quality distortions during image acquisition, transmission, or storage. While deep neural networks have been extensively exploited for low-quality image restoration and high-quality image recognition, few studies have addressed the important problem of recognition from very low-quality images. This paper proposes a deep-learning-based framework for improving the performance of image and video recognition models under adverse conditions, using robust adverse pre-training or its aggressive variant. The robust adverse pre-training algorithms leverage the power of pre-training and generalize conventional unsupervised pre-training and data augmentation methods. We further develop a transfer learning approach to cope with real-world datasets of unknown adverse conditions. The proposed framework is comprehensively evaluated on a number of image and video recognition benchmarks, and obtains significant performance improvements under various single or mixed adverse conditions. Our visualization and analysis further add to the explainability of the results.

14.
IEEE Trans Pattern Anal Mach Intell ; 30(12): 2229-35, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18988954

ABSTRACT

Beyond conventional linear and kernel-based feature extraction, we propose in this paper a generalized feature extraction formulation based on the so-called Graph Embedding framework. Two novel correlation-metric-based algorithms are presented based on this formulation. Correlation Embedding Analysis (CEA), which incorporates both correlational mapping and discriminating analysis, boosts the discriminating power by mapping data from a high-dimensional hypersphere onto another low-dimensional hypersphere and preserving the intrinsic neighbor relations with local graph modeling. Correlational Principal Component Analysis (CPCA) generalizes the conventional Principal Component Analysis (PCA) algorithm to the case of data distributed on a high-dimensional hypersphere. Their advantages stem from two facts: 1) they are tailored to normalized data, which are often the outputs of the data preprocessing step; and 2) they are designed directly with the correlation metric, which is shown to be generally better than Euclidean distance for classification purposes. Extensive comparisons with existing algorithms on visual classification experiments demonstrate the effectiveness of the proposed algorithms.
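Why a correlation metric can beat Euclidean distance for classification is easy to demonstrate numerically: two features that differ only by a gain factor sit far apart in Euclidean terms yet are perfectly correlated. A minimal comparison (function names are ours):

```python
import math

def correlation_sim(u, v):
    """Pearson-style correlation: cosine similarity of mean-centered vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    uc = [a - mu for a in u]
    vc = [b - mv for b in v]
    dot = sum(a * b for a, b in zip(uc, vc))
    return dot / math.sqrt(sum(a * a for a in uc) * sum(b * b for b in vc))

def euclidean(u, v):
    """Plain Euclidean distance, for contrast."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For u = [1, 2, 3] and v = [10, 20, 30], the correlation is 1 (same pattern, different gain) although the Euclidean distance is large, whereas the reversed vector [3, 2, 1] is Euclidean-close to u but anti-correlated; this is the kind of intensity variation that motivates correlation-based embeddings.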


Subjects
Algorithms; Artificial Intelligence; Biometry/methods; Face/anatomy & histology; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity; Statistics as Topic
15.
IEEE Trans Image Process ; 17(2): 226-34, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18270114

ABSTRACT

Images, as high-dimensional data, usually embody large variabilities. To classify images for versatile applications, an effective algorithm is necessarily designed by systematically considering the data structure, similarity metric, discriminant subspace, and classifier. In this paper, we provide evidence that, besides the Fisher criterion, graph embedding, and tensorization used in many existing methods, the correlation-based similarity metric embodied in supervised multilinear discriminant subspace learning can additionally improve the classification performance. In particular, a novel discriminant subspace learning algorithm, called correlation tensor analysis (CTA), is designed to incorporate both graph-embedded correlational mapping and discriminant analysis in a Fisher type of learning manner. The correlation metric can estimate intrinsic angles and distances for the locally isometric embedding, which can deal with the case when Euclidean metric is incapable of capturing the intrinsic similarities between data points. CTA learns multiple interrelated subspaces to obtain a low-dimensional data representation reflecting both class label information and intrinsic geometric structure of the data distribution. Extensive comparisons with most popular subspace learning methods on face recognition evaluation demonstrate the effectiveness and superiority of CTA. Parameter analysis also reveals its robustness.


Subjects
Algorithms; Artificial Intelligence; Face/anatomy & histology; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Biometry/methods; Discriminant Analysis; Humans; Reproducibility of Results; Sensitivity and Specificity
16.
IEEE Trans Image Process ; 17(7): 1178-88, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18586625

ABSTRACT

Estimating human age automatically via facial image analysis has many potential real-world applications, such as human-computer interaction and multimedia communication. However, it is still challenging for existing computer vision systems to estimate human ages automatically and effectively. The aging process is determined not only by a person's genes but also by many external factors, such as health, lifestyle, living location, and weather conditions. Males and females may also age differently. Current age estimation performance is still not good enough for practical use, and more effort has to be put into this research direction. In this paper, we introduce an age manifold learning scheme for extracting face aging features and design a locally adjusted robust regressor for learning and predicting human ages. The novel approach improves age estimation accuracy significantly over all previous methods. The merit of the proposed approaches for image-based age estimation is shown by extensive experiments on a large internal age database and the publicly available FG-NET database.


Subjects
Aging/physiology; Algorithms; Artificial Intelligence; Face/anatomy & histology; Face/physiology; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Humans; Image Enhancement/methods; Regression Analysis; Reproducibility of Results; Sensitivity and Specificity
17.
IEEE Trans Image Process ; 27(9): 4585-4597, 2018 09.
Article in English | MEDLINE | ID: mdl-29993548

ABSTRACT

In this paper, we propose to exploit the interactions between non-associable tracklets to facilitate multi-object tracking. We introduce two types of tracklet interactions: close interaction and distant interaction. The close interaction imposes physical constraints between two temporally overlapping tracklets and, more importantly, allows us to learn local classifiers to distinguish targets that are close to each other in the spatiotemporal domain. The distant interaction, on the other hand, accounts for the higher-order motion and appearance consistency between two temporally isolated tracklets. Our approach is modeled as a binary labeling problem and solved using the efficient Quadratic Pseudo-Boolean Optimization (QPBO). It yields promising tracking performance on the challenging PETS09 and MOT16 datasets. Our code will be made publicly available upon acceptance of the manuscript.
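The split between the two interaction types hinges on temporal overlap. Reducing each tracklet to a (start_frame, end_frame) span — a simplification of the paper's richer tracklet representation — the classification is one comparison:

```python
def interaction_type(t1, t2):
    """Classify a tracklet pair: 'close' if their frame spans overlap
    (they coexist in time, so physical-exclusion constraints apply),
    'distant' otherwise (only motion/appearance consistency applies).
    Tracklets are (start_frame, end_frame) tuples, inclusive."""
    if t1[0] <= t2[1] and t2[0] <= t1[1]:
        return "close"
    return "distant"
```

Close pairs feed the local classifiers; distant pairs feed the higher-order consistency terms of the binary labeling problem.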

18.
Article in English | MEDLINE | ID: mdl-29994092

ABSTRACT

Video super-resolution (SR) aims at estimating a high-resolution (HR) video sequence from a low-resolution (LR) one. Given that deep learning has been successfully applied to single-image SR, which demonstrates the strong capability of neural networks for modeling spatial relations within a single image, the key challenge in video SR is how to efficiently and effectively exploit the temporal dependency among consecutive LR frames in addition to the spatial relation. This remains challenging because complex motion is difficult to model and can be detrimental if not handled properly. We tackle the problem of learning temporal dynamics from two aspects. First, we propose a temporal adaptive neural network that can adaptively determine the optimal scale of temporal dependency. Inspired by the Inception module in GoogLeNet [1], filters of various temporal scales are applied to the input LR sequence before their responses are adaptively aggregated, in order to fully exploit the temporal relation among consecutive LR frames. Second, we decrease the complexity of motion among neighboring frames using a spatial alignment network that can be trained end-to-end with the temporal adaptive network, and that has the merit of increased robustness to complex motion and better efficiency compared with competing image alignment methods. We provide a comprehensive evaluation of the temporal adaptation and spatial alignment modules. We show that the temporal adaptive design considerably improves SR quality over its plain counterparts, and that the spatial alignment network attains SR performance comparable to a sophisticated optical-flow-based approach while requiring much less running time. Overall, our proposed model with learned temporal dynamics achieves state-of-the-art SR results in terms of not only spatial consistency but also temporal coherence on public video datasets. More information can be found in.
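The adaptive aggregation step — softmax weights over filter responses at several temporal scales — can be sketched independently of any network. In the real model the scores come from a learned branch; here they are plain inputs, and the function name is ours:

```python
import math

def temporal_adaptive_aggregate(responses, scores):
    """Fuse per-temporal-scale filter responses with softmax weights.
    `responses[k]` is the response of the filter spanning the k-th
    temporal scale; `scores[k]` is that scale's (learned) logit."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    fused = sum(w * r for w, r in zip(weights, responses))
    return fused, weights
```

With equal scores the fusion is a plain average; a dominant score lets the network lean on the temporal scale that best matches the local motion, which is the "adaptively determine the optimal scale" behavior described above.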

19.
AMIA Annu Symp Proc ; 2018: 185-194, 2018.
Article in English | MEDLINE | ID: mdl-30815056

ABSTRACT

In an effort to guide the development of a computer agent (CA)-based adviser system that presents patient-centered language to older adults (e.g., medication instructions in portal environments or smartphone apps), we evaluated 360 older and younger adults' responses to medication information delivered by a set of CAs. We assessed patient memory for medication information, their affective responses to the information, their perception of the CA's teaching effectiveness and expressiveness, and their perceived level of similarity with each CA. Each participant saw CAs varying in appearance and levels of realism (Photo-realistic vs Cartoon vs Emoji, as control condition). To investigate the impact of affective cues on patients, we varied CA message framing, with effects described either as gains of taking or losses of not taking the medication. Our results corroborate the idea that CAs can produce a significant effect on older adults' learning in part by engendering social responses.


Subjects
Communication; Medication Therapy Management; Software; Translating; Adult; Age Factors; Aged; Audiovisual Aids; Female; Health Literacy; Humans; Male; Memory; Middle Aged; Unified Medical Language System
20.
IEEE Trans Image Process ; 16(11): 2802-10, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17990756

ABSTRACT

This paper presents a unified solution to three unsolved problems in face verification with subspace learning techniques: selecting the verification threshold, automatically determining the subspace dimension, and deducing the feature fusing weights. In contrast to previous algorithms, which search for the projection matrix directly, our new algorithm investigates a similarity metric matrix (SMM). Given a verification threshold, this matrix is learned by a semidefinite programming approach, under the constraints that kindred pairs have similarity larger than the threshold and inhomogeneous pairs have similarity smaller than the threshold. Then, the subspace dimension and the feature fusing weights are simultaneously inferred from the singular value decomposition of the derived SMM. In addition, weighted and tensor extensions are proposed to further improve algorithmic effectiveness and efficiency, respectively. Essentially, verification in this new algorithm is conducted within an affine subspace, and the method is hence called affine subspace for verification (ASV). Extensive experiments show that ASV can achieve encouraging face verification accuracy in comparison to other subspace algorithms, even without the need to explore any parameters.
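One common way a subspace dimension can be read off a singular value decomposition is an energy criterion: keep the smallest k whose leading singular values capture a fixed fraction of the spectrum. The 0.95 threshold and the function name below are illustrative assumptions, not the paper's exact inference rule for the SMM:

```python
def infer_dimension(singular_values, energy=0.95):
    """Return the smallest k such that the k largest singular values
    account for at least `energy` of the total spectral mass."""
    total = sum(singular_values)
    acc = 0.0
    for k, s in enumerate(sorted(singular_values, reverse=True), start=1):
        acc += s
        if acc >= energy * total:
            return k
    return len(singular_values)
```

On a spectrum like [5, 3, 0.2, 0.1], the first two singular values already carry over 95 percent of the mass, so the inferred dimension is 2; in ASV, the corresponding singular vectors and values would also supply the feature fusing weights.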


Subjects
Algorithms; Artificial Intelligence; Biometry/methods; Face/anatomy & histology; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Humans; Information Storage and Retrieval/methods; Reproducibility of Results; Sensitivity and Specificity