Results 1-20 of 77
1.
Article in English | MEDLINE | ID: mdl-31804929

ABSTRACT

In this paper, we propose a novel object detection algorithm named "Deep Regionlets" that integrates deep neural networks with a conventional detection schema for accurate generic object detection. Motivated by the advantages of regionlets in modeling object deformation and multiple aspect ratios, we incorporate regionlets into an end-to-end trainable deep learning framework. The deep regionlets framework consists of a region selection network and a deep regionlet learning module. Specifically, given a detection bounding box proposal, the region selection network provides guidance on where to select the sub-regions from which features are learned. An object proposal typically contains 3-16 sub-regions. The regionlet learning module focuses on local feature selection and transformation to alleviate the effects of appearance variations. To this end, we first realize non-rectangular region selection within the detection framework to accommodate variations in object appearance. Moreover, we design a "gating network" within the regionlet learning module to enable instance-dependent soft feature selection and pooling. The Deep Regionlets framework is trained end-to-end without additional effort. We present ablation studies and extensive experiments on the PASCAL VOC dataset and the Microsoft COCO dataset. The proposed method outperforms state-of-the-art algorithms, such as RetinaNet and Mask R-CNN, even without additional segmentation labels.
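As a rough illustration of instance-dependent soft feature selection, the sketch below gates each sub-region's feature vector with a sigmoid of a linear score computed from that region's own features, then max-pools across sub-regions. It is not the Deep Regionlets implementation: the linear gate, the max pooling, and the names `gated_pool`, `gate_weights`, and `gate_bias` are hypothetical choices for illustration, whereas the paper's gating network is learned end-to-end on CNN features.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_pool(region_features, gate_weights, gate_bias):
    """Soft feature selection over sub-regions, followed by max pooling.

    region_features: one feature vector per sub-region (hypothetical input format).
    Each region gets a gate in (0, 1) computed from its own features, so the
    selection depends on the instance instead of being a fixed mask.
    The linear gate is an illustrative assumption, not the paper's network.
    """
    gates = [sigmoid(sum(w * x for w, x in zip(gate_weights, f)) + gate_bias)
             for f in region_features]
    # Scale every region's features by its gate (soft selection).
    gated = [[g * x for x in f] for g, f in zip(gates, region_features)]
    # Max-pool each feature dimension across regions.
    pooled = [max(column) for column in zip(*gated)]
    return gates, pooled


# Toy example: this hand-set gate favors regions whose first feature is large.
gates, pooled = gated_pool([[1.0, 0.0], [0.0, 1.0]], gate_weights=[10.0, -10.0], gate_bias=0.0)
```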

2.
IEEE Trans Pattern Anal Mach Intell ; 41(1): 121-135, 2019 01.
Article in English | MEDLINE | ID: mdl-29990235

ABSTRACT

We present an algorithm for simultaneous face detection, landmark localization, pose estimation, and gender recognition using deep convolutional neural networks (CNNs). The proposed method, called HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN, followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks, which boosts their individual performances. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet, which builds on the ResNet-101 model and achieves a significant improvement in performance, and (2) Fast-HyperFace, which uses a high-recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and perform significantly better than many competitive algorithms on each of these four tasks.


Subjects
Deep Learning; Face/diagnostic imaging; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Posture/physiology; Algorithms; Female; Gender Identity; Humans; Male
3.
Trends Cogn Sci ; 22(9): 794-809, 2018 09.
Article in English | MEDLINE | ID: mdl-30097304

ABSTRACT

Inspired by the primate visual system, deep convolutional neural networks (DCNNs) have made impressive progress on the complex problem of recognizing faces across variations of viewpoint, illumination, expression, and appearance. This generalized face recognition is a hallmark of human recognition for familiar faces. Despite the computational advances, the visual nature of the face code that emerges in DCNNs is poorly understood. We review what is known about these codes, using the long-standing metaphor of a 'face space' to ground them in the broader context of previous-generation face recognition algorithms. We show that DCNN face representations are a fundamentally new class of visual representation that allows for, but does not assure, generalized face recognition.


Subjects
Facial Recognition; Neural Networks, Computer; Animals; Facial Recognition/physiology; Humans; Visual Cortex/physiology
4.
IEEE Trans Image Process ; 27(4): 2022-2037, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29989985

ABSTRACT

The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations, while each feature should also have some feature-specific representation patterns that reflect its complementarity in appearance modeling. Unlike existing multi-feature sparse trackers, which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking that jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning method to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.

5.
Proc Natl Acad Sci U S A ; 115(24): 6171-6176, 2018 06 12.
Article in English | MEDLINE | ID: mdl-29844174

ABSTRACT

Achieving the upper limits of face identification accuracy in forensic applications can minimize errors that have profound social and personal consequences. Although forensic examiners identify faces in these applications, systematic tests of their accuracy are rare. How can we achieve the most accurate face identification: using people and/or machines working alone or in collaboration? In a comprehensive comparison of face identification by humans and computers, we found that forensic facial examiners, facial reviewers, and superrecognizers were more accurate than fingerprint examiners and students on a challenging face identification test. Individual performance on the test varied widely. On the same test, four deep convolutional neural networks (DCNNs), developed between 2015 and 2017, identified faces within the range of human accuracy. Accuracy of the algorithms increased steadily over time, with the most recent DCNN scoring above the median of the forensic facial examiners. Using crowd-sourcing methods, we fused the judgments of multiple forensic facial examiners by averaging their rating-based identity judgments. Accuracy was substantially better for fused judgments than for individuals working alone. Fusion also served to stabilize performance, boosting the scores of lower-performing individuals and decreasing variability. Single forensic facial examiners fused with the best algorithm were more accurate than the combination of two examiners. Therefore, collaboration among humans and between humans and machines offers tangible benefits to face identification accuracy in important applications. These results offer an evidence-based roadmap for achieving the most accurate face identification possible.


Subjects
Algorithms; Biometric Identification/methods; Face/anatomy & histology; Forensic Sciences/methods; Humans; Machine Learning; Reproducibility of Results
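The fusion step described in this abstract, averaging rating-based identity judgments across examiners, can be sketched in a few lines. The rating scale, the toy ratings, and the use of AUC (area under the ROC curve) as the accuracy measure are assumptions for illustration; the numbers are not data from the study.

```python
from statistics import mean


def fuse(ratings_by_judge):
    """Average each face pair's rating across judges.

    ratings_by_judge: one list per judge, all in the same pair order.
    Assumed scale: positive = 'same person', negative = 'different people'.
    """
    return [mean(pair_ratings) for pair_ratings in zip(*ratings_by_judge)]


def auc(scores, labels):
    """Fraction of (same, different) pair combinations ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0)) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): True = the two images show the same identity.
labels = [True, True, False, False]
judge_a = [3, -1, 0, -3]
judge_b = [0, 3, -2, 1]
fused = fuse([judge_a, judge_b])
```

In this toy example each judge alone reaches an AUC of 0.75, while the averaged ratings rank every same-identity pair above every different-identity pair (AUC 1.0). How much fusion helps in practice depends on how independent the judges' errors are.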
6.
IEEE Trans Pattern Anal Mach Intell ; 40(7): 1653-1667, 2018 07.
Article in English | MEDLINE | ID: mdl-28692963

ABSTRACT

Learning a classifier from ambiguously labeled face images is challenging since training images are not always explicitly labeled. For instance, the face images of two persons in a news photo are not explicitly labeled by their names in the caption. We propose a Matrix Completion for Ambiguity Resolution (MCar) method for predicting the actual labels from ambiguously labeled images. This step is followed by learning a standard supervised classifier from the disambiguated labels to classify new images. To prevent the majority labels from dominating the result of MCar, we generalize MCar to a weighted MCar (WMCar) that handles label imbalance. Since WMCar outputs a soft labeling vector of reduced ambiguity for each instance, we can iteratively refine it by feeding it back as the input to WMCar. Nevertheless, such an iterative implementation can be affected by noisy soft labeling vectors, and thus the performance may degrade. Our proposed Iterative Candidate Elimination (ICE) procedure makes iterative ambiguity resolution possible by gradually eliminating a portion of the least likely candidates in ambiguously labeled faces. We further extend MCar to incorporate the labeling constraints among instances when such prior knowledge is available. Compared to existing methods, our approach demonstrates improvements on several ambiguously labeled datasets.


Subjects
Face/anatomy & histology; Image Processing, Computer-Assisted/methods; Machine Learning; Pattern Recognition, Automated/methods; Algorithms; Biometric Identification; Databases, Factual; Humans
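The elimination idea behind ICE can be sketched on a single face's soft labeling vector: after each refinement pass, drop a fraction of the least likely candidate names and renormalize the rest. The fixed elimination fraction, the rule of always keeping at least one candidate, and the `refine` placeholder (a stand-in for a WMCar pass, which is not implemented here) are assumptions for illustration.

```python
import math


def eliminate_least_likely(soft_labels, fraction):
    """Remove the lowest-scoring `fraction` of candidates, keep at least one, renormalize.

    soft_labels: dict mapping candidate name -> positive score (assumed format).
    """
    ranked = sorted(soft_labels.items(), key=lambda kv: kv[1], reverse=True)
    n_drop = min(math.floor(fraction * len(ranked)), len(ranked) - 1)
    kept = ranked[: len(ranked) - n_drop]
    total = sum(score for _, score in kept)
    return {name: score / total for name, score in kept}


def iterative_elimination(soft_labels, refine, fraction=0.5, iterations=3):
    """Alternate a refinement step (hypothetical placeholder for WMCar) with elimination."""
    for _ in range(iterations):
        soft_labels = eliminate_least_likely(refine(soft_labels), fraction)
    return soft_labels


# One ambiguously labeled face whose caption names three people (toy scores).
face = {"Alice": 0.5, "Bob": 0.3, "Carol": 0.2}
one_step = eliminate_least_likely(face, fraction=0.4)
resolved = iterative_elimination(face, refine=lambda labels: labels, fraction=0.5)
```

With an identity `refine`, a fraction of 0.5 removes Carol in the first round and Bob in the second, leaving Alice with score 1.0.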
7.
IEEE Trans Image Process ; 26(10): 4741-4752, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28682252

ABSTRACT

We propose multi-task and multivariate methods for multi-modal recognition based on low-rank and joint sparse representations. Our formulations can be viewed as generalized versions of multivariate low-rank and sparse regression, where sparse and low-rank representations across all modalities are imposed. One of our methods simultaneously couples information within different modalities by enforcing the common low-rank and joint sparse constraints among multi-modal observations. We also modify our formulations by including an occlusion term that is assumed to be sparse. The alternating direction method of multipliers is proposed to efficiently solve the resulting optimization problems. Extensive experiments on three publicly available multi-modal biometrics and object recognition data sets show that our methods compare favorably with other feature-level fusion methods.

8.
IEEE Trans Pattern Anal Mach Intell ; 39(11): 2242-2255, 2017 11.
Article in English | MEDLINE | ID: mdl-28114004

ABSTRACT

In real-world visual recognition problems, low-level features cannot adequately characterize the semantic content in images or the spatio-temporal structure in videos. In this work, we encode objects or actions based on attributes that describe them as high-level concepts. We consider two types of attributes: attributes generated by humans, and data-driven attributes extracted from data using dictionary learning methods. Attribute-based representations may exhibit variations due to noisy and redundant attributes. We propose a discriminative and compact attribute-based representation by selecting a subset of discriminative attributes from a large attribute set. Three attribute selection criteria are proposed and formulated as a submodular optimization problem. A greedy optimization algorithm is presented, and its solution is guaranteed to be at least a (1 - 1/e)-approximation to the optimum. Experimental results on four public datasets demonstrate that the proposed attribute-based representation significantly boosts the performance of visual recognition and outperforms most recently proposed recognition approaches.


Subjects
Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Algorithms; Animals; Databases, Factual; Human Activities/classification; Humans; Sports/classification; Video Recording
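The greedy algorithm mentioned in this abstract follows a standard pattern: repeatedly add the item with the largest marginal gain. The sketch below uses set coverage (how many classes the chosen attributes describe) as the objective because it is a textbook monotone submodular function; it is a stand-in assumption, not one of the paper's three selection criteria, and the attribute table is invented.

```python
def coverage(selected, covers):
    """Number of distinct classes described by the selected attributes.

    Set coverage is monotone and submodular (stand-in for the paper's criteria).
    """
    described = set()
    for attribute in selected:
        described |= covers[attribute]
    return len(described)


def greedy_select(candidates, k, score):
    """Greedily add the candidate with the largest marginal gain, up to k items.

    For a monotone submodular `score` under a cardinality limit, this rule is
    guaranteed to reach at least (1 - 1/e) of the optimal value.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = score(set(selected))
        best = max(remaining, key=lambda a: score(set(selected) | {a}) - base)
        if score(set(selected) | {best}) - base <= 0:
            break  # nothing left improves the objective
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy attribute-to-class table (hypothetical).
covers = {
    "striped": {"zebra", "tiger"},
    "four_legs": {"zebra", "tiger", "dog"},
    "barks": {"dog"},
    "has_mane": {"zebra", "horse"},
}
chosen = greedy_select(covers, 2, lambda s: coverage(s, covers))
```

With k = 2, the greedy rule first takes `four_legs` (three classes) and then `has_mane`, the only attribute that still adds a new class.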
9.
IEEE Trans Image Process ; 25(6): 2542-56, 2016 05.
Article in English | MEDLINE | ID: mdl-27116671

ABSTRACT

Discriminative appearance features are effective for recognizing actions in a fixed view, but may not generalize well to a new view. In this paper, we present two effective approaches to learn dictionaries for robust action recognition across views. In the first approach, we learn a set of view-specific dictionaries, where each dictionary corresponds to one camera view. These dictionaries are learned simultaneously from sets of correspondence videos taken at different views, with the aim of encouraging each video in a set to have the same sparse representation. In the second approach, we additionally learn a common dictionary shared by different views to model view-shared features. This approach represents the videos in each view using a view-specific dictionary and the common dictionary. More importantly, it encourages the set of videos taken from different views of the same action to have similar sparse representations. The learned common dictionary not only has the capability to represent actions from unseen views, but also makes our approach effective in a semi-supervised setting where no correspondence videos exist and only a few labeled videos exist in the target view. Extensive experiments on three public datasets demonstrate that the proposed approach outperforms recently developed approaches for cross-view action recognition.


Subjects
Algorithms; Machine Learning; Terminology as Topic
10.
IEEE Trans Pattern Anal Mach Intell ; 38(9): 1762-73, 2016 09.
Article in English | MEDLINE | ID: mdl-26552075

ABSTRACT

We address the video-based face association problem, in which one attempts to extract the face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty handling this task, especially when challenging nuisance factors such as motion blur, low resolution, or significant camera motion are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov networks to infer labels for the detected faces. Unlike many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference, and handling false positives/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases.

11.
IEEE Trans Image Process ; 24(12): 5152-65, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26390461

ABSTRACT

We present a dictionary learning approach to compensate for the transformation of faces due to changes in viewpoint, illumination, resolution, and so on. The key idea of our approach is to enforce domain-invariant sparse coding, i.e., to design a consistent sparse representation of the same face in different domains. In this way, classifiers trained on the sparse codes in the source domain, consisting of frontal faces, can be applied to the target domain (consisting of faces in different poses, illumination conditions, and so on) without much loss in recognition accuracy. The approach is to first learn a domain base dictionary, and then describe each domain shift (identity, pose, and illumination) using a sparse representation over the base dictionary. The dictionary adapted to each domain is expressed as sparse linear combinations of the base dictionary. In the context of face recognition, with the proposed compositional dictionary approach, a face image can be decomposed into sparse representations for a given subject, pose, and illumination. This approach has three advantages. First, the extracted sparse representation for a subject is consistent across domains and enables pose- and illumination-insensitive face recognition. Second, sparse representations for pose and illumination can subsequently be used to estimate the pose and illumination condition of a face image. Last, by composing sparse representations for the subject and the different domains, we can also perform pose alignment and illumination normalization. Extensive experiments on two public face data sets are presented to demonstrate the effectiveness of the proposed approach for face recognition.


Subjects
Face/anatomy & histology; Image Processing, Computer-Assisted/methods; Machine Learning; Pattern Recognition, Automated/methods; Algorithms; Databases, Factual; Humans
12.
IEEE Trans Image Process ; 24(12): 5479-91, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26415168

ABSTRACT

Complex visual data contain discriminative structures that are difficult to capture fully with any single feature descriptor. While recent work on domain adaptation focuses on adapting a single hand-crafted feature, it is important to perform adaptation of a hierarchy of features to exploit the richness of visual data. We propose a novel framework for domain adaptation using a sparse and hierarchical network (DASH-N). Our method jointly learns a hierarchy of features together with transformations that rectify the mismatch between different domains. The building block of DASH-N is the latent sparse representation. It employs a dimensionality reduction step that can prevent the data dimension from increasing too fast as one traverses deeper into the hierarchy. The experimental results show that our method compares favorably with competing state-of-the-art methods. In addition, it is shown that a multi-layer DASH-N performs better than a single-layer DASH-N.

13.
IEEE Trans Image Process ; 24(12): 5826-41, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26415172

ABSTRACT

Visual tracking using multiple features has proven to be a robust approach because features can complement each other. Since different types of variations, such as illumination, occlusion, and pose, may occur in a video sequence, especially in long videos, how to properly select and fuse appropriate features has become one of the key problems in this approach. To address this issue, this paper proposes a new joint sparse representation model for robust feature-level fusion. The proposed method dynamically removes unreliable features from the fusion used for tracking by exploiting the advantages of sparse representation. In order to capture the non-linear similarity of features, we extend the proposed method into a general kernelized framework, which is able to perform feature fusion in various kernel spaces. As a result, robust tracking performance is obtained. Both the qualitative and quantitative experimental results on publicly available videos show that the proposed method outperforms both sparse representation-based and fusion-based trackers.

14.
IEEE Trans Image Process ; 24(11): 3846-57, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26186780

ABSTRACT

We provide two novel adaptive-rate compressive sensing (CS) strategies for sparse, time-varying signals using side information. The first method uses extra cross-validation measurements, and the second exploits extra low-resolution measurements. Unlike the majority of current CS techniques, we do not assume that we know an upper bound on the number of significant coefficients that comprise the images in the video sequence. Instead, we use the side information to predict the number of significant coefficients in the signal at the next time instant. We develop our techniques in the specific context of background subtraction using a spatially multiplexing CS camera such as the single-pixel camera. For each image in the video sequence, the proposed techniques specify a fixed number of CS measurements to acquire and adjust this quantity from image to image. We experimentally validate the proposed methods on real surveillance video sequences.
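The per-image measurement budget can be sketched with the common compressive sensing rule of thumb that a k-sparse, length-n signal needs on the order of k log(n/k) measurements. Everything specific here is an assumption for illustration: the constant `c = 2.0`, the magnitude threshold used to count significant coefficients, and the hypothetical `side_info_frames` input (a coefficient estimate per frame). The paper's own predictors use cross-validation or low-resolution measurements, which are not modeled.

```python
import math


def count_significant(coeffs, threshold):
    """Number of coefficients whose magnitude exceeds `threshold`."""
    return sum(1 for c in coeffs if abs(c) > threshold)


def measurement_budget(k, n, c=2.0):
    """Rule-of-thumb budget m = ceil(c * k * ln(n / k)), clipped to [1, n].

    The constant c is an assumed value, not one taken from the paper.
    """
    if k <= 0:
        return 1
    if k >= n:
        return n
    return max(1, min(n, math.ceil(c * k * math.log(n / k))))


def adaptive_budgets(side_info_frames, n, threshold, c=2.0):
    """One budget per frame, predicting its sparsity from a side-information estimate."""
    return [measurement_budget(count_significant(f, threshold), n, c) for f in side_info_frames]
```

Because the budget is recomputed for every frame, a frame with little foreground activity gets few measurements and a busier frame gets more, which mirrors the frame-to-frame adjustment the abstract describes.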

15.
IEEE Trans Image Process ; 24(10): 2941-54, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25966476

ABSTRACT

Data-driven dictionaries have produced state-of-the-art results in various classification tasks. However, when the target data have a different distribution than the source data, the learned sparse representation may not be optimal. In this paper, we investigate whether it is possible to optimally represent both source and target by a common dictionary. In particular, we describe a technique that jointly learns projections of data in the two domains and a latent dictionary that can succinctly represent both domains in the projected low-dimensional space. The algorithm is modified to learn a common discriminative dictionary, which can further improve the classification performance. The algorithm is also effective for adaptation across multiple domains and is extensible to nonlinear feature spaces. The proposed approach does not require any explicit correspondences between the source and target domains, and yields good results even when only a few labels are available in the target domain. We also extend it to unsupervised adaptation in cases where the same feature is extracted across all domains. Further, it can also be used for heterogeneous domain adaptation, where different features are extracted for different domains. Various recognition experiments show that the proposed method performs on par with or better than competitive state-of-the-art methods.

16.
IEEE Trans Image Process ; 24(7): 2067-82, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25775493

ABSTRACT

Existing methods for performing face recognition in the presence of blur are based on the convolution model and cannot handle the non-uniform blurring that frequently arises from tilts and rotations of hand-held cameras. In this paper, we propose a methodology for face recognition in the presence of space-varying motion blur comprising arbitrarily shaped kernels. We model the blurred face as a convex combination of geometrically transformed instances of the focused gallery face, and show that the set of all images obtained by non-uniformly blurring a given image forms a convex set. We first propose a non-uniform blur-robust algorithm that uses the assumption of a sparse camera trajectory in the camera motion space to build an energy function with an l1-norm constraint on the camera motion. The framework is then extended to handle illumination variations by exploiting the fact that the set of all images obtained from a face image by non-uniform blurring and changing the illumination forms a bi-convex set. Finally, we propose an elegant extension that also accounts for variations in pose.


Subjects
Artifacts; Face/anatomy & histology; Facial Recognition/physiology; Image Interpretation, Computer-Assisted/methods; Lighting/methods; Photography/methods; Biometry/methods; Female; Humans; Image Enhancement/methods; Male; Pattern Recognition, Automated/methods; Posture/physiology; Reproducibility of Results; Sensitivity and Specificity; Subtraction Technique
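The blur model in this abstract, a blurred face as a convex combination of geometrically transformed copies of the focused face, can be sketched in one dimension. Circular shifts stand in for the camera-pose warps, which is an assumption; with shifts alone the result reduces to ordinary uniform convolution, and it is warps such as in-plane rotations of a 2-D image that make the blur space-varying. The function names are hypothetical.

```python
def shift(signal, t):
    """Circularly shift a 1-D signal by t samples (stand-in for one camera-pose warp)."""
    n = len(signal)
    return [signal[(i - t) % n] for i in range(n)]


def convex_blur(signal, warps, weights):
    """Blurred signal = sum_k weights[k] * warps[k](signal), with convex weights.

    weights[k] can be read as the fraction of exposure time spent at pose k.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    out = [0.0] * len(signal)
    for warp, w in zip(warps, weights):
        for i, v in enumerate(warp(signal)):
            out[i] += w * v
    return out


focused = [0.0, 0.0, 1.0, 0.0]
warps = [lambda s: shift(s, 0), lambda s: shift(s, 1)]
half_blur = convex_blur(focused, warps, [0.5, 0.5])

# Convexity in miniature: averaging two blurred images gives another blurred image.
sharp_copy = convex_blur(focused, warps, [1.0, 0.0])
shifted_copy = convex_blur(focused, warps, [0.0, 1.0])
midpoint = [(x + y) / 2 for x, y in zip(sharp_copy, shifted_copy)]
```

Here `midpoint` equals `half_blur`: the average of two members of the blur set is produced by the averaged weights, so it is also a member of the set.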
17.
J Opt Soc Am A Opt Image Sci Vis ; 31(5): 1090-103, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24979642

ABSTRACT

In recent years, sparse representation and dictionary-learning-based methods have emerged as powerful tools for efficiently processing data in nontraditional ways. A particular area of promise for these theories is face recognition. In this paper, we review the role of sparse representation and dictionary learning for efficient face identification and verification. Recent face recognition algorithms from still images, videos, and ambiguously labeled imagery are reviewed. In particular, discriminative dictionary learning algorithms as well as methods based on weakly supervised learning and domain adaptation are summarized. Some of the compelling challenges and issues that confront research in face recognition using sparse representations and dictionary learning are outlined.

18.
IEEE Trans Image Process ; 23(9): 3773-88, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24968171

ABSTRACT

Facial retouching is widely used in the media and entertainment industry. Professional software usually requires a minimum level of user expertise to achieve the desired results. In this paper, we present an algorithm to detect facial wrinkles/imperfections. We believe that any such algorithm would be amenable to facial retouching applications. The detection of wrinkles/imperfections allows these skin features to be processed differently from the surrounding skin without much user interaction. For detection, Gabor filter responses along with a texture orientation field are used as image features. A bimodal Gaussian mixture model (GMM) represents the distributions of Gabor features of normal skin versus skin imperfections. Then, a Markov random field model is used to incorporate the spatial relationships among neighboring pixels for their GMM distributions and texture orientations. An expectation-maximization algorithm then classifies skin versus skin wrinkles/imperfections. Once detected automatically, wrinkles/imperfections are removed completely instead of being blended or blurred. We propose an exemplar-based constrained texture synthesis algorithm to inpaint the irregularly shaped gaps left by the removal of detected wrinkles/imperfections. We present results on images downloaded from the Internet to show the efficacy of our algorithms.


Subjects
Algorithms; Face/anatomy & histology; Image Enhancement/methods; Paintings; Skin Aging; Skin/anatomy & histology; Computer Simulation; Image Interpretation, Computer-Assisted/methods; Markov Chains; Mobile Applications; Models, Statistical; Reproducibility of Results; Sensitivity and Specificity; User-Computer Interface
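The bimodal GMM step can be illustrated with expectation-maximization on a one-dimensional feature, where one component plays "normal skin" and the other "imperfection". This is a minimal sketch under simplifying assumptions: a single scalar stands in for the Gabor feature vector, the toy values are invented, and the Markov random field that adds spatial context and texture orientation in the paper is omitted.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_bimodal_gmm(xs, iters=50, min_var=1e-6):
    """EM for a two-component 1-D Gaussian mixture; returns (weights, means, variances)."""
    lo, hi = min(xs), max(xs)
    # Initialize component 0 at the smallest value and component 1 at the largest.
    w, mu = [0.5, 0.5], [lo, hi]
    var = [max((hi - lo) ** 2 / 4, min_var)] * 2
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w[j] * normal_pdf(x, mu[j], var[j]) for j in range(2)]
            s = sum(p) or 1e-300  # guard against underflow
            resp.append([pj / s for pj in p])
        # M-step: re-estimate mixing weights, means, and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp) or 1e-300
            w[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return w, mu, var


def classify(x, w, mu, var):
    """Index of the component with the highest posterior probability for x."""
    return max(range(2), key=lambda j: w[j] * normal_pdf(x, mu[j], var[j]))


# Toy scalar filter responses (hypothetical): low = smooth skin, high = wrinkle-like.
features = [0.9, 1.0, 1.1, 1.2, 0.8, 5.0, 5.2, 4.8, 5.1]
weights, means, variances = fit_bimodal_gmm(features)
```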
19.
IEEE Trans Image Process ; 23(8): 3590-603, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24956366

ABSTRACT

Although facial expressions can be decomposed in terms of action units (AUs), as suggested by the facial action coding system, there have been only a few attempts to recognize expressions using AUs and their composition rules. In this paper, we propose a dictionary-based approach for facial expression analysis that decomposes expressions in terms of AUs. First, we construct an AU-dictionary using domain experts' knowledge of AUs. To incorporate high-level knowledge regarding expression decomposition and AUs, we then perform structure-preserving sparse coding by imposing two layers of grouping, over the AU-dictionary atoms as well as over the test image matrix columns. We use the computed sparse code matrix for each expressive face to perform expression decomposition and recognition. Since domain experts' knowledge may not always be available for constructing an AU-dictionary, we also propose a structure-preserving dictionary learning algorithm, which we use to learn a structured dictionary as well as to divide expressive faces into several semantic regions. Experimental results on publicly available expression data sets demonstrate the effectiveness of the proposed approach for facial expression analysis.


Subjects
Face/anatomy & histology; Facial Expression; Image Interpretation, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Photography/methods; Subtraction Technique; Algorithms; Artificial Intelligence; Biometry/methods; Humans; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity
20.
IEEE Trans Image Process ; 23(7): 3013-24, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24835226

ABSTRACT

In this paper, we propose a multiple kernel learning (MKL) algorithm that is based on the sparse representation-based classification (SRC) method. Taking advantage of the ability of nonlinear kernel SRC to efficiently represent the nonlinearities in the high-dimensional feature space, we propose an MKL method based on the kernel alignment criterion. Our method uses a two-step training procedure to learn the kernel weights and sparse codes. At each iteration, the sparse codes are updated first while the kernel mixing coefficients are fixed, and then the kernel mixing coefficients are updated while the sparse codes are fixed. These two steps are repeated until a stopping criterion is met. The effectiveness of the proposed method is demonstrated on several publicly available image classification databases, and it is shown that this method can perform significantly better than many competitive image classification algorithms.
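The kernel alignment criterion named in this abstract measures how well a kernel matrix agrees with the label structure. The sketch below computes alignment against the ideal two-class kernel y y^T and, as a deliberately simple stand-in, weights base kernels in proportion to their clipped alignment. This weighting rule, the toy kernels, and all function names are assumptions for illustration, not the paper's method, which alternates between updating sparse codes and kernel mixing coefficients within kernel SRC.

```python
import math


def frobenius(a, b):
    """Frobenius inner product <A, B>_F of two equally sized matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(k1, k2):
    """Kernel alignment <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    return frobenius(k1, k2) / math.sqrt(frobenius(k1, k1) * frobenius(k2, k2))


def ideal_kernel(labels):
    """Target kernel y y^T for two classes: +1 for same-class pairs, -1 otherwise."""
    return [[1.0 if yi == yj else -1.0 for yj in labels] for yi in labels]


def alignment_weights(kernels, labels):
    """Heuristic mixing weights proportional to each kernel's clipped alignment."""
    target = ideal_kernel(labels)
    raw = [max(alignment(k, target), 0.0) for k in kernels]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(kernels)] * len(kernels)
    return [r / total for r in raw]


def combine(kernels, weights):
    """Weighted sum of base kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Toy kernels (hypothetical): one reflects the class blocks, one ignores them.
labels = [0, 0, 1, 1]
k_block = [[1.0, 0.8, 0.0, 0.0], [0.8, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.8], [0.0, 0.0, 0.8, 1.0]]
k_identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = alignment_weights([k_block, k_identity], labels)
k_combined = combine([k_block, k_identity], weights)
```

The block kernel aligns better with the labels (about 0.70 versus 0.5 for the identity kernel), so it receives the larger weight.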
