Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 20
Filtrar
1.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4747-4762, 2024 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-38261478

RESUMEN

Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCLIP++ minimally modifies CLIP to capture spatial-temporal relationships in videos, thereby creating a specialized video classifier while striving for generalization. We formally demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data. To address this problem, we introduce Interpolated Weight Optimization, a technique that leverages the advantages of weight interpolation during both training and testing. Furthermore, we build upon large language models to produce fine-grained video descriptions. These detailed descriptions are further aligned with video features, facilitating a better transfer of CLIP to the video domain. Our approach is evaluated on three widely used action recognition datasets, following a variety of zero-shot evaluation protocols. The results demonstrate that our method surpasses existing state-of-the-art techniques by significant margins. Specifically, we achieve zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on UCF, HMDB, and Kinetics-600 datasets respectively, outpacing the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. We also evaluate our approach on the MSR-VTT video-text retrieval dataset, where it delivers competitive video-to-text and text-to-video retrieval performance, while utilizing substantially less fine-tuning data compared to other methods.

2.
IEEE Trans Image Process ; 32: 6346-6358, 2023.
Artículo en Inglés | MEDLINE | ID: mdl-37966925

RESUMEN

The transferability of adversarial examples across different convolutional neural networks (CNNs) makes it feasible to perform black-box attacks, resulting in security threats for CNNs. However, fewer endeavors have been made to investigate transferable attacks for vision transformers (ViTs), which achieve superior performance on various computer vision tasks. Unlike CNNs, ViTs establish relationships between patches extracted from inputs by the self-attention module. Thus, adversarial examples crafted on CNNs might hardly attack ViTs. To assess the security of ViTs comprehensively, we investigate the transferability across different ViTs in both untargetd and targeted scenarios. More specifically, we propose a Pay No Attention (PNA) attack, which ignores attention gradients during backpropagation to improve the linearity of backpropagation. Additionally, we introduce a PatchOut/CubeOut attack for image/video ViTs. They optimize perturbations within a randomly selected subset of patches/cubes during each iteration, preventing over-fitting to the white-box surrogate ViT model. Furthermore, we maximize the L2 norm of perturbations, ensuring that the generated adversarial examples deviate significantly from the benign ones. These strategies are designed to be harmoniously compatible. Combining them can enhance transferability by jointly considering patch-based inputs and the self-attention of ViTs. Moreover, the proposed combined attack seamlessly integrates with existing transferable attacks, providing an additional boost to transferability. We conduct experiments on ImageNet and Kinetics-400 for image and video ViTs, respectively. Experimental results demonstrate the effectiveness of the proposed method.

3.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3749-3766, 2022 07.
Artículo en Inglés | MEDLINE | ID: mdl-33577449

RESUMEN

We present an efficient foveal framework to perform object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales. Such a restriction of objects' size during training affords better learning of object-sensitive filters, and therefore, results in better accuracy. However, the use of an image pyramid increases the computational cost. Hence, we propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects (as object locations are known during training). The resulting approach, referred to as Scale Normalized Image Pyramid with Efficient Resampling or SNIPER, yields up to 3× speed-up during training. Unfortunately, as object locations are unknown during inference, the entire image pyramid still needs processing. To this end, we adopt a coarse-to-fine approach, and predict the locations and extent of object-like regions which will be processed in successive scales of the image pyramid. Intuitively, it's akin to our active human-vision that first skims over the field-of-view to spot interesting regions for further processing and only recognizes objects at the right resolution. The resulting algorithm is referred to as AutoFocus and results in a 2.5-5× speed-up during inference when used with SNIP. Code: https://github.com/mahyarnajibi/SNIPER.


Asunto(s)
Algoritmos , Humanos
4.
IEEE Trans Pattern Anal Mach Intell ; 44(4): 1699-1711, 2022 04.
Artículo en Inglés | MEDLINE | ID: mdl-33026981

RESUMEN

We introduce AdaFrame, a conditional computation framework that adaptively selects relevant frames on a per-input basis for fast video recognition. AdaFrame, which contains a Long Short-Term Memory augmented with a global memory to provide context information, operates as an agent to interact with video sequences aiming to search over time which frames to use. Trained with policy search methods, at each time step, AdaFrame computes a prediction, decides where to observe next, and estimates a utility, i.e., expected future rewards, of viewing more frames in the future. Exploring predicted utilities at testing time, AdaFrame is able to achieve adaptive lookahead inference so as to minimize the overall computational cost without incurring a degradation in accuracy. We conduct extensive experiments on two large-scale video benchmarks, FCVID and ActivityNet. With a vanilla ResNet-101 model, AdaFrame achieves similar performance of using all frames while only requiring, on average, 8.21 and 8.65 frames on FCVID and ActivityNet, respectively. We also demonstrate AdaFrame is compatible with modern 2D and 3D networks for video recognition. Furthermore, we show, among other things, learned frame usage can reflect the difficulty of making prediction decisions both at instance-level within the same class and at class-level among different categories.


Asunto(s)
Algoritmos
5.
IEEE Trans Neural Netw Learn Syst ; 31(7): 2490-2499, 2020 07.
Artículo en Inglés | MEDLINE | ID: mdl-31425125

RESUMEN

Preactivation ResNets consistently outperforms the original postactivation ResNets on the CIFAR10/100 classification benchmark. However, these results surprisingly do not carry over to the standard ImageNet benchmark. First, we theoretically analyze this incongruity in terms of how the two variants differ in handling the propagation of gradients. Although identity shortcuts are critical in both variants for improving optimization and performance, we show that postactivation variants enable early layers to receive a diverse dynamic composition of gradients from effectively deeper paths in comparison to preactivation variants, enabling the network to make maximal use of its representational capacity. Second, we show that downsampling projections (while only a few in number) have a significantly detrimental effect on performance. We show that by simply replacing downsampling projections with identitylike dense-reshape shortcuts, the classification results of standard residual architectures such as ResNets, ResNeXts, and SE-Nets improve by up to 1.2% on ImageNet, without any increase in computational complexity (FLOPs).

6.
IEEE Trans Pattern Anal Mach Intell ; 31(10): 1775-89, 2009 Oct.
Artículo en Inglés | MEDLINE | ID: mdl-19696449

RESUMEN

Interpretation of images and videos containing humans interacting with different objects is a daunting task. It involves understanding scene/event, analyzing human movements, recognizing manipulable objects, and observing the effect of the human movement on those objects. While each of these perceptual tasks can be conducted independently, recognition rate improves when interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach which integrates various perceptual tasks involved in understanding human-object interactions. Previous approaches to object and action recognition rely on static shape/appearance feature matching and motion analysis, respectively. Our approach goes beyond these traditional approaches and applies spatial and functional constraints on each of the perceptual elements for coherent semantic interpretation. Such constraints allow us to recognize objects and actions when the appearances are not discriminative enough. We also demonstrate the use of such constraints in recognition of actions from static images without using any motion information.


Asunto(s)
Actividades Humanas , Procesamiento de Imagen Asistido por Computador/métodos , Modelos Biológicos , Movimiento/fisiología , Reconocimiento de Normas Patrones Automatizadas/métodos , Algoritmos , Teorema de Bayes , Humanos , Reconocimiento en Psicología , Grabación en Video
7.
IEEE Trans Pattern Anal Mach Intell ; 31(5): 919-30, 2009 May.
Artículo en Inglés | MEDLINE | ID: mdl-19299864

RESUMEN

Particle filtering is frequently used for visual tracking problems since it provides a general framework for estimating and propagating probability density functions for nonlinear and non-Gaussian dynamic systems. However, this algorithm is based on a Monte Carlo approach and the cost of sampling and measurement is a problematic issue, especially for high-dimensional problems. We describe an alternative to the classical particle filter in which the underlying density function has an analytic representation for better approximation and effective propagation. The techniques of density interpolation and density approximation are introduced to represent the likelihood and the posterior densities with Gaussian mixtures, where all relevant parameters are automatically determined. The proposed analytic approach is shown to perform more efficiently in sampling in high-dimensional space. We apply the algorithm to real-time tracking problems and demonstrate its performance on real video sequences as well as synthetic examples.


Asunto(s)
Algoritmos , Inteligencia Artificial , Interpretación de Imagen Asistida por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Procesamiento de Señales Asistido por Computador , Técnica de Sustracción , Teorema de Bayes , Aumento de la Imagen/métodos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad
8.
IEEE Trans Pattern Anal Mach Intell ; 41(1): 246-259, 2019 01.
Artículo en Inglés | MEDLINE | ID: mdl-29990056

RESUMEN

Non-negative matrix factorization (NMF) minimizes the euclidean distance between the data matrix and its low rank approximation, and it fails when applied to corrupted data because the loss function is sensitive to outliers. In this paper, we propose a Truncated CauchyNMF loss that handle outliers by truncating large errors, and develop a Truncated CauchyNMF to robustly learn the subspace on noisy datasets contaminated by outliers. We theoretically analyze the robustness of Truncated CauchyNMF comparing with the competing models and theoretically prove that Truncated CauchyNMF has a generalization bound which converges at a rate of order , where is the sample size. We evaluate Truncated CauchyNMF by image clustering on both simulated and real datasets. The experimental results on the datasets containing gross corruptions validate the effectiveness and robustness of Truncated CauchyNMF for learning robust subspaces.

9.
IEEE Trans Pattern Anal Mach Intell ; 30(3): 493-506, 2008 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-18195442

RESUMEN

Automatic initialization and tracking of human pose is an important task in visual surveillance. We present a part-based approach that incorporates a variety of constraints in a unified framework. These constraints include the kinematic constraints between parts that are physically connected to each other, the occlusion of one part by another and the high correlation between the appearance of certain parts, such as the arms. The location probability distribution of each part is determined by evaluating appropriate likelihood measures. The graphical (non-tree) structure representing the interdependencies between parts is utilized to "connect" such part distributions via nonparametric belief propagation. Methods are also developed to perform this optimization efficiently in the large space of pose configurations.


Asunto(s)
Inteligencia Artificial , Biometría/métodos , Imagenología Tridimensional/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Postura/fisiología , Grabación en Video/métodos , Imagen de Cuerpo Entero/métodos , Algoritmos , Humanos , Aumento de la Imagen/métodos , Interpretación de Imagen Asistida por Computador/métodos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad
10.
IEEE Trans Pattern Anal Mach Intell ; 30(7): 1186-97, 2008 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-18550902

RESUMEN

Visual features are commonly modeled with probability density functions in computer vision problems, but current methods such as a mixture of Gaussians and kernel density estimation suffer from either the lack of flexibility, by fixing or limiting the number of Gaussian components in the mixture, or large memory requirement, by maintaining a non-parametric representation of the density. These problems are aggravated in real-time computer vision applications since density functions are required to be updated as new data becomes available. We present a novel kernel density approximation technique based on the mean-shift mode finding algorithm, and describe an efficient method to sequentially propagate the density modes over time. While the proposed density representation is memory efficient, which is typical for mixture densities, it inherits the flexibility of non-parametric methods by allowing the number of components to be variable. The accuracy and compactness of the sequential kernel density approximation technique is illustrated by both simulations and experiments. Sequential kernel density approximation is applied to on-line target appearance modeling for visual tracking, and its performance is demonstrated on a variety of videos.


Asunto(s)
Algoritmos , Inteligencia Artificial , Interpretación de Imagen Asistida por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Técnica de Sustracción , Simulación por Computador , Sistemas de Computación , Aumento de la Imagen/métodos , Modelos Estadísticos , Movimiento (Física) , Reproducibilidad de los Resultados , Sensibilidad y Especificidad , Distribuciones Estadísticas
11.
IEEE Trans Pattern Anal Mach Intell ; 28(7): 1164-9, 2006 Jul.
Artículo en Inglés | MEDLINE | ID: mdl-16792104

RESUMEN

We develop computational models for measuring hair appearance for comparing different people. The models and methods developed have applications to person recognition and image indexing. An automatic hair detection algorithm is described and results reported. A multidimensional representation of hair appearance is presented and computational algorithms are described. Results on a data set of 524 subjects are reported. Identification of people using hair attributes is compared to eigenface-based recognition along with a joint, eigenface-hair-based identification.


Asunto(s)
Inteligencia Artificial , Colorimetría/métodos , Cabello/anatomía & histología , Cabello/fisiología , Interpretación de Imagen Asistida por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Algoritmos , Color , Simulación por Computador , Aumento de la Imagen/métodos , Almacenamiento y Recuperación de la Información/métodos , Modelos Biológicos
12.
IEEE Trans Pattern Anal Mach Intell ; 38(3): 518-31, 2016 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-27046495

RESUMEN

To perform unconstrained face recognition robust to variations in illumination, pose and expression, this paper presents a new scheme to extract "Multi-Directional Multi-Level Dual-Cross Patterns" (MDML-DCPs) from face images. Specifically, the MDML-DCPs scheme exploits the first derivative of Gaussian operator to reduce the impact of differences in illumination and then computes the DCP feature at both the holistic and component levels. DCP is a novel face image descriptor inspired by the unique textural structure of human faces. It is computationally efficient and only doubles the cost of computing local binary patterns, yet is extremely robust to pose and expression variations. MDML-DCPs comprehensively yet efficiently encodes the invariant characteristics of a face image from multiple levels into patterns that are highly discriminative of inter-personal differences but robust to intra-personal variations. Experimental results on the FERET, CAS-PERL-R1, FRGC 2.0, and LFW databases indicate that DCP outperforms the state-of-the-art local descriptors (e.g., LBP, LTP, LPQ, POEM, tLBP, and LGXP) for both face identification and face verification tasks. More impressively, the best performance is achieved on the challenging LFW and FRGC 2.0 databases by deploying MDML-DCPs in a simple recognition scheme.


Asunto(s)
Identificación Biométrica/métodos , Cara/anatomía & histología , Cara/diagnóstico por imagen , Procesamiento de Imagen Asistido por Computador/métodos , Algoritmos , Humanos , Análisis de Componente Principal
13.
IEEE Trans Pattern Anal Mach Intell ; 38(7): 1411-24, 2016 07.
Artículo en Inglés | MEDLINE | ID: mdl-26452250

RESUMEN

We propose a novel algorithm to cluster and annotate a set of input images jointly, where the images are clustered into several discriminative groups and each group is identified with representative labels automatically. For these purposes, each input image is first represented by a distribution of candidate labels based on its similarity to images in a labeled reference image database. A set of these label-based representations are then refined collectively through a non-negative matrix factorization with sparsity and orthogonality constraints; the refined representations are employed to cluster and annotate the input images jointly. The proposed approach demonstrates performance improvements in image clustering over existing techniques, and illustrates competitive image labeling accuracy in both quantitative and qualitative evaluation. In addition, we extend our joint clustering and labeling framework to solving the weakly-supervised image classification problem and obtain promising results.

14.
IEEE Trans Pattern Anal Mach Intell ; 35(11): 2651-64, 2013 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-24051726

RESUMEN

A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistency constraint called "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly. The incremental dictionary learning algorithm is presented for the situation of limited memory resources. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse-coding techniques for face, action, scene, and object category recognition under the same learning conditions.


Asunto(s)
Algoritmos , Biometría/métodos , Cara/anatomía & histología , Reconocimiento de Normas Patrones Automatizadas/métodos , Máquina de Vectores de Soporte , Análisis Discriminante , Humanos , Interpretación de Imagen Asistida por Computador/métodos
15.
IEEE Trans Pattern Anal Mach Intell ; 35(5): 1248-62, 2013 May.
Artículo en Inglés | MEDLINE | ID: mdl-23520262

RESUMEN

We describe a framework that leverages mixed probabilistic and deterministic networks and their AND/OR search space to efficiently find and track the hands and feet of multiple interacting humans in 2D from a single camera view. Our framework detects and tracks multiple people's heads, hands, and feet through partial or full occlusion; requires few constraints (does not require multiple views, high image resolution, knowledge of performed activities, or large training sets); and makes use of constraints and AND/OR Branch-and-Bound with lazy evaluation and carefully computed bounds to efficiently solve the complex network that results from the consideration of interperson occlusion. Our main contributions are: 1) a multiperson part-based formulation that emphasizes extremities and allows for the globally optimal solution to be obtained in each frame, and 2) an efficient and exact optimization scheme that relies on AND/OR Branch-and-Bound, lazy factor evaluation, and factor cost sensitive bound computation. We demonstrate our approach on three datasets: the public single person HumanEva dataset, outdoor sequences where multiple people interact in a group meeting scenario, and outdoor one-on-one basketball videos. The first dataset demonstrates that our framework achieves state-of-the-art performance in the single person setting, while the last two demonstrate robustness in the presence of partial and full occlusion and fast nontrivial motion.


Asunto(s)
Pie/fisiología , Mano/fisiología , Procesamiento de Imagen Asistido por Computador/métodos , Actividad Motora/fisiología , Reconocimiento de Normas Patrones Automatizadas/métodos , Baloncesto , Bases de Datos Factuales , Análisis Discriminante , Humanos , Análisis de los Mínimos Cuadrados , Grabación en Video
16.
IEEE Trans Pattern Anal Mach Intell ; 34(5): 1017-23, 2012 May.
Artículo en Inglés | MEDLINE | ID: mdl-22156099

RESUMEN

Background modeling and subtraction is a natural technique for object detection in videos captured by a static camera, and also a critical preprocessing step in various high-level computer vision applications. However, there have not been many studies concerning useful features and binary segmentation algorithms for this problem. We propose a pixelwise background modeling and subtraction technique using multiple features, where generative and discriminative techniques are combined for classification. In our algorithm, color, gradient, and Haar-like features are integrated to handle spatio-temporal variations for each pixel. A pixelwise generative background model is obtained for each feature efficiently and effectively by Kernel Density Approximation (KDA). Background subtraction is performed in a discriminative manner using a Support Vector Machine (SVM) over background likelihood vectors for a set of features. The proposed algorithm is robust to shadow, illumination changes, spatial variations of background. We compare the performance of the algorithm with other density-based methods using several different feature combinations and modeling techniques, both quantitatively and qualitatively.

17.
IEEE Trans Pattern Anal Mach Intell ; 34(3): 533-47, 2012 Mar.
Artículo en Inglés | MEDLINE | ID: mdl-21788666

RESUMEN

A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering and each training sequence is represented as a labeled prototype sequence; then a look-up table of prototype-to-prototype distances is generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.


Asunto(s)
Algoritmos , Reconocimiento de Normas Patrones Automatizadas/métodos , Humanos , Interpretación de Imagen Asistida por Computador/métodos , Imagenología Tridimensional/métodos
18.
IEEE Trans Image Process ; 21(4): 2245-55, 2012 Apr.
Artículo en Inglés | MEDLINE | ID: mdl-22128005

RESUMEN

With the goal of matching unknown faces against a gallery of known people, the face identification task has been studied for several decades. There are very accurate techniques to perform face identification in controlled environments, particularly when large numbers of samples are available for each face. However, face identification under uncontrolled environments or with a lack of training data is still an unsolved problem. We employ a large and rich set of feature descriptors (with more than 70,000 descriptors) for face identification using partial least squares to perform multichannel feature weighting. Then, we extend the method to a tree-based discriminative structure to reduce the time required to evaluate probe samples. The method is evaluated on Facial Recognition Technology (FERET) and Face Recognition Grand Challenge (FRGC) data sets. Experiments show that our identification method outperforms current state-of-the-art results, particularly for identifying faces acquired across varying conditions.


Asunto(s)
Algoritmos , Inteligencia Artificial , Biometría/métodos , Cara/anatomía & histología , Interpretación de Imagen Asistida por Computador/métodos , Almacenamiento y Recuperación de la Información/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Técnica de Sustracción , Humanos , Aumento de la Imagen/métodos , Reproducibilidad de los Resultados , Sensibilidad y Especificidad
19.
IEEE Trans Pattern Anal Mach Intell ; 33(6): 1250-65, 2011 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-20921579

RESUMEN

Detecting vehicles in aerial images has a wide range of applications, from urban planning to visual surveillance. We describe a vehicle detector that improves upon previous approaches by incorporating a very large and rich set of image descriptors. A new feature set called Color Probability Maps is used to capture the color statistics of vehicles and their surroundings, along with the Histograms of Oriented Gradients feature and a simple yet powerful image descriptor that captures the structural characteristics of objects named Pairs of Pixels. The combination of these features leads to an extremely high-dimensional feature set (approximately 70,000 elements). Partial Least Squares is first used to project the data onto a much lower dimensional sub-space. Then, a powerful feature selection analysis is employed to improve the performance while vastly reducing the number of features that must be calculated. We compare our system to previous approaches on two challenging data sets and show superior performance.


Asunto(s)
Algoritmos , Aumento de la Imagen/métodos , Análisis de los Mínimos Cuadrados , Vehículos a Motor , Reconocimiento de Normas Patrones Automatizadas/métodos , Inteligencia Artificial , Color , Colorimetría/métodos
20.
IEEE Trans Pattern Anal Mach Intell ; 32(4): 604-18, 2010 Apr.
Artículo en Inglés | MEDLINE | ID: mdl-20224118

RESUMEN

We propose a shape-based, hierarchical part-template matching approach to simultaneous human detection and segmentation combining local part-based and global shape-template-based schemes. The approach relies on the key idea of matching a part-template tree to images hierarchically to detect humans and estimate their poses. For learning a generic human detector, a pose-adaptive feature computation scheme is developed based on a tree matching approach. Instead of traditional concatenation-style image location-based feature encoding, we extract features adaptively in the context of human poses and train a kernel-SVM classifier to separate human/nonhuman patterns. Specifically, the features are collected in the local context of poses by tracing around the estimated shape boundaries. We also introduce an approach to multiple occluded human detection and segmentation based on an iterative occlusion compensation scheme. The output of our learned generic human detector can be used as an initial set of human hypotheses for the iterative optimization. We evaluate our approaches on three public pedestrian data sets (INRIA, MIT-CBCL, and USC-B) and two crowded sequences from Caviar Benchmark and Munich Airport data sets.


Asunto(s)
Algoritmos , Procesamiento de Imagen Asistido por Computador/métodos , Reconocimiento de Normas Patrones Automatizadas/métodos , Postura/fisiología , Inteligencia Artificial , Aglomeración , Árboles de Decisión , Humanos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA