Results 1 - 9 of 9
1.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6552-6574, 2023 May.
Article in English | MEDLINE | ID: mdl-36215368

ABSTRACT

Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location and segmentation, or a rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as the dominant tracking paradigms and have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCF and Siamese trackers, based on results in nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these tracking paradigms. Furthermore, we thoroughly analyze the performance of DCF and Siamese trackers on nine benchmarks, covering different experimental aspects of visual tracking: datasets, evaluation metrics, performance, and speed comparisons. We finish the survey by presenting recommendations and suggestions for the open challenges identified in our analysis.

2.
IEEE Trans Pattern Anal Mach Intell ; 42(10): 2423-2436, 2020 Oct.
Article in English | MEDLINE | ID: mdl-31331882

ABSTRACT

Generally, convolutional neural networks (CNNs) process data on a regular grid, e.g., data generated by ordinary cameras. Designing CNNs for sparse and irregularly spaced input data is still an open research problem with numerous applications in autonomous driving, robotics, and surveillance. In this paper, we propose an algebraically-constrained normalized convolution layer for CNNs with highly sparse input that has a smaller number of network parameters compared to related work. We propose novel strategies for determining the confidence from the convolution operation and propagating it to consecutive layers. We also propose an objective function that simultaneously minimizes the data error while maximizing the output confidence. To integrate structural information, we also investigate fusion strategies to combine depth and RGB information in our normalized convolution network framework. In addition, we introduce the use of output confidence as auxiliary information to improve the results. The capabilities of our normalized convolution network framework are demonstrated for the problem of scene depth completion. Comprehensive experiments are performed on the KITTI-Depth and the NYU-Depth-v2 datasets. The results clearly demonstrate that the proposed approach achieves superior performance while requiring only about 1-5 percent of the number of parameters compared to the state-of-the-art methods.
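
The normalization idea behind such a layer can be illustrated with classic normalized convolution in one dimension. This is only a minimal sketch, not the layer proposed in the paper: the function name `normalized_conv1d`, the fixed non-negative kernel, and the rule used to propagate confidence (filtered confidence divided by the kernel sum) are assumptions made for illustration.

```python
import numpy as np


def normalized_conv1d(signal, confidence, kernel, eps=1e-8):
    """Normalized convolution of a sparse 1D signal (illustrative sketch).

    signal     : sample values; entries with zero confidence are ignored
    confidence : per-sample confidence in [0, 1] (0 marks a missing sample)
    kernel     : non-negative filter weights (assumed fixed, not learned)
    """
    signal = np.asarray(signal, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    kernel = np.asarray(kernel, dtype=float)

    # Filter the confidence-weighted signal and the confidence itself.
    weighted = np.convolve(signal * confidence, kernel, mode="same")
    norm = np.convolve(confidence, kernel, mode="same")

    # Dividing by the filtered confidence removes the bias caused by gaps.
    output = weighted / (norm + eps)

    # Assumed propagation rule: fraction of kernel mass that saw valid data.
    out_confidence = norm / kernel.sum()
    return output, out_confidence
```

With the samples 1, 3, 5 known at every other position and a box kernel of width 3, each missing position is filled with the average of its valid neighbours, and the returned confidence is lower wherever fewer valid inputs contributed.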

3.
IEEE Trans Pattern Anal Mach Intell ; 39(8): 1561-1575, 2017 Aug.
Article in English | MEDLINE | ID: mdl-27654137

ABSTRACT

Accurate scale estimation of a target is a challenging research problem in visual object tracking. Most state-of-the-art methods employ an exhaustive scale search to estimate the target size. The exhaustive search strategy is computationally expensive and struggles when faced with large scale variations. This paper investigates the problem of accurate and robust scale estimation in a tracking-by-detection framework. We propose a novel scale adaptive tracking approach by learning separate discriminative correlation filters for translation and scale estimation. The explicit scale filter is learned online using the target appearance sampled at a set of different scales. Contrary to standard approaches, our method directly learns the appearance change induced by variations in the target scale. Additionally, we investigate strategies to reduce the computational cost of our approach. Extensive experiments are performed on the OTB and the VOT2014 datasets. Compared to the standard exhaustive scale search, our approach achieves a gain of 2.5 percent in average overlap precision on the OTB dataset. Additionally, our method is computationally efficient, operating at a 50 percent higher frame rate compared to the exhaustive scale search. Our method obtains the top rank in performance by outperforming 19 state-of-the-art trackers on OTB and 37 state-of-the-art trackers on VOT2014.
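
A single-channel, one-dimensional correlation filter of the kind used in DCF trackers can be sketched as follows. This is an illustrative simplification, not the tracker from the paper: the helper names (`gaussian_label`, `train_filter`, `respond`), the regularization value, and the use of a raw 1D signal in place of multi-channel features sampled across scales are all assumptions. The filter is learned in closed form in the Fourier domain, and the peak of the response gives the displacement of the target along the sampled axis (the scale axis, in a scale-filter setting).

```python
import numpy as np


def gaussian_label(n, center, sigma=1.0):
    """Desired filter output: a Gaussian peak at `center` (circular distance)."""
    idx = np.arange(n)
    dist = np.minimum(np.abs(idx - center), n - np.abs(idx - center))
    return np.exp(-0.5 * (dist / sigma) ** 2)


def train_filter(f, g, lam=1e-3):
    """Closed-form ridge-regression filter in the Fourier domain.

    f   : training sample (1D array)
    g   : desired response for that sample
    lam : regularization weight (assumed value)
    """
    F = np.fft.fft(f)
    G = np.fft.fft(g)
    numerator = G * np.conj(F)
    denominator = (F * np.conj(F)).real + lam
    return numerator / denominator


def respond(H, z):
    """Apply the learned filter H to a new sample z and return the response."""
    return np.fft.ifft(H * np.fft.fft(z)).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.standard_normal(33)          # stand-in for a target sample
    g = gaussian_label(33, center=16)    # peak at the current target state
    H = train_filter(f, g)

    z = np.roll(f, 3)                    # same sample, circularly shifted by 3
    peak = int(np.argmax(respond(H, z)))
    print("estimated shift:", peak - 16)
```

Because training and detection are element-wise operations on FFTs, the cost grows only as O(n log n) in the number of samples, which is one reason correlation filters are attractive for real-time tracking.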

4.
Ecol Evol ; 6(19): 6930-6942, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28725370

ABSTRACT

Migratory songbirds carry an inherited capacity to migrate several thousand kilometers each year, crossing continental landmasses and barriers between distant breeding sites and wintering areas. How individual songbirds manage to find their way with extreme precision is still largely unknown. The functional characteristics of the biological compasses used by songbird migrants have mainly been investigated by recording the birds' directed migratory activity in circular cages, so-called Emlen funnels. This method is 50 years old and has not received major updates over the past decades. The aim of this work was to compare the results from newly developed digital methods with the established manual methods to evaluate songbird migratory activity and orientation in circular cages. We performed orientation experiments with the European robin (Erithacus rubecula) using modified Emlen funnels equipped with thermal paper and simultaneously recorded the songbird movements from above. We evaluated and compared the results obtained with five different methods. Two methods have been commonly used in songbirds' orientation experiments; the other three methods were developed for this study and were based either on evaluation of the thermal paper using automated image analysis, or on the analysis of videos recorded during the experiment. The methods used to evaluate scratches produced by the claws of birds on the thermal papers presented some differences compared with the video analyses. These differences were caused mainly by differences in scatter, as any movement of the bird along the sloping walls of the funnel was recorded on the thermal paper, whereas video evaluations allowed us to detect single takeoff attempts by the birds and to consider only this behavior in the orientation analyses. Using computer vision, we were also able to identify and separately evaluate different behaviors that were impossible to record with the thermal paper. The traditional Emlen funnel is still the most used method to investigate compass orientation in songbirds under controlled conditions. However, new numerical image analysis techniques provide a much higher level of detail of songbirds' migratory behavior and will provide an increasing number of possibilities to evaluate and quantify specific behaviors as new algorithms are developed.

5.
IEEE Trans Image Process ; 23(8): 3633-45, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24956369

ABSTRACT

Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.


Subject(s)
Actigraphy/methods , Biometry/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Sex Determination Analysis/methods , Whole Body Imaging/methods , Algorithms , Artificial Intelligence , Female , Humans , Image Enhancement/methods , Male , Reproducibility of Results , Semantics , Sensitivity and Specificity
6.
IEEE Trans Cybern ; 43(1): 155-69, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22773046

ABSTRACT

Perception-action (P-A) learning is an approach to cognitive system building that seeks to reduce the complexity associated with conventional environment-representation/action-planning approaches. Instead, actions are directly mapped onto the perceptual transitions that they bring about, eliminating the need for intermediate representation and significantly reducing training requirements. We here set out a very general learning framework for cognitive systems in which online learning of the P-A mapping may be conducted within a symbolic processing context, so that complex contextual reasoning can influence the P-A mapping. In utilizing a variational calculus approach to define a suitable objective function, the P-A mapping can be treated as an online learning problem via gradient descent using partial derivatives. Our central theoretical result is to demonstrate top-down modulation of low-level perceptual confidences via the Jacobian of the higher levels of a subsumptive P-A hierarchy. Thus, the separation of the Jacobian as a multiplying factor between levels within the objective function naturally enables the integration of abstract symbolic manipulation in the form of fuzzy deductive logic into the P-A mapping learning. We experimentally demonstrate that the resulting framework achieves significantly better accuracy than using P-A learning without top-down modulation. We also demonstrate that it permits novel forms of context-dependent multilevel P-A mapping, applying the mechanism in the context of an intelligent driver assistance system.


Subject(s)
Fuzzy Logic , Machine Learning , Neural Networks, Computer , Algorithms , Computer Simulation , Humans , Perception
7.
IEEE Trans Pattern Anal Mach Intell ; 35(1): 118-29, 2013 Jan.
Article in English | MEDLINE | ID: mdl-22392708

ABSTRACT

We propose a novel method for iterative learning of point correspondences between image sequences. Points moving on surfaces in 3D space are projected into two images. Given a point in either view, the considered problem is to determine the corresponding location in the other view. The geometry and distortions of the projections are unknown, as is the shape of the surface. Given several pairs of point sets but no access to the 3D scene, correspondence mappings can be found by excessive global optimization or by the fundamental matrix if a perspective projective model is assumed. However, an iterative solution on sequences of point-set pairs with general imaging geometry is preferable. We derive such a method that optimizes the mapping based on Neyman's chi-square divergence between the densities representing the uncertainties of the estimated and the actual locations. The densities are represented as channel vectors computed with a basis function approach. The mapping between these vectors is updated with each new pair of images such that fast convergence and high accuracy are achieved. The resulting algorithm runs in real time and is superior to state-of-the-art methods in terms of convergence and accuracy in a number of experiments.


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Subtraction Technique , Online Systems
8.
IEEE Trans Image Process ; 20(7): 1797-806, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21257379

ABSTRACT

In this paper, we present a novel scheme for anisotropic diffusion driven by the image autocorrelation function. We show the equivalence of this scheme to a special case of iterated adaptive filtering. By determining the diffusion tensor field from an autocorrelation estimate, we obtain an evolution equation that is computed from a scalar product of the diffusion tensor and the image Hessian. We further propose a set of filters to approximate the Hessian on a minimized spatial support. On standard benchmarks, the resulting method performs favorably in many cases, in particular at low noise levels. In a GPU implementation, video real-time performance is easily achieved.
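
In assumed notation (image $u$, diffusion tensor field $D$, image Hessian $\mathcal{H}u$; none of these symbols appear in the abstract), an evolution equation of the form described above, a scalar product of the diffusion tensor and the Hessian, can be written as the Frobenius inner product of the two matrices:

```latex
\[
\frac{\partial u}{\partial t}
  = \langle D, \mathcal{H}u \rangle
  = \operatorname{tr}\!\left(D^{\top}\,\mathcal{H}u\right)
  = \sum_{i,j} D_{ij}\,\frac{\partial^{2} u}{\partial x_i\,\partial x_j}
\]
```

For the isotropic special case $D = I$, the right-hand side reduces to the Laplacian of $u$, that is, ordinary linear diffusion.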

9.
IEEE Trans Pattern Anal Mach Intell ; 28(2): 209-22, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16468618

ABSTRACT

In this paper, we present a new and efficient method to implement robust smoothing of low-level signal features: B-spline channel smoothing. This method consists of three steps: encoding of the signal features into channels, averaging of the channels, and decoding of the channels. We show that linear smoothing of channels is equivalent to robust smoothing of the signal features if we make use of quadratic B-splines to generate the channels. The linear decoding from B-spline channels allows the derivation of a robust error norm, which is very similar to Tukey's biweight error norm. We compare channel smoothing with three other robust smoothing techniques: nonlinear diffusion, bilateral filtering, and mean-shift filtering, both theoretically and on a 2D orientation-data smoothing task. Channel smoothing is found to be superior in four respects: It has a lower computational complexity, it is easy to implement, it chooses the global minimum error instead of the nearest local minimum, and it can also be used on nonlinear spaces, such as orientation space.
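
The three steps named above (encoding, averaging, decoding) can be sketched for scalar values as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: channel centers are placed at the integers 0 to n-1, the averaging is taken over a handful of samples rather than over a spatial neighbourhood, and the function names `bspline2`, `encode`, and `decode` are hypothetical. Decoding picks the three adjacent channels with the largest total weight and recovers the value linearly from them.

```python
import numpy as np


def bspline2(x):
    """Quadratic B-spline kernel with support [-1.5, 1.5]."""
    ax = np.abs(x)
    inner = 0.75 - ax ** 2            # |x| <= 0.5
    outer = 0.5 * (ax - 1.5) ** 2     # 0.5 < |x| <= 1.5
    return np.where(ax <= 0.5, inner, np.where(ax <= 1.5, outer, 0.0))


def encode(values, n_channels):
    """Encode each value into channels centered at 0, 1, ..., n_channels - 1.

    Values are assumed to lie in [0.5, n_channels - 1.5] so that all three
    active channels exist. Returns an array of shape (len(values), n_channels).
    """
    values = np.asarray(values, dtype=float)
    centers = np.arange(n_channels)
    return bspline2(values[:, None] - centers[None, :])


def decode(channels):
    """Linear decoding from the strongest window of three adjacent channels."""
    c = np.asarray(channels, dtype=float)
    window_sums = c[:-2] + c[1:-1] + c[2:]
    start = int(np.argmax(window_sums))   # index of the window's first channel
    center = start + 1
    return center + (c[center + 1] - c[center - 1]) / window_sums[start]


if __name__ == "__main__":
    samples = [2.0, 2.1, 1.9, 2.0, 8.0]           # last value is an outlier
    averaged = encode(samples, 11).mean(axis=0)   # linear smoothing of channels
    print("channel estimate:", decode(averaged))
    print("plain mean:      ", np.mean(samples))
```

In this example the outlier at 8.0 falls into channels far from the dominant cluster, so the decoded estimate stays at about 2.0 while the plain mean is pulled to 3.2. This is the sense in which linear averaging of channels acts as a robust smoother of the underlying values.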


Subject(s)
Algorithms , Artificial Intelligence , Data Compression/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Numerical Analysis, Computer-Assisted