Results 1 - 19 of 19
1.
Article in English | MEDLINE | ID: mdl-38917283

ABSTRACT

Object pose estimation constitutes a critical area within the domain of 3D vision. While contemporary state-of-the-art methods that leverage real-world pose annotations have demonstrated commendable performance, procuring such real training data incurs substantial costs. This paper focuses on a specific setting wherein only 3D CAD models are utilized as a priori knowledge, devoid of any background or clutter information. We introduce a novel method, CPPF++, designed for sim-to-real category-level pose estimation. This method builds upon the foundational point-pair voting scheme of CPPF, reformulating it through a probabilistic view. To address the challenge posed by vote collision, we model the voting uncertainty by estimating the probabilistic distribution of each point pair within the canonical space. Furthermore, we augment the contextual information provided by each voting unit through the introduction of N-point tuples. To enhance the robustness and accuracy of the model, we incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a tuple feature ensemble. Alongside these methodological advancements, we introduce a new category-level pose estimation dataset, named DiversePose 300. Empirical evidence demonstrates that our method significantly surpasses previous sim-to-real approaches and achieves comparable or superior performance on novel datasets. Our code is available at https://github.com/qq456cvb/CPPF2.
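The point-pair voting scheme that CPPF++ builds on starts from rotation-invariant descriptors of oriented point pairs. As a rough illustration (not the authors' code; the function name and the exact feature layout are assumptions), such a descriptor can be sketched as:

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Rotation-invariant descriptor of an oriented point pair:
    (pair distance, angle(n1, d), angle(n2, d), angle(n1, n2))."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    d_hat = d / dist
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return np.array([dist, ang(n1, d_hat), ang(n2, d_hat), ang(n1, n2)])
```

Because every component depends only on distances and relative angles, the descriptor is unchanged when the whole pair is rotated, which is what makes voting from such pairs viable without knowing the object's pose.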

2.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5645-5662, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38517729

ABSTRACT

Estimating and synthesizing the hand's manipulation of objects is central to understanding human behaviour. To accurately model the interaction between the hand and object (referred to as the "hand-object"), we must not only focus on the pose of the hand and object, but also consider the contact between them. This contact provides valuable information for generating semantically and physically plausible grasps. In this paper, we propose an explicit contact representation called the Contact Potential Field (CPF). In CPF, we model the contact between a pair of hand-object vertices as a spring-mass system. This system encodes the distance of the pair, as well as the likelihood that the contact is stable. The system of multiple extended and compressed springs therefore forms an elastic potential field with minimal energy at the optimal grasp position. We apply CPF to two relevant tasks, namely, hand-object pose estimation and grasping pose generation. Extensive experiments on the two challenging tasks and three commonly used datasets demonstrate that our method achieves state-of-the-art results on several reconstruction metrics and produces more physically plausible hand-object poses even when the ground truth exhibits severe interpenetration or disjointedness.
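The spring-mass view of contact can be illustrated with a toy energy function. This is only a sketch of the idea, not the authors' implementation: it assumes a unit spring constant and zero rest length by default, and the per-pair weights stand in for the paper's stable-contact likelihoods.

```python
import numpy as np

def contact_energy(hand_v, obj_v, pairs, k=1.0, rest=0.0):
    """Total elastic energy of virtual springs between hand vertices and
    object vertices; `pairs` holds (i, j, w), where w is the likelihood
    that contact (i, j) is stable."""
    e = 0.0
    for i, j, w in pairs:
        d = np.linalg.norm(hand_v[i] - obj_v[j])
        e += 0.5 * w * k * (d - rest) ** 2  # Hooke's-law spring energy
    return e
```

Minimizing such an energy over the hand pose pulls likely contact pairs together, which is the intuition behind the field having minimal energy at the optimal grasp.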

3.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8494-8506, 2023 07.
Article in English | MEDLINE | ID: mdl-37819797

ABSTRACT

Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications such as health care and behavior analysis. Despite advances in deep learning, it remains challenging. Object-recognition-style solutions usually try to map pixels to semantics directly, but activity patterns differ greatly from object patterns, so this direct mapping does not transfer well. In this article, we propose a novel paradigm that reformulates the task in two stages: first mapping pixels to an intermediate space spanned by atomic activity primitives, then programming the detected primitives with interpretable logic rules to infer semantics. To afford a representative primitive space, we build a knowledge base including over 26 million primitive labels and logic rules obtained from human priors or automatic discovery. Our framework, the Human Activity Knowledge Engine (HAKE), exhibits superior generalization ability and performance over canonical methods on challenging benchmarks. Code and data are available at http://hake-mvig.cn/.


Subject(s)
Artificial Intelligence , Gadiformes , Humans , Animals , Algorithms , Knowledge Bases , Human Activities
4.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15051-15064, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37721890

ABSTRACT

Predicting the future trajectories of dynamic agents is inherently riddled with uncertainty. Given a certain historical observation, there are multiple plausible future movements people can perform. Notably, these possible movements usually center around a few representative motion patterns, e.g., acceleration, deceleration, turning, etc. In this paper, we propose a novel prediction scheme that learns human behavior modality representations from real-world trajectory data to discover such motion patterns and then uses them to aid trajectory prediction. To explore potential behavior modalities, we introduce a deep feature clustering process on trajectory features, where each cluster represents a type of modality. Intuitively, each modality is naturally a class, so a classification network can be adopted to retrieve the modalities most likely to occur in the future according to historical observations. Because a wide variety of cues exist in the observation (e.g., agents' motion states, the semantics of the scene, etc.), we further design a gated aggregation module to fuse different types of cues into a unified feature. Finally, an adaptation process adapts a given modality to the specific historical observation and generates fine-grained prediction results. Extensive experiments on four widely used benchmarks show the superiority of our proposed approach.
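The modality-discovery step can be approximated, at a much smaller scale, by plain k-means over trajectory feature vectors, with each cluster standing in for one behavior modality. This is an illustrative stand-in, not the paper's deep feature clustering procedure:

```python
import numpy as np

def kmeans(feats, k, iters=50, seed=0):
    """Minimal k-means over trajectory feature vectors; each cluster
    plays the role of one motion-pattern modality."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center
        d = np.linalg.norm(feats[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned features
        for c in range(k):
            if (labels == c).any():
                centers[c] = feats[labels == c].mean(axis=0)
    return labels, centers
```

In the paper's pipeline the cluster index would then serve as the class target for the modality classification network.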

5.
Front Behav Neurosci ; 17: 1111908, 2023.
Article in English | MEDLINE | ID: mdl-37324523

ABSTRACT

Computer vision has emerged as a powerful tool to elevate behavioral research. This protocol describes a computer vision machine learning pipeline called AlphaTracker, which has minimal hardware requirements and produces reliable tracking of multiple unmarked animals, as well as behavioral clustering. AlphaTracker pairs top-down pose estimation with unsupervised clustering to facilitate behavioral motif discovery that will accelerate behavioral research. All steps of the protocol are provided as open-source software with graphical user interfaces or are implementable from the command line. Users with a graphics processing unit (GPU) can model and analyze animal behaviors of interest in less than a day. AlphaTracker greatly facilitates the analysis of the mechanisms of individual and social behavior and group dynamics.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7157-7173, 2023 06.
Article in English | MEDLINE | ID: mdl-37145952

ABSTRACT

Accurate whole-body multi-person pose estimation and tracking is an important yet challenging topic in computer vision. To capture the subtle actions of humans for complex behavior analysis, whole-body pose estimation including the face, body, hand and foot is essential over conventional body-only pose estimation. In this article, we present AlphaPose, a system that can perform accurate whole-body pose estimation and tracking jointly while running in real time. To this end, we propose several new techniques: Symmetric Integral Keypoint Regression (SIKR) for fast and fine localization, Parametric Pose Non-Maximum Suppression (P-NMS) for eliminating redundant human detections, and Pose Aware Identity Embedding for joint pose estimation and tracking. During training, we resort to the Part-Guided Proposal Generator (PGPG) and multi-domain knowledge distillation to further improve accuracy. Our method localizes whole-body keypoints accurately and tracks humans simultaneously even given inaccurate bounding boxes and redundant detections. We show a significant improvement over current state-of-the-art methods in both speed and accuracy on COCO-wholebody, COCO, PoseTrack, and our proposed Halpe-FullBody pose estimation dataset. Our model, source code, and dataset are publicly available at https://github.com/MVIG-SJTU/AlphaPose.
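Pose-level non-maximum suppression of the kind P-NMS performs can be sketched with a greedy keypoint-distance criterion. The real P-NMS uses a learned parametric pose distance; this simplified version (all names are assumptions) only conveys the mechanism of dropping redundant pose detections:

```python
import numpy as np

def pose_nms(poses, scores, dist_thresh):
    """Greedy pose NMS: repeatedly keep the highest-scoring pose and drop
    poses whose mean per-keypoint distance to it is below a threshold.
    poses: (N, K, 2) arrays of K keypoints per detection."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        d = np.linalg.norm(poses[rest] - poses[i], axis=-1).mean(axis=-1)
        order = rest[d > dist_thresh]  # survivors are sufficiently distinct
    return keep
```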


Subject(s)
Algorithms , Posture , Humans
7.
Nature ; 603(7902): 667-671, 2022 03.
Article in English | MEDLINE | ID: mdl-35296862

ABSTRACT

Most social species self-organize into dominance hierarchies [1,2], which decreases aggression and conserves energy [3,4], but it is not clear how individuals know their social rank. We have only begun to learn how the brain represents social rank [5-9] and guides behaviour on the basis of this representation. The medial prefrontal cortex (mPFC) is involved in social dominance in rodents [7,8] and humans [10,11]. Yet, precisely how the mPFC encodes relative social rank and which circuits mediate this computation is not known. We developed a social competition assay in which mice compete for rewards, as well as a computer vision tool (AlphaTracker) to track multiple, unmarked animals. A hidden Markov model combined with generalized linear models was able to decode social competition behaviour from mPFC ensemble activity. Population dynamics in the mPFC predicted social rank and competitive success. Finally, we demonstrate that mPFC cells that project to the lateral hypothalamus promote dominance behaviour during reward competition. Thus, we reveal a cortico-hypothalamic circuit by which the mPFC exerts top-down modulation of social dominance.
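The decoding step rests on standard hidden Markov model inference. A minimal log-domain Viterbi decoder, shown purely as background for how a behavioral state sequence can be recovered from noisy observations (not the authors' analysis code), looks like:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely hidden-state path of an HMM given a discrete
    observation sequence, computed in the log domain."""
    S, T = len(log_pi), len(obs)
    dp = np.empty((T, S))            # best log-score ending in each state
    bp = np.zeros((T, S), dtype=int)  # backpointers
    dp[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = dp[t - 1][:, None] + log_A  # scores[prev, cur]
        bp[t] = scores.argmax(axis=0)
        dp[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(bp[t, path[-1]]))
    return path[::-1]
```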


Subject(s)
Hypothalamus , Prefrontal Cortex , Animals , Lateral Hypothalamic Area , Mice , Reward , Social Behavior
8.
IEEE Trans Image Process ; 31: 1072-1083, 2022.
Article in English | MEDLINE | ID: mdl-34986097

ABSTRACT

Human life is populated with articulated objects. Current Category-level Articulation Pose Estimation (CAPE) methods are studied under a single-instance setting with a fixed kinematic structure for each category. Considering these limitations, we study the problem of estimating part-level 6D pose for multiple articulated objects with unknown kinematic structures in a single RGB-D image, and reformulate the problem setting for real-world environments as the CAPE-Real (CAPER) task. This setting allows varied kinematic structures within a semantic category, and multiple instances co-existing in a real-world observation. To support this task, we build an articulated model repository, ReArt-48, and present an efficient dataset generation pipeline comprising Fast Articulated Object Modeling (FAOM) and the Semi-Authentic MixEd Reality Technique (SAMERT). With this pipeline, we build a large-scale mixed reality dataset, ReArtMix, and a real-world dataset, ReArtVal. For the CAPER problem, we propose an effective framework that exploits RGB-D input to estimate part-level pose for multiple instances in a single forward pass. Our method introduces object detection on the RGB-D input to handle the multi-instance problem and segments each instance into several parts. To address the unknown kinematic structure issue, we propose an Articulation Parsing Network to analyze the structure of each detected instance, and a Pair Articulation Pose Estimation module to estimate per-part 6D pose as well as joint properties from connected part pairs. Extensive experiments demonstrate that the proposed method achieves good performance on the CAPER, CAPE, and instance-level robot arm pose estimation problems. We believe it can serve as a strong baseline for future research on the CAPER task. The datasets and code will be made publicly available.

9.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3870-3882, 2022 07.
Article in English | MEDLINE | ID: mdl-33493110

ABSTRACT

Human-object interaction (HOI) detection is an important problem in understanding how humans interact with objects. In this paper, we explore interactiveness knowledge, which indicates whether a human and an object interact with each other. We find that interactiveness knowledge can be learned across HOI datasets and alleviates the gap between diverse HOI category settings. Our core idea is to exploit an interactiveness network to learn general interactiveness knowledge from multiple HOI datasets and perform non-interaction suppression before HOI classification at inference. Owing to the generality of interactiveness, the interactiveness network is a transferable knowledge learner and can be combined with any HOI detection model to achieve desirable results. We utilize human instance and body part features together to learn interactiveness in a hierarchical paradigm, i.e., at the instance level and the body part level. Thereafter, a consistency task is proposed to guide the learning and extract deeper interactive visual clues. We extensively evaluate the proposed method on HICO-DET, V-COCO, and a newly constructed HAKE-HOI dataset. With the learned interactiveness, our method outperforms state-of-the-art HOI detection methods, verifying its efficacy and flexibility. Code is available at https://github.com/DirtyHarryLYL/Transferable-Interactiveness-Network.
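Non-interaction suppression can be illustrated as a simple gating step: HOI scores of human-object pairs with low predicted interactiveness are zeroed out before classification. This sketch (all names assumed) shows only the filtering idea, not the learned network:

```python
import numpy as np

def non_interaction_suppression(hoi_scores, inter_scores, thresh=0.5):
    """Zero out per-pair HOI classification scores whenever the predicted
    interactiveness of the human-object pair falls below a threshold."""
    hoi_scores = np.asarray(hoi_scores, dtype=float).copy()
    mask = np.asarray(inter_scores) < thresh
    hoi_scores[mask] = 0.0  # suppress all HOI classes for non-interactive pairs
    return hoi_scores
```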


Subject(s)
Algorithms , Learning , Humans
10.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5780-5795, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33848241

ABSTRACT

Pixel-level 2D object semantic understanding is an important topic in computer vision and could help machines deeply understand objects (e.g., their functionality and affordances) in our daily life. However, most previous methods train directly on correspondences in 2D images, which is end-to-end but loses plenty of information in 3D space. In this paper, we propose a new method that predicts an image's corresponding semantics in the 3D domain and then projects them back onto the 2D image to achieve pixel-level understanding. To obtain reliable 3D semantic labels, which are absent from current image datasets, we build a large-scale keypoint knowledge engine called KeypointNet, which contains 103,450 keypoints and 8,234 3D models from 16 object categories. Our method leverages the advantages of 3D vision and can explicitly reason about objects' self-occlusion and visibility. We show that our method gives comparable and even superior results on standard semantic benchmarks.
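Projecting 3D semantics back onto 2D pixels uses the standard pinhole camera model. A minimal projection helper, assuming keypoints are already in camera coordinates with z > 0 (this is generic geometry, not taken from the KeypointNet codebase), is:

```python
import numpy as np

def project_keypoints(points_3d, K):
    """Project 3D keypoints (camera coordinates, z > 0) to 2D pixel
    coordinates with a pinhole intrinsic matrix K (3x3)."""
    uvw = (K @ points_3d.T).T      # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth
```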

11.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9489-9502, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34822324

ABSTRACT

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a new point-set learning framework, PRIN (Point-wise Rotation Invariant Network), focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density-Aware Adaptive Sampling to deal with distorted point distributions in spherical space. Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation-invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which operates directly on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. On datasets with randomly rotated point clouds, SPRIN outperforms state-of-the-art methods without any data augmentation. We also provide a thorough theoretical proof and analysis of the point-wise rotation invariance achieved by our methods. The code to reproduce our results will be made publicly available.

12.
Article in English | MEDLINE | ID: mdl-34637379

ABSTRACT

Attributes and objects can compose diverse compositions. To model the compositional nature of these concepts, it is a good choice to learn them as transformations, e.g., coupling and decoupling. However, complex transformations need to satisfy specific principles to guarantee rationality. Here, we first propose a previously ignored principle of attribute-object transformation: symmetry. For example, coupling peeled-apple with the attribute peeled should still result in peeled-apple, and decoupling peeled from apple (which lacks that attribute) should still output apple. Incorporating symmetry, we propose a transformation framework inspired by group theory, SymNet. It consists of two modules: a Coupling Network and a Decoupling Network. We implement SymNet with deep neural networks and train it end-to-end with the group axioms and symmetry as objectives. We then propose a Relative Moving Distance (RMD)-based method that uses the attribute change, rather than the attribute pattern itself, to classify attributes. Besides compositions of a single attribute and object, our RMD also suits complex compositions of multiple attributes and objects when attribute correlations are incorporated. SymNet can be utilized for attribute learning and compositional zero-shot learning, and it outperforms the state-of-the-art on four widely used benchmarks. Code is at https://github.com/DirtyHarryLYL/SymNet.
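The symmetry principle can be checked on a toy model where an object's state is a set of attribute tags, so coupling and decoupling become set union and difference. This is only an illustration of the axioms, far simpler than SymNet's learned transformations:

```python
def couple(obj, attr):
    """Attach an attribute to an object state (modeled as a frozenset)."""
    return obj | {attr}

def decouple(obj, attr):
    """Remove an attribute from an object state."""
    return obj - {attr}
```

On this model, coupling an attribute that is already present changes nothing, and decoupling an attribute that is absent changes nothing, which is exactly the symmetry the abstract describes.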

13.
Article in English | MEDLINE | ID: mdl-32142430

ABSTRACT

Indoor semantic segmentation with RGB-D input has seen decent progress recently, but studies of instance-level objects in outdoor scenarios face challenges due to the ambiguity of acquired outdoor depth maps. To tackle this problem, we propose a residual regretting mechanism, incorporated end-to-end into the flexible, general, and solid instance segmentation framework Mask R-CNN. Specifically, a regretting cascade is designed to gradually refine and fully unearth useful information in depth maps, acting as a filter and backup. Additionally, embedded in a novel residual connection structure, the regretting module robustly combines the RGB and depth branches with pixel-level masks. Extensive experiments on the challenging Cityscapes and KITTI datasets demonstrate the effectiveness of our residual regretting scheme for handling outdoor depth maps. Our approach achieves state-of-the-art performance on RGB-D instance segmentation, with a 13.4% relative improvement over Mask R-CNN on Cityscapes from the depth cue.

14.
IEEE Trans Image Process ; 28(1): 45-55, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30028702

ABSTRACT

We propose a deep learning approach for directly estimating relative atmospheric visibility from outdoor photos, without relying on weather images or data that require expensive sensing or custom capture. Our data-driven approach capitalizes on a large collection of Internet images to learn rich scene and visibility varieties. The relative CNN-RNN coarse-to-fine model, where CNN stands for convolutional neural network and RNN for recurrent neural network, exploits the joint power of a relative support vector machine, which provides a good ranking representation, and the data-driven deep features derived from our novel CNN-RNN model. The CNN-RNN model uses shortcut connections to bridge a CNN module and an RNN coarse-to-fine module: the CNN captures the global view, while the RNN simulates a human's attention shift from the whole image (global) to the farthest discerned region (local). The learned relative model can be adapted to predict absolute visibility in limited scenarios. Extensive experiments and comparisons are performed to verify our method. We have built an annotated dataset consisting of about 40,000 images with 0.2 million human annotations. This large-scale, annotated visibility dataset will be made available to accompany this paper.
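Relative visibility learning reduces to pairwise ranking: the model should score the more-visible image of each annotated pair higher. A minimal hinge-style ranking loss over a linear scorer, given here as an assumed simplification of the relative-SVM component (not the paper's formulation), is:

```python
import numpy as np

def pairwise_rank_loss(w, feats_hi, feats_lo, margin=1.0):
    """Hinge loss asking the linear score of the higher-visibility image
    in each pair to exceed the lower-visibility one by a margin.
    feats_hi, feats_lo: (N, D) feature rows for the N annotated pairs."""
    diff = feats_hi @ w - feats_lo @ w
    return np.maximum(0.0, margin - diff).mean()
```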

15.
IEEE Trans Vis Comput Graph ; 24(6): 2051-2063, 2018 06.
Article in English | MEDLINE | ID: mdl-28489537

ABSTRACT

We present a real-time video stylization system and demonstrate a variety of painterly styles rendered on real video inputs. The key technical contribution lies in the object flow, which is robust to inaccurate optical flow, unknown object transformations, and partial occlusion. Since object flows relate regions of the same object across frames, the shower-door effect can be effectively reduced when painterly strokes and textures are rendered on video objects. Object flows are constructed in real time and automatically after applying metric learning. To reduce temporal flickering, we extend bilateral filtering to motion bilateral filtering. We propose quantitative metrics to measure the temporal coherence of the structures and textures in our stylized videos, and perform extensive experiments comparing our stylized results with baseline systems and prior works specializing in watercolor and abstraction.
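Motion bilateral filtering extends the classic bilateral filter, whose core idea is easiest to see in 1-D: each output sample is a weighted average whose weights decay with both spatial distance and intensity difference, so smoothing happens within regions but not across edges. A plain (non-motion) 1-D sketch, not the paper's filter:

```python
import numpy as np

def bilateral_1d(signal, sigma_s=2.0, sigma_r=0.5, radius=4):
    """Edge-preserving 1-D bilateral filter: weights combine spatial
    closeness and range (intensity) similarity."""
    out = np.empty_like(signal, dtype=float)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        idx = np.arange(lo, hi)
        w = np.exp(-((idx - i) ** 2) / (2 * sigma_s ** 2)
                   - ((signal[idx] - signal[i]) ** 2) / (2 * sigma_r ** 2))
        out[i] = (w * signal[idx]).sum() / w.sum()
    return out
```

The paper's motion bilateral filter would additionally weight samples along the temporal axis using the object flow, which this sketch omits.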

16.
IEEE Trans Image Process ; 26(9): 4154-4167, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28436867

ABSTRACT

Toward weather condition recognition, we emphasize the importance of regional cues in this paper and address several key problems: appropriate representation, its differentiation among regions, and weather-condition feature construction. Our first major contribution is a multi-class benchmark dataset containing 65,000 images from six common categories: sunny, cloudy, rainy, snowy, hazy, and thunder weather. This dataset also benefits weather classification and attribute recognition. Second, we propose a deep learning framework named the Region Selection and Concurrency Model (RSCM) to help discover regional properties and concurrency. We evaluate RSCM on our multi-class benchmark data and on another public dataset for weather recognition.

17.
IEEE Trans Pattern Anal Mach Intell ; 39(12): 2510-2524, 2017 12.
Article in English | MEDLINE | ID: mdl-28113309

ABSTRACT

Given a single outdoor image, we propose a collaborative learning approach using novel weather features to label the image as either sunny or cloudy. Though limited, this two-class classification problem is by no means trivial, given the great variety of outdoor images captured by different cameras, where the images may have been edited after capture. Our overall weather feature combines the data-driven convolutional neural network (CNN) feature and well-chosen weather-specific features. They work collaboratively within a unified optimization framework that is aware of the presence (or absence) of a given weather cue during learning and classification. In this paper, we propose a new data augmentation scheme to substantially enrich the training data, which is used to train a latent SVM framework that makes our solution insensitive to global intensity transfer. Extensive experiments are performed to verify our method. Compared with our previous work and the sole use of a CNN classifier, this paper improves accuracy by up to 7-8 percent. Our weather image dataset is available, together with the executable of our classifier.

18.
IEEE Trans Image Process ; 24(12): 5789-99, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26452286

ABSTRACT

People know and care for personal objects, which differ between individuals. Automatically discovering personal objects is thus of great practical importance. In this paper, we pursue this task with wearable cameras, based on the common-sense observation that personal objects generally accompany us across various scenes. With this cue, we exploit a new object-scene distribution for robust detection. We tackle the two technical challenges involved in estimating this distribution: scene extraction and unsupervised object discovery. For scene extraction, we learn a latent representation instead of simply selecting a few frames from the videos. For object discovery, we build an interaction model to select frame-level objects and use nonparametric Bayesian clustering. Experiments verify the usefulness of our approach.


Subject(s)
Image Processing, Computer-Assisted/methods , Video Recording/methods , Algorithms , Bayes Theorem , Cluster Analysis , Humans , Telecommunications
19.
IEEE Trans Image Process ; 23(2): 837-47, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24184722

ABSTRACT

Dictionary learning has been widely used in many image processing tasks. In most of these methods, the number of basis vectors is either set by experience or coarsely evaluated empirically. In this paper, we propose a new scale-adaptive dictionary learning framework that jointly estimates suitable scales and the corresponding atoms in an adaptive fashion according to the training data, without prior information. We design an atom counting function and develop a reliable numerical scheme to solve the challenging optimization problem. Extensive experiments on texture and video datasets demonstrate quantitatively and visually that our method can estimate the scale without damaging sparse reconstruction ability.
