Results 1 - 10 of 10
1.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9822-9835, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34752380

ABSTRACT

Previous works for LiDAR-based 3D object detection mainly focus on the single-frame paradigm. In this paper, we propose to detect 3D objects by exploiting temporal information in multiple frames, i.e., point cloud videos. We empirically categorize the temporal information into short-term and long-term patterns. To encode the short-term data, we present a Grid Message Passing Network (GMPNet), which considers each grid (i.e., the grouped points) as a node and constructs a k-NN graph with the neighbor grids. To update features for a grid, GMPNet iteratively collects information from its neighbors, thus mining the motion cues in grids from nearby frames. To further aggregate long-term frames, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU), which contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module. STA and TTA enhance the vanilla GRU to focus on small objects and better align moving objects. Our overall framework supports both online and offline video object detection in point clouds. We implement our algorithm based on prevalent anchor-based and anchor-free detectors. Evaluation results on the challenging nuScenes benchmark show superior performance of our method, achieving first place on the leaderboard (at the time of paper submission) without any "bells and whistles." Our source code is available at https://github.com/shenjianbing/GMP3D.
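The grid-as-node construction above can be sketched in a few lines. This is a minimal illustrative simplification, not the paper's actual GMPNet: the k-NN graph, the mean-message aggregation, and the 0.5/0.5 mixing rule are all assumptions standing in for the learned update.

```python
import numpy as np

def knn_graph(centers, k):
    """Indices of each grid node's k nearest neighbors (excluding itself)."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # no self-loops
    return np.argsort(d, axis=1)[:, :k]  # (N, k) neighbor indices

def message_passing(features, neighbors, iters=3):
    """Iteratively mix each node's feature with its neighbors' mean message."""
    h = features.copy()
    for _ in range(iters):
        msg = h[neighbors].mean(axis=1)  # aggregate the k neighbor messages
        h = 0.5 * h + 0.5 * msg          # simple residual-style update
    return h

rng = np.random.default_rng(0)
centers = rng.random((100, 3))   # grid centers in 3D space
feats = rng.random((100, 16))    # per-grid features
out = message_passing(feats, knn_graph(centers, k=8))
```

In the paper the update is learned; here it is a fixed average purely to show the iterative neighbor-collection pattern.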


Subjects
Algorithms, Neural Networks (Computer), Benchmarking, Cues (Psychology), Motion (Physics)
2.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 444-459, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35157576

ABSTRACT

Video anomaly detection (VAD) has been extensively studied for static cameras but is much more challenging in egocentric driving videos where the scenes are extremely dynamic. This paper proposes an unsupervised method for traffic VAD based on future object localization. The idea is to predict future locations of traffic participants over a short horizon, and then monitor the accuracy and consistency of these predictions as evidence of an anomaly. Inconsistent predictions tend to indicate an anomaly has occurred or is about to occur. To evaluate our method, we introduce a new large-scale benchmark dataset called Detection of Traffic Anomaly (DoTA), containing 4,677 videos with temporal, spatial, and categorical annotations. We also propose a new VAD evaluation metric, called spatial-temporal area under curve (STAUC), and show that it captures how well a model detects both temporal and spatial locations of anomalies, unlike existing metrics that focus only on temporal localization. Experimental results show our method outperforms state-of-the-art methods on DoTA in terms of both metrics. We offer rich categorical annotations in DoTA to benchmark video action detection and online action detection methods. The DoTA dataset has been made available at: https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly.
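The prediction-consistency idea can be made concrete with a toy score. This sketch is an assumption, not the paper's actual metric: it simply reads the spread of predicted bounding-box centers as evidence of inconsistency.

```python
import numpy as np

def anomaly_score(pred_boxes):
    """pred_boxes: (H, N, 4) array of H predicted future boxes (x1, y1, x2, y2)
    for each of N traffic participants. The score is the average spread of the
    predicted box centers; inconsistent predictions give a larger spread,
    which we read as evidence of an anomaly."""
    centers = (pred_boxes[..., :2] + pred_boxes[..., 2:]) / 2.0  # (H, N, 2)
    return float(centers.std(axis=0).mean())

# Perfectly consistent predictions score zero; jittered ones score higher.
consistent = np.tile(np.array([0.0, 0.0, 2.0, 2.0]), (5, 3, 1))
jittered = consistent + np.random.default_rng(0).normal(0, 0.5, consistent.shape)
```

A real system would threshold such a score per frame to produce temporal (and, with per-object scores, spatial) anomaly localization.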

3.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7099-7122, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36449595

ABSTRACT

Video segmentation - partitioning video frames into multiple segments or objects - plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep learning based approaches for video segmentation that have delivered compelling performance. In this survey, we comprehensively review two basic lines of research - generic object segmentation (of unknown categories) in videos, and video semantic segmentation - by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out open issues in this field, and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast advancing field: https://github.com/tfzhou/VS-Survey.

4.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 7885-7897, 2022 11.
Article in English | MEDLINE | ID: mdl-34582345

ABSTRACT

In this article, we model a set of pixel-wise object segmentation tasks - automatic video segmentation (AVS), image co-segmentation (ICS) and few-shot semantic segmentation (FSS) - in a unified view of segmenting objects from relational visual data. To this end, we propose an attentive graph neural network (AGNN) that addresses these tasks in a holistic fashion, by formulating them as a process of iterative information fusion over data graphs. It builds a fully-connected graph to efficiently represent visual data as nodes and relations between data instances as edges. The underlying relations are described by a differentiable attention mechanism, which thoroughly examines fine-grained semantic similarities between all the possible location pairs in two data instances. Through parametric message passing, AGNN is able to capture knowledge from the relational visual data, enabling more accurate object discovery and segmentation. Experiments show that AGNN can automatically highlight primary foreground objects from video sequences (i.e., automatic video segmentation), and extract common objects from noisy collections of semantically related images (i.e., image co-segmentation). AGNN can even generalize to segment new categories with little annotated data (i.e., few-shot semantic segmentation). Taken together, our results demonstrate that AGNN provides a powerful tool that is applicable to a wide range of pixel-wise object pattern understanding tasks with relational visual data. Our algorithm implementations have been made publicly available at https://github.com/carrierlxk/AGNN.
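The attention-as-edge-weight formulation can be sketched as follows. This is a hypothetical simplification of AGNN: a fixed dot-product attention stands in for the learned, fine-grained location-pair attention, and the fusion rule is an assumed residual average rather than the paper's parametric message passing.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_message_passing(H, iters=2):
    """H: (N, D) features, one node per frame/image in the data graph.
    Pairwise dot-product attention plays the role of the differentiable
    edge weights; each iteration fuses attention-weighted messages from
    all other nodes into every node."""
    for _ in range(iters):
        A = softmax(H @ H.T / np.sqrt(H.shape[1]))  # (N, N) soft edges
        H = 0.5 * H + 0.5 * (A @ H)                 # residual-style fusion
    return H

nodes = np.random.default_rng(1).normal(size=(6, 32))  # 6 frames, 32-d feats
fused = attentive_message_passing(nodes)
```

Because the graph is fully connected, every node can draw evidence from every other frame or image, which is what lets the same machinery serve AVS, ICS, and FSS.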


Subjects
Algorithms, Neural Networks (Computer)
5.
ACS Nano ; 15(2): 2901-2910, 2021 02 23.
Article in English | MEDLINE | ID: mdl-33559464

ABSTRACT

Counterfeit goods create significant economic losses and product failures in many industries. Here, we report a covert anticounterfeit platform where plasmonic nanoparticles (NPs) create physically unclonable functions (PUFs) with high encoding capacity. By allowing anisotropic Au NPs of different sizes to deposit randomly, a diversity of surfaces can be facilely tagged with NP deposits that serve as PUFs and are analyzed using optical microscopy. High encoding capacity is engineered into the tags by the sizes of the Au NPs, which provide a range of color responses, while their anisotropy provides sensitivity to light polarization. An estimated encoding capacity of 270^n is achieved, which is one of the highest reported to date. Authentication of the tags with deep machine learning allows for high accuracy and rapid matching of a tag to a specific product. Moreover, the tags contain descriptive metadata that is leveraged to match a tag to a specific lot number (i.e., a collection of tags created in the same manner from the same formulation of anisotropic Au NPs). Overall, integration of designer plasmonic NPs with deep machine learning methods can create a rapidly authenticated anticounterfeit platform with high encoding capacity.

6.
J Vis Exp ; (140)2018 10 05.
Article in English | MEDLINE | ID: mdl-30346402

ABSTRACT

Infants and toddlers view the world, at a basic sensory level, in a fundamentally different way from their parents. This is largely due to biological constraints: infants possess different body proportions from their parents, and their ability to control their own head movements is less developed. Such constraints limit the visual input available. This protocol aims to provide guiding principles for researchers using head-mounted cameras to understand the changing visual input experienced by the developing infant. Successful use of this protocol will allow researchers to design and execute studies of the developing child's visual environment set in the home or laboratory. From this method, researchers can compile an aggregate view of all the possible items in a child's field of view. This method does not directly measure exactly what the child is looking at. By combining this approach with machine learning, computer vision algorithms, and hand-coding, researchers can produce a high-density dataset to illustrate the changing visual ecology of the developing infant.


Subjects
Child Development, Video Recording/instrumentation, Video Recording/methods, Vision, Ocular/physiology, Child, Preschool, Female, Hand/physiology, Humans, Infant, Male, Visual Perception/physiology
7.
Proc ACM Int Conf Multimodal Interact ; 2015: 351-354, 2015 Nov.
Article in English | MEDLINE | ID: mdl-28966999

ABSTRACT

Wearable devices are becoming part of everyday life, from first-person cameras (GoPro, Google Glass), to smart watches (Apple Watch), to activity trackers (FitBit). These devices are often equipped with advanced sensors that gather data about the wearer and the environment. These sensors enable new ways of recognizing and analyzing the wearer's everyday personal activities, which could be used for intelligent human-computer interfaces and other applications. We explore one possible application by investigating how egocentric video data collected from head-mounted cameras can be used to recognize social activities between two interacting partners (e.g., playing chess or cards). In particular, we demonstrate that just the positions and poses of hands within the first-person view are highly informative for activity recognition, and present a computer vision approach that detects hands to automatically estimate activities. While hand pose detection is imperfect, we show that combining evidence across first-person views from the two social partners significantly improves activity recognition accuracy. This result highlights how integrating weak but complementary sources of evidence from social partners engaged in the same task can help to recognize the nature of their interaction.
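The cross-view evidence combination can be illustrated with a toy fusion rule. This is an assumed naive-Bayes style sketch, not the paper's actual method: two weak per-activity probability estimates are multiplied and renormalized, so an activity both views weakly favor comes to dominate.

```python
import numpy as np

def fuse_views(p_view1, p_view2):
    """Naive-Bayes style fusion: multiply the two partners' per-activity
    probability estimates and renormalize, amplifying weak agreement."""
    joint = p_view1 * p_view2
    return joint / joint.sum()

p1 = np.array([0.40, 0.35, 0.25])  # ambiguous classifier on partner 1's view
p2 = np.array([0.50, 0.20, 0.30])  # ambiguous classifier on partner 2's view
fused = fuse_views(p1, p2)         # the shared top activity now dominates
```

Neither view alone is confident, but the fused distribution concentrates on the activity both weakly agree on, which mirrors the paper's finding that combining the two egocentric views improves recognition.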

8.
Proc IEEE Int Conf Comput Vis ; 2015: 1949-1957, 2015 Dec.
Article in English | MEDLINE | ID: mdl-29225555

ABSTRACT

Hands appear very often in egocentric video, and their appearance and pose give important cues about what people are doing and what they are paying attention to. But existing work in hand detection has made strong assumptions that work well in only simple scenarios, such as with limited interaction with other people or in lab settings. We develop methods to locate and distinguish between hands in egocentric video using strong appearance models with Convolutional Neural Networks, and introduce a simple candidate region generation approach that outperforms existing techniques at a fraction of the computational cost. We show how these high-quality bounding boxes can be used to create accurate pixelwise hand regions, and as an application, we investigate the extent to which hand segmentation alone can distinguish between different activities. We evaluate these techniques on a new dataset of 48 first-person videos of people interacting in realistic environments, with pixel-level ground truth for over 15,000 hand instances.

9.
IEEE Trans Pattern Anal Mach Intell ; 35(12): 2841-53, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24136425

ABSTRACT

Recent work in structure from motion (SfM) has built 3D models from large collections of images downloaded from the Internet. Many approaches to this problem use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as the image collection grows, and can suffer from drift or local minima. We present an alternative framework for SfM based on finding a coarse initial solution using hybrid discrete-continuous optimization and then improving that solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement. The formulation naturally incorporates various sources of information about both the cameras and points, including noisy geotags and vanishing point (VP) estimates. We test our method on several large-scale photo collections, including one with measured camera positions, and show that it produces models that are similar to or better than those produced by incremental bundle adjustment, but more robustly and in a fraction of the time.

10.
Proc Natl Acad Sci U S A ; 107(52): 22436-41, 2010 Dec 28.
Article in English | MEDLINE | ID: mdl-21148099

ABSTRACT

We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and temporal proximity of the co-occurrences? Such issues arise in data originating in both online and offline domains as well as settings that capture interfaces between online and offline behavior. Here we develop a framework for quantifying the answers to such questions, and we apply this framework to publicly available data from a social media site, finding that even a very small number of co-occurrences can result in a high empirical likelihood of a social tie. We then present probabilistic models showing how such large probabilities can arise from a natural model of proximity and co-occurrence in the presence of social ties. In addition to providing a method for establishing some of the first quantifiable estimates of these measures, our findings have potential privacy implications, particularly for the ways in which social structures can be inferred from public online records that capture individuals' physical locations over time.
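The core empirical quantity described above, the likelihood of a social tie given k spatiotemporal co-occurrences, can be estimated with a simple counting sketch. The data layout here is a hypothetical toy, not the paper's dataset or model.

```python
from collections import defaultdict

def tie_likelihood(cooccurrences, ties):
    """cooccurrences: {(u, v): k} — number of spatiotemporal co-occurrences
    per user pair; ties: set of pairs known to be friends. Returns the
    empirical P(tie | k co-occurrences) for each observed k."""
    total, tied = defaultdict(int), defaultdict(int)
    for pair, k in cooccurrences.items():
        total[k] += 1
        tied[k] += pair in ties
    return {k: tied[k] / total[k] for k in total}

cooc = {("a", "b"): 3, ("a", "c"): 3, ("b", "c"): 1}
friends = {("a", "b")}
probs = tie_likelihood(cooc, friends)
```

The paper's finding is that in real social-media data this curve rises sharply: even a handful of co-occurrences yields a high empirical tie probability.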


Subjects
Communication, Computer Simulation, Social Behavior, Algorithms, Humans, Models, Theoretical, Probability