Results 1 - 16 of 16
1.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12550-12561, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37159310

ABSTRACT

Trajectory forecasting for traffic participants (e.g., vehicles) is critical for autonomous platforms to make safe plans. Currently, most trajectory forecasting methods assume that object trajectories have already been extracted and build trajectory predictors directly on ground-truth trajectories. However, this assumption does not hold in practice: trajectories obtained from object detection and tracking are inevitably noisy, which can cause serious forecasting errors in predictors built on ground-truth trajectories. In this paper, we propose to predict trajectories directly from detection results, without relying on explicitly formed trajectories. Unlike traditional methods, which encode an agent's motion cues from its clearly defined trajectory, we extract motion information solely from the affinity cues among detection results, and design an affinity-aware state update mechanism to manage the state information. In addition, since there may be multiple plausible matching candidates, we aggregate their states. These designs take the uncertainty of association into account, which mitigates the undesirable effects of the noisy trajectories obtained from data association and improves the robustness of the predictor. Extensive experiments validate the effectiveness of our method and its ability to generalize to different detectors and forecasting schemes.
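The aggregation over plausible matching candidates described above can be sketched as an affinity-weighted blend of candidate states. The softmax normalisation and plain weighted average below are illustrative assumptions, not the paper's exact update rule:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def aggregate_states(candidate_states, affinities):
    """Blend the states of plausible matching candidates, weighted by
    their softmax-normalised affinity to the tracked agent."""
    w = softmax(affinities)
    dim = len(candidate_states[0])
    return [sum(wi * s[d] for wi, s in zip(w, candidate_states))
            for d in range(dim)]
```

With equal affinities the result is a plain average; as one affinity dominates, the aggregate converges to that candidate's state, which is how association uncertainty softens the update.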

2.
IEEE Trans Pattern Anal Mach Intell ; 44(5): 2742-2759, 2022 05.
Article in English | MEDLINE | ID: mdl-33196437

ABSTRACT

In the task of pedestrian trajectory prediction, social interaction may be one of the most complicated factors, since it is difficult to interpret through simple rules. Recent studies have shown the strong ability of LSTM networks to learn social behaviors from datasets, e.g., by introducing the LSTM hidden states of neighbors at the last time step into the LSTM recursion. However, those methods depend on previous neighboring features, which leads to delayed observation. In this paper, we propose a data-driven states refinement LSTM network (SR-LSTM) that exploits the current intention of neighbors through a message passing framework. Moreover, the model is self-updating, jointly refining the current states of all participants, rather than an input-output mechanism based on feature concatenation. In the states refinement process, a social-aware information selection module, consisting of an element-wise motion gate and pedestrian-wise attention, guides the message passing. Treating the pedestrian walking space as a graph, where each pedestrian is a node and each pedestrian pair is connected by an edge, spatial-edge LSTMs are further exploited to enhance model capacity; the two kinds of LSTMs interact with each other so that their states are interactively refined. Experimental results on four widely used pedestrian trajectory datasets, ETH, UCY, PWPD, and NYGC, demonstrate the effectiveness of the proposed model.


Subject(s)
Pedestrians , Algorithms , Humans , Motion , Neural Networks, Computer , Walking
3.
Sensors (Basel) ; 19(23)2019 Nov 21.
Article in English | MEDLINE | ID: mdl-31766458

ABSTRACT

To analyze traffic anomalies in dashcam videos from the perspective of the ego-vehicle, an agent should spatio-temporally localize the abnormal moments and regions and give a semantic recounting of what happened. Most existing formulations concentrate on the spatio-temporal aspect and mainly approach this goal by training normal-pattern classifiers/regressors/dictionaries on large-scale labeled data. However, anomalies are context-related, and the boundary between abnormal and normal is difficult to delineate clearly. This paper proposes a progressive unsupervised driving anomaly detection and recounting (D&R) framework. The highlights are three-fold: (1) we formulate driving anomaly D&R as a temporal-spatial-semantic (TSS) model, which achieves coarse-to-fine focusing and generates convincing driving anomaly D&R; (2) this work contributes an unsupervised D&R that needs no training data yet remains effective; and (3) we introduce traffic saliency, isolation forests, and the visual-semantic causal relations of the driving scene to construct the TSS model effectively. Extensive experiments on a driving anomaly dataset with 106 video clips (carefully temporal-spatial-semantically labeled by ourselves) demonstrate superior performance over existing techniques.

4.
Article in English | MEDLINE | ID: mdl-31484119

ABSTRACT

Recurrent neural networks (RNNs) are capable of modeling the temporal dependencies of complex sequential data. In general, currently available RNN structures tend to concentrate on controlling the contributions of current and previous information, while the different importance levels of the elements within an input vector are ignored. We propose a simple yet effective Element-wise-Attention Gate (EleAttG), which can easily be added to an RNN block (e.g., all RNN neurons in an RNN layer) to give the RNN neurons attentiveness capability. For an RNN block, an EleAttG adaptively modulates the input by assigning a different level of importance, i.e., attention, to each element/dimension of the input. We refer to an RNN block equipped with an EleAttG as an EleAtt-RNN block. Instead of modulating the input as a whole, the EleAttG modulates the input at fine granularity, i.e., element-wise, and the modulation is content adaptive. The proposed EleAttG, as an additional fundamental unit, is general and can be applied to any RNN structure, e.g., standard RNN, Long Short-Term Memory (LSTM), or Gated Recurrent Unit (GRU). We demonstrate the effectiveness of the proposed EleAtt-RNN by applying it to different tasks, including action recognition from both skeleton-based data and RGB videos, gesture recognition, and sequential MNIST classification. Experiments show that adding attentiveness through EleAttGs to RNN blocks significantly improves the power of RNNs.
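A minimal sketch of an element-wise attention gate in the spirit of EleAttG: an attention vector is computed from the input and multiplies it element-wise before the RNN consumes it. The single-layer sigmoid parameterisation here is an assumption for illustration, not the exact published gate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def eleatt_gate(x, W, b):
    """Element-wise attention: a = sigmoid(W x + b), then the RNN block
    receives the modulated input a * x instead of x."""
    a = [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
         for row, bi in zip(W, b)]
    return [ai * xi for ai, xi in zip(a, x)]
```

Because the gate output is a full vector rather than a scalar, each input dimension is scaled independently, which is the fine-granularity modulation the abstract describes.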

5.
IEEE Trans Pattern Anal Mach Intell ; 41(8): 1963-1978, 2019 08.
Article in English | MEDLINE | ID: mdl-30714909

ABSTRACT

Skeleton-based human action recognition has recently attracted increasing attention thanks to the accessibility and popularity of 3D skeleton data. One of the key challenges in action recognition lies in the large variation of action representations when they are captured from different viewpoints. To alleviate the effects of view variation, this paper introduces a novel view adaptation scheme, which automatically determines the virtual observation viewpoints over the course of an action in a learning-based, data-driven manner. Instead of re-positioning the skeletons using a fixed, human-defined prior criterion, we design two view-adaptive neural networks, i.e., VA-RNN and VA-CNN, built respectively on a recurrent neural network (RNN) with Long Short-Term Memory (LSTM) and on a convolutional neural network (CNN). In each network, a novel view adaptation module learns and determines the most suitable observation viewpoints and transforms the skeletons to those viewpoints for end-to-end recognition with a main classification network. Ablation studies find that the proposed view-adaptive models are capable of transforming skeletons of various views to much more consistent virtual viewpoints; the models thus largely eliminate the influence of viewpoint, enabling the networks to focus on learning action-specific features and resulting in superior performance. In addition, we design a two-stream scheme (referred to as VA-fusion) that fuses the scores of the two networks to provide the final prediction, obtaining enhanced performance. Moreover, random rotation of skeleton sequences is employed to improve the robustness of the view adaptation models and alleviate overfitting during training. Extensive experimental evaluations on five challenging benchmarks demonstrate the effectiveness of the proposed view-adaptive networks and their superior performance over state-of-the-art approaches.
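The core of view adaptation, re-observing a skeleton from a virtual viewpoint, amounts to a rigid transform of the joint coordinates. A minimal sketch of that step follows; it applies only a z-axis rotation with the angle supplied directly, whereas in the networks the viewpoint parameters are learned:

```python
import math

def rotate_skeleton_z(joints, angle):
    """Rotate every (x, y, z) joint about the z-axis by `angle` radians,
    i.e., observe the skeleton from a new virtual viewpoint."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in joints]
```

Random rotations of this kind, applied to whole sequences during training, are also how the robustness augmentation mentioned in the abstract can be realised.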

6.
Article in English | MEDLINE | ID: mdl-30296230

ABSTRACT

Embedding and aggregating a set of local descriptors (e.g., SIFT) into a single vector is a standard way to represent images in image search. Standard aggregation operations include sum and weighted aggregation. While highly efficient, sum aggregation lacks discriminative power. In contrast, weighted aggregation shows promising retrieval performance but suffers from extremely high time cost. In this work, we present a general mixed aggregation method that unifies sum and weighted aggregation. Owing to its general formulation, our method can balance the trade-off between retrieval quality and image representation efficiency. Additionally, to improve query performance, we propose computing multiple weighting coefficients, rather than one, for each vector to be aggregated, by partitioning the vectors into several components at negligible computational cost. Extensive experimental results on standard public image retrieval benchmarks demonstrate that our aggregation method achieves state-of-the-art performance while showing over ten times speedup over baselines.
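One way to see how a mixed scheme can unify sum and weighted aggregation is as a convex interpolation between the two. This sketch is an illustrative assumption, not the paper's exact formulation:

```python
def mixed_aggregate(descriptors, weights, alpha):
    """phi = sum_i (alpha + (1 - alpha) * w_i) * x_i.
    alpha = 1 recovers plain sum aggregation (weights ignored);
    alpha = 0 recovers fully weighted aggregation."""
    dim = len(descriptors[0])
    return [sum((alpha + (1 - alpha) * wi) * x[d]
                for wi, x in zip(weights, descriptors))
            for d in range(dim)]
```

Intermediate values of `alpha` trade the discriminative power of the weighted scheme against the efficiency of the sum, which is the trade-off the abstract refers to.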

7.
Sensors (Basel) ; 17(4)2017 Apr 08.
Article in English | MEDLINE | ID: mdl-28397759

ABSTRACT

Depth information has been used in many fields since the Microsoft Kinect was released, because of its low cost and easy availability. However, the Kinect and Kinect-like RGB-D sensors show limited performance in applications that place high demands on the accuracy and robustness of depth information. In this paper, we propose a depth sensing system that contains a laser projector similar to that used in the Kinect and two infrared cameras located on either side of the laser projector, to obtain higher-spatial-resolution depth information. We apply a block-matching algorithm to estimate the disparity. To improve spatial resolution, we reduce the size of the matching blocks, but smaller matching blocks yield lower matching precision. To address this problem, we combine two matching modes (binocular and monocular) in the disparity estimation process. Experimental results show that our method obtains higher-spatial-resolution depth than the Kinect without loss of range-image quality. Furthermore, our algorithm is implemented on a low-cost hardware platform, and the system supports a resolution of 1280 × 960 at up to 60 frames per second for depth image sequences.
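Block matching with a sum-of-absolute-differences (SAD) cost can be sketched as follows; the block size and search range below are illustrative, and real systems add subpixel refinement and validity checks:

```python
def sad(left, right, y, xl, xr, half):
    """Sum of absolute differences between two blocks on the same row."""
    return sum(abs(left[y + dy][xl + dx] - right[y + dy][xr + dx])
               for dy in range(-half, half + 1)
               for dx in range(-half, half + 1))

def block_match(left, right, y, x, half=1, max_disp=4):
    """Return the disparity d minimising the block SAD along the scanline."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - half < 0:   # block would leave the image
            break
        cost = sad(left, right, y, x, x - d, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Shrinking `half` raises spatial resolution but makes the cost surface noisier, which is exactly the precision loss the combined binocular/monocular scheme is designed to compensate for.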

8.
IEEE Trans Pattern Anal Mach Intell ; 39(10): 2074-2088, 2017 10.
Article in English | MEDLINE | ID: mdl-28113741

ABSTRACT

We present a spatio-temporal energy minimization formulation for simultaneous video object discovery and co-segmentation across multiple videos containing irrelevant frames. Our approach overcomes a limitation that most existing video co-segmentation methods possess, i.e., they perform poorly when dealing with practical videos in which the target objects are not present in many frames. Our formulation incorporates a spatio-temporal auto-context model, which is combined with appearance modeling for superpixel labeling. The superpixel-level labels are propagated to the frame level through a multiple instance boosting algorithm with spatial reasoning, based on which frames containing the target object are identified. Our method only needs to be bootstrapped with the frame-level labels for a few video frames (e.g., usually 1 to 3) to indicate if they contain the target objects or not. Extensive experiments on four datasets validate the efficacy of our proposed method: 1) object segmentation from a single video on the SegTrack dataset, 2) object co-segmentation from multiple videos on a video co-segmentation dataset, and 3) joint object discovery and co-segmentation from multiple videos containing irrelevant frames on the MOViCS dataset and XJTU-Stevens, a new dataset that we introduce in this paper. The proposed method compares favorably with the state-of-the-art in all of these experiments.

9.
Med Phys ; 44(2): 558-569, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27991675

ABSTRACT

PURPOSE: Segmentation of the prostate on MR images has many applications in prostate cancer management. In this work, we propose a supervoxel-based segmentation method for prostate MR images. METHODS: A supervoxel is a set of pixels that have similar intensities, locations, and textures in a 3D image volume. The prostate segmentation problem is cast as assigning a binary label, prostate or background, to each supervoxel. A supervoxel-based energy function with data and smoothness terms is used to model the labeling. The data term estimates the likelihood of a supervoxel belonging to the prostate using a supervoxel-based shape feature, and the geometric relationship between neighboring supervoxels is used to build the smoothness term. A 3D graph cut minimizes the energy function to obtain the supervoxel labels, which yield the prostate segmentation. A 3D active contour model, initialized with the graph cut output, is then used to obtain a smooth surface. The performance of the proposed algorithm was evaluated on 30 in-house MR volumes and the PROMISE12 dataset. RESULTS: The mean Dice similarity coefficients are 87.2 ± 2.3% and 88.2 ± 2.8% for the 30 in-house MR volumes and the PROMISE12 dataset, respectively. The proposed segmentation method yields satisfactory results for prostate MR images. CONCLUSION: The proposed supervoxel-based method can accurately segment prostate MR images and has a variety of applications in prostate cancer diagnosis and therapy.
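The energy the graph cut minimizes has the familiar data-plus-smoothness form. The sketch below evaluates such an energy for a given labeling, using a generic Potts smoothness term as a stand-in for the paper's geometric term:

```python
def segmentation_energy(labels, data_cost, neighbors, lam):
    """E(L) = sum_i D_i(l_i) + lam * sum_(i,j) [l_i != l_j].
    labels[i] is 0 (background) or 1 (prostate); data_cost[i][l] is the
    cost of giving supervoxel i label l; neighbors lists adjacent pairs."""
    e = sum(data_cost[i][l] for i, l in enumerate(labels))
    e += lam * sum(1 for i, j in neighbors if labels[i] != labels[j])
    return e
```

A graph cut finds the binary labeling minimising this energy exactly; the active contour step then only has to smooth the resulting surface.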


Subject(s)
Imaging, Three-Dimensional/methods , Magnetic Resonance Imaging , Prostate/diagnostic imaging , Algorithms , Humans , Male , Prostatic Neoplasms/diagnostic imaging
10.
Sensors (Basel) ; 14(12): 23398-418, 2014 Dec 05.
Article in English | MEDLINE | ID: mdl-25490597

ABSTRACT

Compressive Sensing Imaging (CSI) is a new framework for image acquisition that enables the simultaneous acquisition and compression of a scene. Since the characteristics of Compressive Sensing (CS) acquisition are very different from those of traditional image acquisition, general image compression solutions may not work well. In this paper, we propose an efficient lossy compression solution for CS acquisition of images that accounts for the distinctive features of CSI. First, we design an adaptive compressive sensing acquisition method for images according to the sampling rate, which achieves better CS reconstruction quality for the acquired image. Second, we develop a universal quantization for the CS measurements that requires no a priori information about the captured image. Finally, we apply these two methods in the CSI system for efficient lossy compression of CS acquisition. Simulation results demonstrate that the proposed solution improves rate-distortion performance by 0.4-2 dB compared with the current state of the art, while maintaining low computational complexity.
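For orientation, a plain uniform scalar quantizer for CS measurements looks as follows; the paper's universal quantizer is more elaborate, so this only illustrates the encode/decode round trip that any measurement quantization scheme must perform:

```python
def quantize(measurements, step):
    """Uniform scalar quantisation of CS measurements: map each
    measurement to an integer index (what gets entropy-coded), and
    decode indices back to reconstruction levels."""
    idx = [round(y / step) for y in measurements]
    rec = [i * step for i in idx]
    return idx, rec
```

The step size controls the rate-distortion trade-off: a smaller `step` means more indices to code but a smaller reconstruction error per measurement.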

11.
IEEE Trans Image Process ; 23(9): 4070-4086, 2014 09.
Article in English | MEDLINE | ID: mdl-25051553

ABSTRACT

The segmentation of categorized objects addresses the joint segmentation of a single category of object across a collection of images, where "categorized objects" refers to objects of the same category. Most existing methods assume that all images in the given collection contain the target object; in other words, the collection is noise-free. They may therefore not work well when the collection includes noisy images that do not belong to the category, as in image collections gathered by a text query from modern image search engines. To overcome this limitation, we propose a method for automatic segmentation and recognition of categorized objects from noisy Web image collections. This is achieved by co-training an automatic object segmentation algorithm that operates directly on a collection of images and an object category recognition algorithm that identifies which images contain the target object. The object segmentation algorithm is trained on a subset of images from the collection that are recognized to contain the target object with high confidence, while training of the object category recognition model is guided by the intermediate segmentation results from the object segmentation algorithm. In this way, our co-training algorithm automatically identifies the set of true positives in the noisy Web image collection and simultaneously extracts the target objects from all identified images. Extensive experiments validate the efficacy of our proposed approach on four datasets: 1) the Weizmann horse dataset, 2) the MSRC object category dataset, 3) the iCoseg dataset, and 4) a new 30-category dataset of 15,634 Web images with both hand-annotated category labels and ground-truth segmentation labels. Our method compares favorably with the state of the art and is able to deal with noisy image collections.

12.
Sensors (Basel) ; 13(3): 3409-31, 2013 Mar 12.
Article in English | MEDLINE | ID: mdl-23482090

ABSTRACT

Salient object perception is the process of sensing salient information from spatio-temporal visual scenes; it is a rapid pre-attention mechanism for target localization in a visual smart sensor. In recent decades, many successful models of visual saliency perception have been proposed to simulate this pre-attention behavior. Since most methods need ad hoc parameters or high-cost preprocessing, they have difficulty detecting salient objects rapidly or being implemented with parallel computation in a smart sensor. In this paper, we propose a novel spatio-temporal saliency perception method based on spatio-temporal hypercomplex spectral contrast (HSC). First, the proposed HSC algorithm represents features in the HSV (hue, saturation, and value) color space, together with motion features, as a hypercomplex number. Second, spatio-temporal salient objects are efficiently detected by hypercomplex Fourier spectral contrast in parallel. Finally, our saliency perception model also incorporates non-uniform sampling, a common property of human vision that directs visual attention to the logarithmic center of the image/video in natural scenes. Experimental results on public saliency perception datasets demonstrate the effectiveness of the proposed approach compared with eleven state-of-the-art approaches. In addition, we extend the proposed model to moving object extraction in dynamic scenes, where the proposed algorithm is superior to traditional algorithms.


Subject(s)
Models, Theoretical , Time Perception/physiology , Visual Perception/physiology , Algorithms , Attention/physiology , Color , Humans , Video Recording
13.
Opt Lett ; 37(17): 3609-11, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22940965

ABSTRACT

This Letter proposes a novel saliency detection method based on a biologically plausible hypercomplex Fourier spectrum contrast algorithm. The proposed algorithm considers not only a simulation of simple cortical cells in the human receptive field but also the global texture-color spectrum contrast of an image. First, we utilize log-Gabor filters to mimic simple cortical cells in the human receptive field; two complex-valued texture-color representations are obtained from feature maps in hue, saturation, and value color space via the log-Gabor filters. Second, we build a hypercomplex number from these feature-map representations. Finally, the salient object is detected by spectrum contrast in the hypercomplex Fourier domain. Experimental results show that the proposed algorithm outperforms state-of-the-art methods.


Subject(s)
Fourier Analysis , Image Processing, Computer-Assisted/methods , Algorithms , Spectrum Analysis
14.
IEEE Trans Image Process ; 21(1): 386-92, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21693420

ABSTRACT

This paper proposes an efficient method to estimate the point spread function (PSF) of a blurred image using the spatial correlation of image gradients. A patch-based image degradation model is proposed for estimating the sample covariance matrix of the gradient-domain natural image. Based on the fact that the gradients of clean natural images are approximately uncorrelated with each other, we estimate the autocorrelation function of the PSF from the covariance matrix of the gradient-domain blurred image using the proposed patch-based degradation model. The PSF is then computed with a phase retrieval technique to remove the ambiguity introduced by the absence of phase. Experimental results show that the proposed method significantly reduces the computational burden of PSF estimation compared with existing methods, while producing comparable blur kernels.
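The autocorrelation estimate at the heart of this approach can be sketched in 1-D as follows; the phase retrieval step that recovers the PSF from its autocorrelation is omitted, and the 1-D signal stands in for a row of gradient-domain image data:

```python
def autocorrelation(x, max_lag):
    """Sample autocorrelation r[k] = sum_n x[n] * x[n + k].
    For a blurred gradient signal (white-like clean gradients convolved
    with the PSF), r approximates the PSF autocorrelation up to scale."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]
```

Because the autocorrelation discards phase, two different kernels can share the same r, which is why a phase retrieval step is still needed to pin down the PSF itself.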


Subject(s)
Algorithms , Artifacts , Data Interpretation, Statistical , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Computer Simulation , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity , Statistics as Topic
15.
IEEE Trans Image Process ; 20(4): 1177-84, 2011 Apr.
Article in English | MEDLINE | ID: mdl-20858580

ABSTRACT

The JPEG2000 system provides scalability with respect to quality, resolution, and color component in the transfer of images; however, scalability with respect to semantic content is still lacking. We propose a biologically plausible, salient-region-based bit allocation mechanism within the JPEG2000 codec to add scalability with respect to semantic content. First, an input image is segmented into several salient proto-objects (regions that possibly contain a semantically meaningful physical object) and background regions (regions containing no object of interest) by modeling visual focus of attention on salient proto-objects. Then, a novel rate control scheme distributes a target bit rate to each individual region according to its saliency and constructs quality layers of proto-objects for more precise truncation, comparable to the original quality layers in the standard. Empirical results show that the suggested approach adds content scalability to the JPEG2000 system, as well as the ability to selectively encode, decode, and manipulate each individual proto-object in the image, with only slight modifications to the JPEG2000 standard. Furthermore, the proposed rate control approach reduces computational complexity and memory usage while maintaining image quality comparable to the conventional post-compression rate-distortion (PCRD) optimal truncation algorithm for JPEG2000.
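Distributing a target bit budget across regions in proportion to saliency can be sketched as follows; the floor-and-remainder rule is an illustrative assumption, not the scheme's actual rate control:

```python
def allocate_bits(total_bits, saliencies):
    """Split a bit budget across regions in proportion to saliency,
    flooring each share and giving the leftover bits to the most
    salient region so the budget is met exactly."""
    s = sum(saliencies)
    bits = [int(total_bits * si / s) for si in saliencies]
    top = max(range(len(saliencies)), key=saliencies.__getitem__)
    bits[top] += total_bits - sum(bits)
    return bits
```

Within each region the allocated bits then determine how many coding passes survive truncation, which is what makes the proto-object quality layers possible.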


Subject(s)
Algorithms , Computer Communication Networks , Computer Graphics , Data Compression/methods , Multimedia , Signal Processing, Computer-Assisted , Video Recording/methods , Image Enhancement/methods , Reproducibility of Results , Sensitivity and Specificity
16.
IEEE Trans Syst Man Cybern B Cybern ; 38(1): 196-209, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18270091

ABSTRACT

Multiple-target tracking in video (MTTV) presents a technical challenge in video surveillance applications. In this paper, we formulate the MTTV problem using dynamic Markov network (DMN) techniques. Our model consists of three coupled Markov random fields: 1) a field for the joint state of the multiple targets; 2) a binary random process for the existence of each individual target; and 3) a binary random process for the occlusion of each pair of adjacent targets. To make inference tractable, we introduce two robust functions that eliminate the two binary processes. We then propose a novel belief propagation (BP) algorithm, called particle-based BP, and embed it into a Markov chain Monte Carlo approach to obtain the maximum a posteriori estimate in the DMN. With a stratified sampler, we incorporate information from a learned bottom-up detector (e.g., a support-vector-machine-based classifier) and the motion model of the target into the message propagation. Other low-level visual cues, such as motion and shape, can easily be incorporated into our framework to obtain better tracking results. We have performed extensive experimental verification, and the results suggest that our method is comparable to state-of-the-art multitarget tracking methods in all the cases we tested.
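The trick of eliminating a binary process with a robust function can be seen in miniature: marginalising the binary variable out of a coupled potential leaves a truncated (robust) potential, so the binary field never needs explicit inference. A minimal sketch, with a generic linear coupling as an assumption about the potential's form:

```python
def coupled_potential(d, b, tau):
    """Potential with an explicit binary outlier/occlusion variable b:
    pay the data cost d when b = 0, or a fixed penalty tau when b = 1."""
    return (1 - b) * d + b * tau

def robust_potential(d, tau):
    """The binary process minimised out: min_b coupled = min(d, tau),
    a truncated potential that caps the influence of large residuals."""
    return min(d, tau)
```

Replacing each coupled term with its robust counterpart halves the number of random fields while keeping the same minimum-energy configurations over the remaining variables.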


Subject(s)
Algorithms , Artificial Intelligence , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Pattern Recognition, Automated/methods , Image Enhancement/methods , Monte Carlo Method , Motion , Reproducibility of Results , Sensitivity and Specificity