1.
Article in English | MEDLINE | ID: mdl-38917283

ABSTRACT

Object pose estimation constitutes a critical area within the domain of 3D vision. While contemporary state-of-the-art methods that leverage real-world pose annotations have demonstrated commendable performance, the procurement of such real training data incurs substantial costs. This paper focuses on a specific setting wherein only 3D CAD models are utilized as a priori knowledge, devoid of any background or clutter information. We introduce a novel method, CPPF++, designed for sim-to-real category-level pose estimation. This method builds upon the foundational point-pair voting scheme of CPPF, reformulating it through a probabilistic view. To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty by estimating the probabilistic distribution of each point pair within the canonical space. Furthermore, we augment the contextual information provided by each voting unit through the introduction of N-point tuples. To enhance the robustness and accuracy of the model, we incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a tuple feature ensemble. Alongside these methodological advancements, we introduce a new category-level pose estimation dataset, named DiversePose 300. Empirical evidence demonstrates that our method significantly surpasses previous sim-to-real approaches and achieves comparable or superior performance on novel datasets. Our code is available on https://github.com/qq456cvb/CPPF2.

2.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5430-5448, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38412088

ABSTRACT

Source-free domain adaptation (SFDA) shows the potential to improve the generalizability of deep learning-based face anti-spoofing (FAS) while preserving the privacy and security of sensitive human faces. However, existing SFDA methods are significantly degraded without accessing source data due to the inability to mitigate domain and identity bias in FAS. In this paper, we propose a novel Source-free Domain Adaptation framework for FAS (SDA-FAS) that systematically addresses the challenges of source model pre-training, source knowledge adaptation, and target data exploration under the source-free setting. Specifically, we develop a generalized method for source model pre-training that leverages a causality-inspired PatchMix data augmentation to diminish domain bias and designs the patch-wise contrastive loss to alleviate identity bias. For source knowledge adaptation, we propose a contrastive domain alignment module to align conditional distribution across domains with a theoretical equivalence to adaptation based on source data. Furthermore, target data exploration is achieved via self-supervised learning with patch shuffle augmentation to identify unseen attack types, which is ignored in existing SFDA methods. To our best knowledge, this paper provides the first full-stack privacy-preserving framework to address the generalization problem in FAS. Extensive experiments on nineteen cross-dataset scenarios show our framework considerably outperforms state-of-the-art methods.

3.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 975-993, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37903055

ABSTRACT

3-D point clouds facilitate 3-D visual applications with detailed information about objects and scenes but pose enormous challenges for the design of efficient compression technologies. The irregular signal statistics and high-order geometric structures of 3-D point clouds cannot be fully exploited by existing sparse representation and deep learning based point cloud attribute compression schemes and graph dictionary learning paradigms. In this paper, we propose a novel p-Laplacian embedding graph dictionary learning framework that jointly exploits the varying signal statistics and high-order geometric structures for 3-D point cloud attribute compression. The proposed framework formulates a nonconvex minimization constrained by p-Laplacian embedding regularization to learn a graph dictionary varying smoothly along the high-order geometric structures. An efficient alternating optimization paradigm is developed by harnessing ADMM to solve the nonconvex minimization. To the best of our knowledge, this paper proposes the first graph dictionary learning framework for point cloud compression. Furthermore, we devise an efficient layered compression scheme that integrates the proposed framework to exploit the correlations of 3-D point clouds in a structured fashion. Experimental results demonstrate that the proposed framework is superior to state-of-the-art transform-based methods in M-term approximation and point cloud attribute compression and outperforms recent MPEG G-PCC reference software.

4.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 1031-1048, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37930910

ABSTRACT

By introducing randomness into the environments, domain randomization (DR) imposes diversity on the policy training of deep reinforcement learning, and thus improves its capability of generalization. The randomization of environments, however, introduces another source of variability for the estimate of policy gradients, in addition to the already high variance incurred by trajectory sampling. Therefore, with standard state-dependent baselines, the policy gradient methods may still suffer from high variance, causing low sample efficiency during the training of DR. In this paper, we theoretically derive a bias-free and state/environment-dependent optimal baseline for DR, and analytically show its ability to achieve further variance reduction over the standard constant and state-dependent baselines for DR. Based on our theory, we further propose a variance reduced domain randomization (VRDR) approach for policy gradient methods, to strike a tradeoff between the variance reduction and computational complexity for the practical implementation. By dividing the entire space of environments into some subspaces and then estimating the state/subspace-dependent baseline, VRDR enjoys a theoretical guarantee of variance reduction and faster convergence than the state-dependent baselines. Empirical evaluations on six robot control tasks with randomized dynamics demonstrate that VRDR not only accelerates the convergence of policy training, but can consistently achieve a better eventual policy with improved training stability.

5.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12050-12067, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37339039

ABSTRACT

This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned confidence maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission. The code will be publicly available at https://github.com/jingjin25/LFhybridSR-Fusion.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 9225-9232, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37018583

ABSTRACT

Batch normalization (BN) is a fundamental unit in modern deep neural networks. However, BN and its variants focus on normalization statistics but neglect the recovery step that uses linear transformation to improve the capacity of fitting complex data distributions. In this paper, we demonstrate that the recovery step can be improved by aggregating the neighborhood of each neuron rather than just considering a single neuron. Specifically, we propose a simple yet effective method named batch normalization with enhanced linear transformation (BNET) to embed spatial contextual information and improve representation ability. BNET can be easily implemented using the depth-wise convolution and seamlessly transplanted into existing architectures with BN. To our best knowledge, BNET is the first attempt to enhance the recovery step for BN. Furthermore, BN is interpreted as a special case of BNET from both spatial and spectral views. Experimental results demonstrate that BNET achieves consistent performance gains based on various backbones in a wide range of visual tasks. Moreover, BNET can accelerate the convergence of network training and enhance spatial information by assigning important neurons with large weights accordingly.

7.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3226-3244, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35503824

ABSTRACT

It is promising to solve linear inverse problems by unfolding iterative algorithms (e.g., iterative shrinkage thresholding algorithm (ISTA)) as deep neural networks (DNNs) with learnable parameters. However, existing ISTA-based unfolded algorithms restrict the network architectures for iterative updates with the partial weight coupling structure to guarantee convergence. In this paper, we propose hybrid ISTA to unfold ISTA with both pre-computed and learned parameters by incorporating free-form DNNs (i.e., DNNs with arbitrary feasible and reasonable network architectures), while ensuring theoretical convergence. We first develop HCISTA to improve the efficiency and flexibility of classical ISTA (with pre-computed parameters) without compromising the convergence rate in theory. Furthermore, the DNN-based hybrid algorithm is generalized to popular variants of learned ISTA, dubbed HLISTA, to enable a free architecture of learned parameters with a guarantee of linear convergence. To our best knowledge, this paper is the first to provide a convergence-provable framework that enables free-form DNNs in ISTA-based unfolded algorithms. This framework is general to endow arbitrary DNNs for solving linear inverse problems with convergence guarantees. Extensive experiments demonstrate that hybrid ISTA can reduce the reconstruction error with an improved convergence rate in the tasks of sparse recovery and compressive sensing.
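For orientation, the classical ISTA iteration that such unfolded networks build on can be sketched as follows. This is a minimal illustration of plain ISTA for the lasso problem min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1, not the hybrid HCISTA or HLISTA algorithms described in the abstract; the function names, the fixed step size, and the toy problem are assumptions made for the example.

```python
def soft_threshold(v, tau):
    """Element-wise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    n_cols = len(A[0])
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n_cols)]


def ista(A, b, lam, step, n_iter=200):
    """Plain ISTA: a gradient step on the quadratic term, then soft-thresholding.

    `step` is assumed to be at most 1 / L, where L is the largest
    eigenvalue of A^T A (a hypothetical, hand-picked value here).
    """
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad)], lam * step)
    return x


# Toy problem: with A equal to the identity, the minimizer is soft_threshold(b, lam).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_hat = ista(A, b, lam=1.0, step=1.0, n_iter=50)
```

In an unfolded network, each loop iteration becomes one layer and quantities such as the step size and threshold become learnable parameters; the sketch keeps them fixed.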

8.
IEEE J Biomed Health Inform ; 27(1): 29-40, 2023 01.
Article in English | MEDLINE | ID: mdl-35180095

ABSTRACT

Endobronchial ultrasound (EBUS) elastography videos have shown great potential to supplement intrathoracic lymph node diagnosis. However, it is laborious and subjective for specialists to select the representative frames from the tedious videos and make a diagnosis, and no framework exists for automatic representative frame selection and diagnosis. To this end, we propose a novel deep learning framework that achieves reliable diagnosis by explicitly selecting sparse representative frames and guaranteeing the invariance of diagnostic results to the permutations of video frames. Specifically, we develop a differentiable sparse graph attention mechanism that jointly considers frame-level features and the interactions across frames to select sparse representative frames and exclude disturbed frames. Furthermore, instead of adopting deep learning-based frame-level features, we introduce the normalized color histogram that considers the domain knowledge of EBUS elastography images and achieves superior performance. To the best of our knowledge, the proposed framework is the first to simultaneously achieve automatic representative frame selection and diagnosis with EBUS elastography videos. Experimental results demonstrate that it achieves an average accuracy of 81.29% and area under the receiver operating characteristic curve (AUC) of 0.8749 on the collected dataset of 727 EBUS elastography videos, which is comparable to the performance of the expert-based clinical methods based on manually-selected representative frames.


Subject(s)
Elasticity Imaging Techniques, Humans, Elasticity Imaging Techniques/methods, Thorax, Lymph Nodes/diagnostic imaging, Lymph Nodes/pathology, ROC Curve, Endosonography/methods
9.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3753-3767, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35604978

ABSTRACT

Self-supervised learning based on instance discrimination has shown remarkable progress. In particular, contrastive learning, which regards each image as well as its augmentations as an individual class and tries to distinguish them from all other images, has proven effective for representation learning. However, conventional contrastive learning does not model the relation between semantically similar samples explicitly. In this paper, we propose a general module that considers the semantic similarity among images. This is achieved by expanding the views generated by a single image to Cross-Samples and Multi-Levels, and modeling the invariance to semantically similar images in a hierarchical way. Specifically, the cross-samples are generated by a data mixing operation, which is constrained within samples that are semantically similar, while the multi-level samples are expanded at the intermediate layers of a network. In this way, the contrastive loss is extended to allow for multiple positives per anchor, and explicitly pulls semantically similar images together at different layers of the network. Our method, termed CSML, has the ability to integrate multi-level representations across samples in a robust way. CSML is applicable to current contrastive based methods and consistently improves the performance. Notably, using MoCo v2 as an instantiation, CSML achieves 76.6% top-1 accuracy with linear evaluation using ResNet-50 as backbone, and 66.7% and 75.1% top-1 accuracy with only 1% and 10% labels, respectively. All these numbers set the new state of the art. The code is available at https://github.com/haohang96/CSML.

10.
IEEE Trans Knowl Data Eng ; 34(2): 996-1010, 2022 Feb.
Article in English | MEDLINE | ID: mdl-36158636

ABSTRACT

The Cox proportional hazards model is a popular semi-parametric model for survival analysis. In this paper, we aim at developing a federated algorithm for the Cox proportional hazards model over vertically partitioned data (i.e., data from the same patient are stored at different institutions). We propose a novel algorithm, namely VERTICOX, to obtain the global model parameters in a distributed fashion based on the Alternating Direction Method of Multipliers (ADMM) framework. The proposed model computes intermediary statistics and exchanges them to calculate the global model without collecting individual patient-level data. We demonstrate that our algorithm achieves equivalent accuracy for the estimation of model parameters and statistics to that of its centralized realization. The proposed algorithm converges linearly under the ADMM framework. Its computational complexity and communication costs are polynomially and linearly associated with the number of subjects, respectively. Experimental results show that VERTICOX can achieve accurate model parameter estimation to support federated survival analysis over vertically distributed data by saving bandwidth and avoiding exchange of information about individual patients. The source code for VERTICOX is available at: https://github.com/daiwenrui/VERTICOX.
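For readers unfamiliar with the model, the standard Cox formulation and the way a linear predictor splits across vertically partitioned covariates can be written as below. The notation (covariates x_i, event indicator delta_i, risk set R(t_i), K institutions) is assumed for this sketch and is not taken from the abstract; the partial likelihood is shown for the case without tied event times, and the ADMM updates used by VERTICOX are not reproduced.

```latex
% Cox proportional hazards model for subject i with covariates x_i
h(t \mid x_i) = h_0(t)\,\exp(\beta^\top x_i)

% Partial likelihood (no tied event times); \delta_i = 1 marks an observed event
% and R(t_i) is the set of subjects still at risk at time t_i
L(\beta) = \prod_{i:\,\delta_i = 1}
  \frac{\exp(\beta^\top x_i)}{\sum_{j \in R(t_i)} \exp(\beta^\top x_j)}

% Vertical partitioning across K institutions: institution k holds the
% covariate block x_i^{(k)} and the matching coefficient block \beta_k
\beta^\top x_i = \sum_{k=1}^{K} \beta_k^\top x_i^{(k)}
```

The last identity is what makes it possible to exchange aggregated intermediary statistics rather than patient-level covariates: each institution can compute its own partial inner product locally.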

11.
Article in English | MEDLINE | ID: mdl-35679381

ABSTRACT

Message passing has evolved as an effective tool for designing graph neural networks (GNNs). However, most existing methods for message passing simply sum or average all the neighboring features to update node representations. They are restricted by two problems: 1) lack of interpretability to identify node features significant to the prediction of GNNs and 2) feature overmixing that leads to the oversmoothing issue in capturing long-range dependencies and inability to handle graphs under heterophily or low homophily. In this article, we propose a node-level capsule graph neural network (NCGNN) to address these problems with an improved message passing scheme. Specifically, NCGNN represents nodes as groups of node-level capsules, in which each capsule extracts distinctive features of its corresponding node. For each node-level capsule, a novel dynamic routing procedure is developed to adaptively select appropriate capsules for aggregation from a subgraph identified by the designed graph filter. NCGNN aggregates only the advantageous capsules and restrains irrelevant messages to avoid overmixing features of interacting nodes. Therefore, it can relieve the oversmoothing issue and learn effective node representations over graphs with homophily or heterophily. Furthermore, our proposed message passing scheme is inherently interpretable and exempt from complex post hoc explanations, as the graph filter and the dynamic routing procedure identify a subset of node features that are most significant to the model prediction from the extracted subgraph. Extensive experiments on synthetic as well as real-world graphs demonstrate that NCGNN can effectively address the oversmoothing issue and produce better node representations for semisupervised node classification. It outperforms the state of the art under both homophily and heterophily.

12.
Sci Adv ; 8(24): eabn7630, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35704580

ABSTRACT

Photonic neural networks perform brain-inspired computations using photons instead of electrons to achieve substantially improved computing performance. However, existing architectures can only handle data with regular structures and fail to generalize to graph-structured data beyond Euclidean space. Here, we propose the diffractive graph neural network (DGNN), an all-optical graph representation learning architecture based on the diffractive photonic computing units (DPUs) and on-chip optical devices to address this limitation. Specifically, the graph node attributes are encoded into strip optical waveguides, transformed by DPUs, and aggregated by optical couplers to extract their feature representations. DGNN captures complex dependencies among node neighborhoods during the light-speed optical message passing over graph structures. We demonstrate the applications of DGNN for node and graph-level classification tasks with benchmark databases and achieve superior performance. Our work opens up a new direction for designing application-specific integrated photonic circuits for high-efficiency processing of large-scale graph data structures using deep learning.

13.
IEEE Trans Neural Netw Learn Syst ; 33(9): 5032-5044, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33788695

ABSTRACT

With the advent of data science, the analysis of network or graph data has become a very timely research problem. A variety of recent works have been proposed to generalize neural networks to graphs, either from a spectral graph theory or a spatial perspective. The majority of these works, however, focus on adapting the convolution operator to graph representation. At the same time, the pooling operator also plays an important role in distilling multiscale and hierarchical representations, but it has been mostly overlooked so far. In this article, we propose a parameter-free pooling operator, called iPool, that retains the most informative features in arbitrary graphs. With the argument that informative nodes dominantly characterize graph signals, we propose a criterion to evaluate the amount of information of each node given its neighbors and theoretically demonstrate its relationship to neighborhood conditional entropy. This new criterion determines how nodes are selected and coarsened graphs are constructed in the pooling layer. The resulting hierarchical structure yields an effective isomorphism-invariant representation of networked data on arbitrary topologies. The proposed strategy achieves superior or competitive performance in graph classification on a collection of public graph benchmark data sets and superpixel-induced image graph data sets.

14.
IEEE Trans Neural Netw Learn Syst ; 33(10): 5253-5267, 2022 10.
Article in English | MEDLINE | ID: mdl-33830929

ABSTRACT

Model quantization is essential to deploy deep convolutional neural networks (DCNNs) on resource-constrained devices. In this article, we propose a general bitwidth assignment algorithm based on theoretical analysis for efficient layerwise weight and activation quantization of DCNNs. The proposed algorithm develops a prediction model to explicitly estimate the loss of classification accuracy caused by weight quantization with a geometrical approach. Consequently, dynamic programming is adopted to achieve optimal bitwidth assignment on weights based on the estimated error. Furthermore, we optimize bitwidth assignment for activations by considering the signal-to-quantization-noise ratio (SQNR) between weight and activation quantization. The proposed algorithm is general to reveal the tradeoff between classification accuracy and model size for various network architectures. Extensive experiments demonstrate the efficacy of the proposed bitwidth assignment algorithm and the error rate prediction model. Furthermore, the proposed algorithm is shown to extend well to object detection.
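As a rough illustration of the SQNR quantity mentioned in this abstract, the sketch below uniformly quantizes a list of values at a given bitwidth and reports the signal-to-quantization-noise ratio in decibels. The min-max uniform quantizer, the synthetic "weights", and all function names are assumptions for the example; the paper's own quantizer and its bitwidth assignment procedure are not reproduced here.

```python
import math


def uniform_quantize(values, bits):
    """Quantize to 2**bits uniformly spaced levels spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    steps = 2 ** bits - 1  # number of intervals between adjacent levels
    scale = (hi - lo) / steps
    return [lo + round((v - lo) / scale) * scale for v in values]


def sqnr_db(values, quantized):
    """Signal power over quantization-noise power, in decibels."""
    signal = sum(v * v for v in values)
    noise = sum((v - q) ** 2 for v, q in zip(values, quantized))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


# Synthetic stand-in for a layer's weights (hypothetical data).
weights = [math.sin(0.37 * i) for i in range(1000)]
sqnr_by_bits = {
    bits: sqnr_db(weights, uniform_quantize(weights, bits)) for bits in (2, 4, 8)
}
```

Comparing the entries of `sqnr_by_bits` shows how the ratio grows as the bitwidth increases, which is the kind of tradeoff a bitwidth assignment has to balance against model size.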


Subject(s)
Algorithms, Neural Networks, Computer
15.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8694-8700, 2022 11.
Article in English | MEDLINE | ID: mdl-34018928

ABSTRACT

In this paper, we propose the K-Shot Contrastive Learning (KSCL) of visual features by applying multiple augmentations to investigate the sample variations within individual instances. It aims to combine the advantages of inter-instance discrimination by learning discriminative features to distinguish between different instances, as well as intra-instance variations by matching queries against the variants of augmented samples over instances. Particularly, for each instance, it constructs an instance subspace to model the configuration of how the significant factors of variations in K-shot augmentations can be combined to form the variants of augmentations. Given a query, the most relevant variant of instances is then retrieved by projecting the query onto their subspaces to predict the positive instance class. This generalizes the existing contrastive learning that can be viewed as a special one-shot case. An eigenvalue decomposition is performed to configure instance subspaces, and the embedding network can be trained end-to-end through the differentiable subspace configuration. Experimental results demonstrate that the proposed K-shot contrastive learning achieves superior performance to the state-of-the-art unsupervised methods.


Subject(s)
Algorithms, Learning
16.
Front Oncol ; 11: 673775, 2021.
Article in English | MEDLINE | ID: mdl-34136402

ABSTRACT

BACKGROUND: Endobronchial ultrasound (EBUS) strain elastography can diagnose intrathoracic benign and malignant lymph nodes (LNs) by reflecting the relative stiffness of tissues. Due to strong subjectivity, it is difficult to give full play to the diagnostic efficiency of strain elastography. This study aims to use machine learning to automatically select high-quality and stable representative images from EBUS strain elastography videos. METHODS: LNs with qualified strain elastography videos from June 2019 to November 2019 were enrolled in the training and validation sets randomly at a quantity ratio of 3:1 to train an automatic image selection model using a machine learning algorithm. The strain elastography videos in December 2019 were used as the test set, from which three representative images were selected for each LN by the model. Meanwhile, three experts and three trainees each independently selected one representative image for each LN on the test set. Qualitative grading score and four quantitative methods were used to evaluate the images above to assess the performance of the automatic image selection model. RESULTS: A total of 415 LNs were included in the training and validation sets and 91 LNs in the test set. Results of the qualitative grading score showed that there was no statistical difference between the three images selected by the machine learning model. Coefficient of variation (CV) values of the four quantitative methods in the machine learning group were all lower than the corresponding CV values in the expert and trainee groups, which demonstrated great stability of the machine learning model. Diagnostic performance analysis on the four quantitative methods showed that the diagnostic accuracies ranged from 70.33% to 73.63% in the trainee group, 78.02% to 83.52% in the machine learning group, and 80.22% to 82.42% in the expert group. Moreover, there were no statistical differences in corresponding mean values of the four quantitative methods between the machine learning and expert groups (p > 0.05). CONCLUSION: The automatic image selection model established in this study can help select stable and high-quality representative images from EBUS strain elastography videos, which has great potential in the diagnosis of intrathoracic LNs.

17.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 2953-2970, 2021 09.
Article in English | MEDLINE | ID: mdl-33591909

ABSTRACT

Differentiable architecture search (DARTS) enables effective neural architecture search (NAS) using gradient descent, but suffers from high memory and computational costs. In this paper, we propose a novel approach, namely Partially-Connected DARTS (PC-DARTS), to achieve efficient and stable neural architecture search by reducing the channel and spatial redundancies of the super-network. At the channel level, partial channel connection is presented to randomly sample a small subset of channels for operation selection to accelerate the search process and suppress the over-fitting of the super-network. Side operation is introduced for bypassing (non-sampled) channels to guarantee the performance of searched architectures under extremely low sampling rates. At the spatial level, input features are down-sampled to eliminate spatial redundancy and enhance the efficiency of the mixed computation for operation selection. Furthermore, edge normalization is developed to maintain the consistency of edge selection based on channel sampling with the architectural parameters for edges. Theoretical analysis shows that partial channel connection and parameterized side operation are equivalent to regularizing the super-network on the weights and architectural parameters during bilevel optimization. Experimental results demonstrate that the proposed approach achieves higher search speed and training stability than DARTS. PC-DARTS obtains a top-1 error rate of 2.55 percent on CIFAR-10 with 0.07 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.1 percent on ImageNet (under the mobile setting) within 2.8 GPU-days.

18.
Endosc Ultrasound ; 10(5): 361-371, 2021.
Article in English | MEDLINE | ID: mdl-33565422

ABSTRACT

BACKGROUND AND OBJECTIVES: Along with the rapid improvement of imaging technology, convex probe endobronchial ultrasound (CP-EBUS) sonographic features play an increasingly important role in the diagnosis of intrathoracic lymph nodes (LNs). Conventional qualitative and quantitative methods for EBUS multimodal imaging are time-consuming and rely heavily on the experience of endoscopists. With the development of deep-learning (DL) models, there is great promise in the diagnostic field of medical imaging. MATERIALS AND METHODS: We developed DL models to retrospectively analyze CP-EBUS images of 294 LNs from 267 patients collected between July 2018 and May 2019. The DL models were trained on 245 LNs to differentiate benign and malignant LNs using both unimodal and multimodal CP-EBUS images and independently evaluated on the remaining 49 LNs to validate their diagnostic efficiency. The human comparator group consisting of three experts and three trainees reviewed the same test set as the DL models. RESULTS: The multimodal DL framework achieves an accuracy of 88.57% (95% confidence interval [CI] [86.91%-90.24%]) and area under the curve (AUC) of 0.9547 (95% CI [0.9451-0.9643]) using the three modes of CP-EBUS imaging in comparison to the accuracy of 80.82% (95% CI [77.42%-84.21%]) and AUC of 0.8696 (95% CI [0.8369-0.9023]) by experts. Statistical comparison of their average receiver operating curves shows a statistically significant difference (P < 0.001). Moreover, the multimodal DL framework is more consistent than experts (kappa values 0.7605 vs. 0.5800). CONCLUSIONS: The DL models based on CP-EBUS imaging demonstrated an accurate automated tool for diagnosis of the intrathoracic LNs with higher diagnostic efficiency and consistency compared with experts.

19.
IEEE Trans Cybern ; 51(3): 1478-1492, 2021 Mar.
Article in English | MEDLINE | ID: mdl-31199281

ABSTRACT

The task of reidentifying groups of people under different camera views is an important yet less-studied problem. Group reidentification (Re-ID) is a very challenging task since it is not only adversely affected by common issues in traditional single-object Re-ID problems, such as viewpoint and human pose variations, but also suffers from changes in group layout and group membership. In this paper, we propose a novel concept of group granularity by characterizing a group image by multigrained objects: individual people and subgroups of two and three people within a group. To achieve robust group Re-ID, we first introduce multigrained representations which can be extracted via the development of two separate schemes, that is, one with handcrafted descriptors and another with deep neural networks. The proposed representation seeks to characterize both appearance and spatial relations of multigrained objects, and is further equipped with importance weights which capture variations in intragroup dynamics. Optimal group-wise matching is facilitated by a multiorder matching process which, in turn, dynamically updates the importance weights in iterative fashion. We evaluated three multicamera group datasets containing complex scenarios and large dynamics, with experimental results demonstrating the effectiveness of our approach.

20.
Article in English | MEDLINE | ID: mdl-31670666

ABSTRACT

This paper introduces a new model for Weakly Supervised Object Localization (WSOL) problems where only image-level supervision is provided. The key to solving such problems is to infer the object locations accurately. Previous methods usually model the missing object locations as latent variables, and alternate between updating their estimates and learning a detector accordingly. However, the performance of such alternating optimization is sensitive to the quality of the initial latent variables and the resulting localization model is prone to overfitting to improper localizations. To address these issues, we develop a Prior-induced Multi-view Learning Localization Network (PML-LocNet) which exploits both view diversity and sample diversity to improve object localization. In particular, the view diversity is imposed by a two-phase multi-view learning strategy, with which the complementarity among learned features from different views and the consensus among localized instances from each view are leveraged to benefit localization. The sample diversity is pursued by harnessing coarse-to-fine priors at both image and instance levels. With these priors, more emphasis would go to the reliable samples and the contributions of the unreliable ones would be decreased, such that the intrinsic characteristics of each sample can be exploited to make the model more robust during network learning. PML-LocNet can be easily combined with existing WSOL models to further improve the localization accuracy. Its effectiveness has been demonstrated experimentally. Notably, it achieves 69.3% CorLoc and 50.4% mAP on PASCAL VOC 2007, surpassing the state of the art by a large margin.
