Results 1 - 20 of 24
1.
Article in English | MEDLINE | ID: mdl-38578851

ABSTRACT

A recent trend in Non-Rigid Structure-from-Motion (NRSfM) is to express local, differential constraints between pairs of images, from which the surface normal at any point can be obtained by solving a system of polynomial equations. While this approach is more successful than its counterparts relying on global constraints, the resulting methods face two main problems: First, most of the equation systems they formulate are of high degree and must be solved using computationally expensive polynomial solvers. Some methods use polynomial reduction strategies to simplify the system, but this introduces phantom solutions. In any event, an additional mechanism is needed to pick the best solution, which adds to the computation without any guarantee on the reliability of the result. Second, these methods formulate constraints between a pair of images. Even if there is enough motion between them, they may suffer from local degeneracies that make the resulting estimates unreliable, without any warning mechanism. In this paper, we solve these problems for isometric/conformal NRSfM. We show that, under widely applicable assumptions, we can derive a new system of equations in terms of the surface normals, whose two solutions can be obtained in closed form and easily disambiguated locally. Our formalism also allows us to assess how reliable the estimated local normals are and to discard them if they are not. Our experiments show that our reconstructions, obtained from two or more views, are significantly more accurate than those of state-of-the-art methods, while also being faster.

2.
IEEE Trans Pattern Anal Mach Intell ; 46(6): 4489-4503, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38231797

ABSTRACT

In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. While recent learning-based approaches have shown remarkable success on synthetic datasets, we have observed them to fail in the presence of real-world data. We investigate the root causes of these failures and identify two main challenges: the sensitivity of the widely-used SVD-based loss function to the range of rotation between the two point clouds, and the difference in feature distributions between the source and target point clouds. We address the first challenge by introducing a directly supervised loss function that does not utilize the SVD operation. To tackle the second, we introduce a new normalization strategy, Match Normalization. Our two contributions are general and can be applied to many existing learning-based 3D object registration frameworks, which we illustrate by implementing them in two of them, DCP and IDAM. Our experiments on the real-scene TUD-L (Hodan et al. 2018), LINEMOD (Hinterstoisser et al. 2012) and Occluded-LINEMOD (Brachmann et al. 2014) datasets evidence the benefits of our strategies. They allow, for the first time, learning-based 3D object registration methods to achieve meaningful results on real-world data. We therefore expect them to be key to future developments of point cloud registration methods.
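To make the normalization idea concrete, the sketch below scales each feature channel of one point cloud instance by that instance's own statistics, so source and target features end up with comparable magnitudes. The function name, the choice of statistics, and the placement are illustrative assumptions, not the paper's specification of Match Normalization.

```python
import numpy as np

def match_normalize(feats, eps=1e-6):
    # feats: (N, C) per-point features of ONE point cloud instance.
    # Scale each channel by its per-instance standard deviation so that
    # source and target clouds share comparable feature statistics.
    # Hypothetical sketch; the paper's exact operation may differ.
    scale = feats.std(axis=0, keepdims=True) + eps
    return feats / scale

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(128, 32))  # synthetic "source" features
tgt = rng.normal(0.0, 5.0, size=(128, 32))  # "target" features at 5x the scale

src_n = match_normalize(src)
tgt_n = match_normalize(tgt)
```

After this per-instance scaling, both clouds have unit per-channel spread, which is the kind of distribution alignment the abstract argues real-world data requires.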

3.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 2450-2460, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38019625

ABSTRACT

Vehicles can encounter a myriad of obstacles on the road, and it is impossible to record them all beforehand to train a detector. Instead, we select image patches and inpaint them with the surrounding road texture, which tends to remove obstacles from those patches. We then use a network trained to recognize discrepancies between the original patch and the inpainted one; such a discrepancy signals an erased obstacle.

4.
IEEE Trans Image Process ; 32: 6102-6114, 2023.
Article in English | MEDLINE | ID: mdl-37883291

ABSTRACT

While adversarial training and its variants have been shown to be the most effective algorithms to defend against adversarial attacks, their extremely slow training process makes it hard to scale to large datasets like ImageNet. The key idea of recent works to accelerate adversarial training is to substitute multi-step attacks (e.g., PGD) with single-step attacks (e.g., FGSM). However, these single-step methods suffer from catastrophic overfitting, where the accuracy against PGD attacks suddenly drops to nearly 0% during training, and the network totally loses its robustness. In this work, we study the phenomenon from the perspective of training instances. We show that catastrophic overfitting is instance-dependent, and that fitting instances with a larger input gradient norm is more likely to cause it. Based on our findings, we propose a simple but effective method, Adversarial Training with Adaptive Step size (ATAS). ATAS learns an instance-wise adaptive step size that is inversely proportional to the instance's gradient norm. Our theoretical analysis shows that ATAS converges faster than the commonly adopted non-adaptive counterparts. Empirically, ATAS consistently mitigates catastrophic overfitting and achieves higher robust accuracy on CIFAR10, CIFAR100, and ImageNet when evaluated on various adversarial budgets. Our code is released at https://github.com/HuangZhiChao95/ATAS.
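The core step-size rule is simple enough to sketch numerically: each instance gets a step inversely proportional to its input-gradient norm, so the instances most prone to catastrophic overfitting take the smallest steps. The function name and the constant are illustrative; the actual method also maintains a running estimate of the norm during training (see the released code for the real implementation).

```python
import numpy as np

def adaptive_step(input_grads, c=0.1, eps=1e-8):
    # input_grads: (N, D) per-instance input gradients.
    # Returns (N,) per-instance step sizes proportional to 1 / ||g_i||.
    norms = np.linalg.norm(input_grads, axis=1)
    return c / (norms + eps)

rng = np.random.default_rng(1)
g = rng.normal(size=(8, 16))   # toy per-instance gradients
alpha = adaptive_step(g)       # larger gradient norm -> smaller step
```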

5.
IEEE Trans Neural Netw Learn Syst ; 34(6): 3146-3160, 2023 Jun.
Article in English | MEDLINE | ID: mdl-34699369

ABSTRACT

Training certifiable neural networks enables us to obtain models with robustness guarantees against adversarial attacks. In this work, we introduce a framework to obtain a provable adversarial-free region in the neighborhood of the input data by a polyhedral envelope, which yields more fine-grained certified robustness than existing methods. We further introduce polyhedral envelope regularization (PER) to encourage larger adversarial-free regions and thus improve the provable robustness of the models. We demonstrate the flexibility and effectiveness of our framework on standard benchmarks; it applies to networks of different architectures and with general activation functions. Compared with the state of the art, PER has negligible computational overhead; it achieves better robustness guarantees and accuracy on clean data in various settings.

6.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6415-6427, 2023 05.
Article in English | MEDLINE | ID: mdl-36251908

ABSTRACT

In this article, we propose an unsupervised feature extraction method to capture temporal information in monocular videos, where we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs, as other CSS approaches do, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying a contrastive loss only to the time-variant features, encouraging a gradual transition on them between nearby and distant frames, and also reconstructing the input, extracts rich temporal features well-suited for human pose estimation. Our approach reduces error by about 50% compared to standard CSS strategies, outperforms other unsupervised single-view methods, and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve 3D pose estimation accuracy, outperforming other state-of-the-art weakly supervised methods.


Subject(s)
Algorithms , Learning , Humans , Videotape Recording
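The two key ideas above, splitting the latent vector and replacing the hard positive/negative split with a gradual temporal weighting, can be sketched as follows. The dimension split, the exponential weighting, and all names are assumptions for illustration; the paper's actual loss differs in its details.

```python
import numpy as np

def split_latent(z, k):
    # Disentangle a latent vector into a time-variant part (first k dims)
    # and a time-invariant part (the rest). Hypothetical layout.
    return z[:k], z[k:]

def time_variant_loss(z_a, z_b, dt, k=8, tau=5.0):
    # Pull together only the time-variant parts of two frames, with a
    # weight that decays smoothly with temporal distance dt: a gradual
    # transition rather than a hard positive/negative assignment.
    va, _ = split_latent(z_a, k)
    vb, _ = split_latent(z_b, k)
    return float(np.exp(-dt / tau) * np.sum((va - vb) ** 2))

rng = np.random.default_rng(2)
z1, z2 = rng.normal(size=32), rng.normal(size=32)
```

Nearby frames (small dt) thus contribute a stronger pull than distant ones, instead of flipping abruptly from positive to negative.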
7.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 181-195, 2022 01.
Article in English | MEDLINE | ID: mdl-32750825

ABSTRACT

In this paper, we tackle the problem of static 3D cloth draping on virtual human bodies. We introduce a two-stream deep network model that produces a visually plausible draping of a template cloth on virtual 3D bodies by extracting features from both the body and garment shapes. Our network learns to mimic a physics-based simulation (PBS) method while requiring two orders of magnitude less computation time. To train the network, we introduce loss terms inspired by PBS to produce plausible results and make the model collision-aware. To increase the details of the draped garment, we introduce two loss functions that penalize the difference between the curvature of the predicted cloth and that of the PBS output. In particular, we study the impact of the mean curvature normal and of a novel detail-preserving loss, both qualitatively and quantitatively. Our new curvature loss computes the local covariance matrices of the 3D points and compares the Rayleigh quotients of the prediction and the PBS output. This leads to more details while performing favorably or comparably against the loss that considers mean curvature normal vectors in the 3D triangulated meshes. We validate our framework on four garment types for various body shapes and poses. Finally, we show that our method outperforms a recently proposed data-driven method.


Subject(s)
Algorithms , Computer Simulation , Humans
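The curvature ingredient mentioned above, a Rayleigh quotient of a local covariance matrix, can be illustrated on toy point sets. This sketch only shows the quotient itself as a bending proxy; how the paper selects neighbourhoods and compares prediction against PBS is not reproduced here.

```python
import numpy as np

def local_rayleigh(points, direction):
    # Covariance matrix C of a local 3D neighbourhood, then the Rayleigh
    # quotient d^T C d / (d^T d): how much the patch spreads along d.
    centered = points - points.mean(axis=0)
    C = centered.T @ centered / len(points)
    d = np.asarray(direction, dtype=float)
    return float(d @ C @ d / (d @ d))

# A flat patch has (near-)zero spread along its normal...
flat = np.array([[x, y, 0.0] for x in range(4) for y in range(4)])
# ...while a wrinkled patch (z = sin(x)) does not.
wrinkled = flat + np.array([0.0, 0.0, 1.0]) * np.sin(flat[:, :1])

r_flat = local_rayleigh(flat, [0.0, 0.0, 1.0])
r_wrinkled = local_rayleigh(wrinkled, [0.0, 0.0, 1.0])
```

Matching such quotients between a prediction and a simulation therefore penalizes over-smoothed cloth, which is the intuition behind the detail-preserving loss.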
8.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5472-5487, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33844626

ABSTRACT

Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. This makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this by using the SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI, without relying on an iterative process in practice, and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.
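The instability described above comes from the 1/(lambda_i - lambda_j) factors in the analytic eigenvector derivative: the smallest eigenvalue gap governs how large the gradient can get. The toy function below (an illustration, not the paper's method) just exposes that worst-case amplification factor.

```python
import numpy as np

def eigvec_grad_scale(A):
    # The analytic derivative of an eigenvector v_i involves factors
    # 1 / (lambda_i - lambda_j); the smallest gap between distinct
    # eigenvalues therefore bounds how badly the gradient can blow up.
    w = np.linalg.eigvalsh(A)
    gaps = np.abs(w[:, None] - w[None, :])
    np.fill_diagonal(gaps, np.inf)   # ignore i == j
    return 1.0 / gaps.min()

well_separated = np.diag([1.0, 2.0, 3.0])          # gap 1 -> stable
near_degenerate = np.diag([1.0, 1.0 + 1e-8, 3.0])  # gap 1e-8 -> unstable
```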

9.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8110-8124, 2022 11.
Article in English | MEDLINE | ID: mdl-34460367

ABSTRACT

Weight sharing promises to make neural architecture search (NAS) tractable even on commodity hardware. Existing methods in this space rely on a diverse set of heuristics to design and train the shared-weight backbone network, a.k.a. the super-net. Since these heuristics vary substantially across methods and have not been carefully studied, it is unclear to what extent they impact super-net training and hence the weight-sharing NAS algorithms. In this paper, we disentangle super-net training from the search algorithm, isolate 14 frequently-used training heuristics, and evaluate them over three benchmark search spaces. Our analysis uncovers that several commonly-used heuristics negatively impact the correlation between super-net and stand-alone performance, whereas simple but often overlooked factors, such as proper hyper-parameter settings, are key to achieving strong performance. Equipped with this knowledge, we show that simple random search achieves performance competitive with complex state-of-the-art NAS algorithms when the super-net is properly trained.


Subject(s)
Algorithms , Heuristics , Benchmarking , Plant Extracts
10.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8151-8166, 2022 11.
Article in English | MEDLINE | ID: mdl-34351854

ABSTRACT

Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, only very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with much fewer annotations. This significantly reduces the annotation cost while still leading to similar performance to the full supervision case.


Subject(s)
Algorithms , Crowding , Humans
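The conservation constraint the abstract leans on can be stated on a toy grid: if flow[i, j] counts the people moving from cell i to cell j between consecutive frames, then every person leaves from some cell and arrives in some cell, so summing flows along either axis recovers the densities at the two time steps. This is a sketch of the principle only; the paper applies it on image grids with neighbourhood-restricted flows.

```python
import numpy as np

# flow[i, j]: people moving from cell i to cell j in one step on a toy
# 3-cell scene; the diagonal holds people who stay where they are.
flow = np.array([[2.0, 1.0, 0.0],
                 [0.0, 3.0, 1.0],
                 [0.0, 0.0, 2.0]])

density_prev = flow.sum(axis=1)  # density at t-1: people leaving each cell
density_next = flow.sum(axis=0)  # density at t:   people entering each cell
```

Regressing flows and deriving densities from them enforces that people counts are conserved across frames, a much stronger constraint than mere smoothness.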
11.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9574-9588, 2022 12.
Article in English | MEDLINE | ID: mdl-34714741

ABSTRACT

While supervised object detection and segmentation methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this when annotating data is prohibitively expensive, we introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. At the heart of our approach lies the observation that object segmentation and background reconstruction are linked tasks, and that, for structured scenes, background regions can be re-synthesized from their surroundings, whereas regions depicting the moving object cannot. We encode this intuition into a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of the proposals, we develop a Monte Carlo-based training strategy that allows the algorithm to explore the large space of object proposals. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.


Subject(s)
Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Image Processing, Computer-Assisted/methods , Algorithms
12.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 3167-3182, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32149625

ABSTRACT

Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be tackled by solving a linear least-square problem, which can be done by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. While theoretically doable, this introduces numerical instability in the optimization process in practice. In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate that our approach is much more robust than explicit differentiation of the eigendecomposition using two general tasks, outlier rejection and denoising, with several practical examples including wide-baseline stereo, the perspective-n-point problem, and ellipse fitting. Empirically, our method has better convergence properties and yields state-of-the-art results.
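The eigendecomposition-free idea can be sketched directly: instead of differentiating through an eigendecomposition, penalize ||A x||^2, which vanishes exactly when the desired solution x is the zero-eigenvalue eigenvector of A^T A. The regularizer below, which discourages the trivial solution A = 0, is a hypothetical stand-in; the paper's exact formulation differs.

```python
import numpy as np

def eig_free_loss(A, x_gt, alpha=1.0):
    # data term: zero iff x_gt spans the null space of A.
    data_term = float(x_gt @ (A.T @ A) @ x_gt)
    # hypothetical regulariser against the degenerate solution A = 0.
    reg_term = alpha * float(np.exp(-np.trace(A.T @ A)))
    return data_term + reg_term

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])            # null space spanned by (1, -1)
good = np.array([1.0, -1.0]) / np.sqrt(2)
bad = np.array([1.0, 1.0]) / np.sqrt(2)
```

Because the loss is a plain quadratic form in A, its gradients stay well-behaved even when eigenvalues of A^T A are nearly equal.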

13.
Med Image Anal ; 60: 101590, 2020 02.
Article in English | MEDLINE | ID: mdl-31841949

ABSTRACT

The difficulty of obtaining annotations to build training databases still slows down the adoption of recent deep learning approaches for biomedical image analysis. In this paper, we show that we can train a Deep Net to perform 3D volumetric delineation given only 2D annotations in Maximum Intensity Projections (MIP) of the training volumes. This significantly reduces the annotation time: We conducted a user study that suggests that annotating 2D projections is on average twice as fast as annotating the original 3D volumes. Our technical contribution is a loss function that evaluates a 3D prediction against annotations of 2D projections. It is inspired by space carving, a classical approach to reconstructing complex 3D shapes from arbitrarily-positioned cameras. It can be used to train any deep network with volumetric output, without the need to change the network's architecture. Substituting the loss is all it takes to enable 2D annotations in an existing training setup. In extensive experiments on 3D light microscopy images of neurons and retinal blood vessels, and on Magnetic Resonance Angiography (MRA) brain scans, we show that, when trained on projection annotations, deep delineation networks perform as well as when they are trained using costlier 3D annotations.


Subject(s)
Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Magnetic Resonance Angiography , Neural Networks, Computer , Brain/blood supply , Brain/diagnostic imaging , Datasets as Topic , Deep Learning , Humans , Retinal Vessels/diagnostic imaging
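The loss described above can be sketched in a few lines: project the predicted 3D volume with a max along the viewing axis (a Maximum Intensity Projection) and compare the result to the 2D annotation, so gradients flow back only through the voxels that realise each max, in the spirit of space carving. This is an illustration of the principle; the paper's loss treats the projections more carefully.

```python
import numpy as np

def mip_loss(pred_volume, annot_2d, axis=0):
    # Max-project the 3D prediction, then compare to the 2D annotation.
    projection = pred_volume.max(axis=axis)
    return float(((projection - annot_2d) ** 2).mean())

rng = np.random.default_rng(3)
volume = rng.random(size=(5, 8, 8))   # toy 3D prediction
perfect_annot = volume.max(axis=0)    # a 2D annotation that matches it exactly
```

Because only the projection enters the loss, any network with volumetric output can be trained this way without architectural changes, which is the point made in the abstract.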
14.
IEEE Trans Med Imaging ; 39(4): 1256-1267, 2020 04.
Article in English | MEDLINE | ID: mdl-31603817

ABSTRACT

We present an Unsupervised Domain Adaptation strategy to compensate for domain shifts on Electron Microscopy volumes. Our method aggregates visual correspondences (motifs that are visually similar across different acquisitions) to infer changes on the parameters of pretrained models, and enable them to operate on new data. In particular, we examine the annotations of an existing acquisition to determine pivot locations that characterize the reference segmentation, and use a patch matching algorithm to find their candidate visual correspondences in a new volume. We aggregate all the candidate correspondences by a voting scheme and we use them to construct a consensus heatmap: a map of how frequently locations on the new volume are matched to relevant locations from the original acquisition. This information allows us to perform model adaptations in two different ways: either by a) optimizing model parameters under a Multiple Instance Learning formulation, so that predictions between reference locations and their sets of correspondences agree, or by b) using high-scoring regions of the heatmap as soft labels to be incorporated in other domain adaptation pipelines, including deep learning ones. We show that these unsupervised techniques allow us to obtain high-quality segmentations on unannotated volumes, qualitatively consistent with results obtained under full supervision, for both mitochondria and synapses, with no need for new annotation effort.


Subject(s)
Image Processing, Computer-Assisted/methods , Microscopy, Electron/methods , Unsupervised Machine Learning , Algorithms , Animals , Brain/cytology , Brain/diagnostic imaging , Brain/ultrastructure , Mice , Mitochondria/ultrastructure
15.
IEEE Trans Pattern Anal Mach Intell ; 41(4): 886-900, 2019 Apr.
Article in English | MEDLINE | ID: mdl-29993772

ABSTRACT

Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable Xi is represented by ℓ nodes (where ℓ is the number of labels) arranged in a column. However, this method in general requires 2ℓ² edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirements. In this paper, we introduce a variant of the max-flow algorithm that requires much less storage. Consequently, our algorithm makes it possible to optimally solve multi-label submodular problems involving large numbers of variables and labels on a standard computer.

16.
IEEE Trans Pattern Anal Mach Intell ; 41(4): 801-814, 2019 04.
Article in English | MEDLINE | ID: mdl-29994060

ABSTRACT

The performance of a classifier trained on data coming from a specific domain typically degrades when applied to a related but different one. While annotating many samples from the new domain would address this issue, it is often too expensive or impractical. Domain Adaptation has therefore emerged as a solution to this problem; it leverages annotated data from a source domain, in which it is abundant, to train a classifier to operate in a target domain, where annotations are either sparse or lacking altogether. In this context, the recent trend consists of learning deep architectures whose weights are shared for both domains, which essentially amounts to learning domain-invariant features. Here, we show that it is more effective to explicitly model the shift from one domain to the other. To this end, we introduce a two-stream architecture, where one stream operates in the source domain and the other in the target domain. In contrast to other approaches, the weights in corresponding layers are related but not shared. We demonstrate that this both yields higher accuracy than state-of-the-art methods on several object recognition and detection tasks and consistently outperforms networks with shared weights in both supervised and unsupervised settings.
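One simple way to keep two streams "related but not shared" is to penalize the distance between corresponding weight tensors, as sketched below. This is an assumption for illustration: the paper also considers richer ways of linking the streams, such as a learned transformation between the weights.

```python
import numpy as np

def weight_link_penalty(w_source, w_target, lam=0.1):
    # lam * ||W_s - W_t||^2 keeps corresponding layers close without
    # forcing them to be identical, unlike hard weight sharing.
    return lam * float(np.sum((w_source - w_target) ** 2))

rng = np.random.default_rng(4)
w_s = rng.normal(size=(4, 4))                 # source-stream layer weights
w_t = w_s + 0.01 * rng.normal(size=(4, 4))    # slightly adapted target copy
```

Setting lam very large recovers (approximately) shared weights, while lam = 0 decouples the streams entirely, so this single knob spans the spectrum the abstract contrasts.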

17.
IEEE Trans Pattern Anal Mach Intell ; 40(1): 48-62, 2018 01.
Article in English | MEDLINE | ID: mdl-28103548

ABSTRACT

Representing images and videos with Symmetric Positive Definite (SPD) matrices, and considering the Riemannian geometry of the resulting space, has been shown to yield high discriminative power in many visual recognition tasks. Unfortunately, computation on the Riemannian manifold of SPD matrices (especially of high-dimensional ones) comes at a high cost that limits the applicability of existing techniques. In this paper, we introduce algorithms able to handle high-dimensional SPD matrices by constructing a lower-dimensional SPD manifold. To this end, we propose to model the mapping from the high-dimensional SPD manifold to the low-dimensional one with an orthonormal projection. This lets us formulate dimensionality reduction as the problem of finding a projection that yields a low-dimensional manifold either with maximum discriminative power in the supervised scenario, or with maximum variance of the data in the unsupervised one. We show that learning can be expressed as an optimization problem on a Grassmann manifold and discuss fast solutions for special cases. Our evaluation on several classification tasks evidences that our approach leads to a significant accuracy gain over state-of-the-art methods.

18.
IEEE Trans Pattern Anal Mach Intell ; 40(6): 1382-1396, 2018 06.
Article in English | MEDLINE | ID: mdl-28613162

ABSTRACT

Pixel-level annotations are expensive and time consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have proposed to fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, was alleviated by making use of objectness priors to generate foreground/background masks. Unfortunately these priors either require pixel-level annotations/bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract accurate masks from networks pre-trained for the task of object recognition, thus forgoing external objectness modules. We first show how foreground/background masks can be obtained from the activations of higher-level convolutional layers of a network. We then show how to obtain multi-class masks by the fusion of foreground/background ones with information extracted from a weakly-supervised localization network. Our experiments evidence that exploiting these masks in conjunction with a weakly-supervised training loss yields state-of-the-art tag-based weakly-supervised semantic segmentation results.

19.
IEEE Trans Pattern Anal Mach Intell ; 37(12): 2464-77, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26539851

ABSTRACT

In this paper, we develop an approach to exploiting kernel methods with manifold-valued data. In many computer vision problems, the data can be naturally represented as points on a Riemannian manifold. Due to the non-Euclidean geometry of Riemannian manifolds, usual Euclidean computer vision and machine learning algorithms yield inferior results on such data. In this paper, we define Gaussian radial basis function (RBF)-based positive definite kernels on manifolds that permit us to embed a given manifold with a corresponding metric in a high dimensional reproducing kernel Hilbert space. These kernels make it possible to utilize algorithms developed for linear spaces on nonlinear manifold-valued data. Since the Gaussian RBF defined with any given metric is not always positive definite, we present a unified framework for analyzing the positive definiteness of the Gaussian RBF on a generic metric space. We then use the proposed framework to identify positive definite kernels on two specific manifolds commonly encountered in computer vision: the Riemannian manifold of symmetric positive definite matrices and the Grassmann manifold, i.e., the Riemannian manifold of linear subspaces of a Euclidean space. We show that many popular algorithms designed for Euclidean spaces, such as support vector machines, discriminant analysis and principal component analysis can be generalized to Riemannian manifolds with the help of such positive definite Gaussian kernels.
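As a concrete instance of such a kernel, the sketch below evaluates a Gaussian RBF between SPD matrices under the log-Euclidean distance, one of the metrics for which this kind of framework establishes positive definiteness. The sigma value and matrix-log construction are illustrative choices.

```python
import numpy as np

def spd_log(X):
    # Matrix logarithm of an SPD matrix via its eigendecomposition:
    # X = V diag(w) V^T  ->  log(X) = V diag(log w) V^T.
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def gauss_kernel_spd(X, Y, sigma=1.0):
    # Gaussian RBF with the log-Euclidean distance between SPD matrices.
    d2 = float(np.sum((spd_log(X) - spd_log(Y)) ** 2))
    return float(np.exp(-d2 / (2.0 * sigma ** 2)))

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 3.0]])
```

With a positive definite kernel like this one, kernel machines such as SVMs can be run on manifold-valued data exactly as they are on Euclidean vectors.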

20.
IEEE Trans Pattern Anal Mach Intell ; 37(4): 760-73, 2015 Apr.
Article in English | MEDLINE | ID: mdl-26353292

ABSTRACT

This paper tackles the problem of reconstructing the shape of a smooth mirror surface from a single image. In particular, we consider the case where the camera is observing the reflection of a static reference target in the unknown mirror. We first study the reconstruction problem given dense correspondences between 3D points on the reference target and image locations. In such conditions, our differential geometry analysis provides a theoretical proof that the shape of the mirror surface can be recovered if the pose of the reference target is known. We then relax our assumptions by considering the case where only sparse correspondences are available. In this scenario, we formulate reconstruction as an optimization problem, which can be solved using a nonlinear least-squares method. We demonstrate the effectiveness of our method on both synthetic and real images. We then provide a theoretical analysis of the potential degenerate cases with and without prior knowledge of the pose of the reference target. Finally, we show that our theory can be similarly applied to the reconstruction of the surface of transparent objects.
