Results 1 - 20 of 82
1.
Article in English | MEDLINE | ID: mdl-38578851

ABSTRACT

A recent trend in Non-Rigid Structure-from-Motion (NRSfM) is to express local, differential constraints between pairs of images, from which the surface normal at any point can be obtained by solving a system of polynomial equations. While this approach is more successful than its counterparts relying on global constraints, the resulting methods face two main problems. First, most of the equation systems they formulate are of high degree and must be solved using computationally expensive polynomial solvers. Some methods use polynomial reduction strategies to simplify the system, but this adds some phantom solutions. In any event, an additional mechanism is employed to pick the best solution, which adds to the computation without any guarantees on the reliability of the solution. Second, these methods formulate constraints between a pair of images. Even if there is enough motion between them, they may suffer from local degeneracies that make the resulting estimates unreliable without any warning mechanism. In this paper, we solve these problems for isometric/conformal NRSfM. We show that, under widely applicable assumptions, we can derive a new system of equations in terms of the surface normals, whose two solutions can be obtained in closed-form and can easily be disambiguated locally. Our formalism also allows us to assess how reliable the estimated local normals are and to discard them if they are not. Our experiments show that our reconstructions, obtained from two or more views, are significantly more accurate than those of state-of-the-art methods, while also being faster.

2.
Article in English | MEDLINE | ID: mdl-38648137

ABSTRACT

Geometric Deep Learning has recently made striking progress with the advent of continuous deep implicit fields. They allow for detailed modeling of watertight surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable parameterization that is unlimited in resolution. Unfortunately, these methods are often unsuitable for applications that require an explicit mesh-based surface representation because converting an implicit field to such a representation relies on the Marching Cubes algorithm, which cannot be differentiated with respect to the underlying implicit field. In this work, we remove this limitation and introduce a differentiable way to produce explicit surface mesh representations from Deep Implicit Fields. Our key insight is that by reasoning on how implicit field perturbations impact local surface geometry, one can ultimately differentiate the 3D location of surface samples with respect to the underlying deep implicit field. We exploit this to define DeepMesh - an end-to-end differentiable mesh representation that can vary its topology. We validate our theoretical insight through several applications: Single view 3D Reconstruction via Differentiable Rendering, Physically-Driven Shape Optimization, Full Scene 3D Reconstruction from Scans and End-to-End Training. In all cases our end-to-end differentiable parameterization gives us an edge over state-of-the-art algorithms.
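
The key insight in the abstract above can be checked numerically on a toy case. The sketch below assumes a signed distance field of a circle, s(x; r) = ||x|| - r, which is an illustrative choice and not taken from the paper. For a signed distance field, a first-order argument gives dv/dr = -n * ds/dr for a surface point v with unit normal n; whether this matches the paper's derivation term for term is an assumption. All function names are hypothetical.

```python
import math


def sdf_circle(x, y, radius):
    """Signed distance to a circle centred at the origin (toy implicit field)."""
    return math.hypot(x, y) - radius


def surface_point(direction, radius):
    """Point where the ray from the origin along `direction` meets the zero level set."""
    dx, dy = direction
    norm = math.hypot(dx, dy)
    return (radius * dx / norm, radius * dy / norm)


def surface_point_derivative(direction, radius, eps=1e-6):
    """Finite-difference derivative of the surface point with respect to the radius."""
    px, py = surface_point(direction, radius + eps)
    mx, my = surface_point(direction, radius - eps)
    return ((px - mx) / (2 * eps), (py - my) / (2 * eps))


def predicted_derivative(direction, radius, eps=1e-6):
    """First-order prediction dv/dr = -n * ds/dr, using only local field values."""
    vx, vy = surface_point(direction, radius)
    # Unit normal: gradient of the signed distance field at the surface point.
    nx = (sdf_circle(vx + eps, vy, radius) - sdf_circle(vx - eps, vy, radius)) / (2 * eps)
    ny = (sdf_circle(vx, vy + eps, radius) - sdf_circle(vx, vy - eps, radius)) / (2 * eps)
    # Sensitivity of the field value at v to the parameter (here it equals -1).
    ds_dr = (sdf_circle(vx, vy, radius + eps) - sdf_circle(vx, vy, radius - eps)) / (2 * eps)
    return (-nx * ds_dr, -ny * ds_dr)
```

In this toy setting the two derivatives agree, which is what makes it possible to back-propagate through surface samples without differentiating the meshing step itself.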

3.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 2450-2460, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38019625

ABSTRACT

Vehicles can encounter a myriad of obstacles on the road, and it is impossible to record them all beforehand to train a detector. Instead, we select image patches and inpaint them with the surrounding road texture, which tends to remove obstacles from those patches. We then use a network trained to recognize discrepancies between the original patch and the inpainted one, which signals an erased obstacle.
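
The erase-and-compare idea above can be sketched on a tiny grayscale grid. This is a deliberately simplified stand-in: the inpainting here fills a patch with the mean of its surrounding ring of pixels, and the discrepancy is a plain mean absolute difference, whereas the abstract describes a trained network for that step. The function names and the requirement that the patch lies inside the image without covering all of it are assumptions made for illustration.

```python
def inpaint_patch(image, top, left, size):
    """Return a copy of `image` whose square patch is filled with the mean of its border ring."""
    rows, cols = len(image), len(image[0])
    ring = []
    for r in range(max(top - 1, 0), min(top + size + 1, rows)):
        for c in range(max(left - 1, 0), min(left + size + 1, cols)):
            inside = top <= r < top + size and left <= c < left + size
            if not inside:
                ring.append(image[r][c])
    fill = sum(ring) / len(ring)
    result = [row[:] for row in image]  # copy so the input image is not modified
    for r in range(top, top + size):
        for c in range(left, left + size):
            result[r][c] = fill
    return result


def discrepancy(original, inpainted, top, left, size):
    """Mean absolute difference between the original and the inpainted patch."""
    diffs = [abs(original[r][c] - inpainted[r][c])
             for r in range(top, top + size)
             for c in range(left, left + size)]
    return sum(diffs) / len(diffs)
```

A patch that contained an obstacle changes noticeably when it is replaced by road texture, while a plain road patch barely changes, so a large discrepancy flags a likely obstacle.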

4.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 10588-10595, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37028072

ABSTRACT

Persistent Homology (PH) has been successfully used to train networks to detect curvilinear structures and to improve the topological quality of their results. However, existing methods are very global and ignore the location of topological features. In this paper, we remedy this by introducing a new filtration function that fuses two earlier approaches: thresholding-based filtration, previously used to train deep networks to segment medical images, and filtration with height functions, typically used to compare 2D and 3D shapes. We experimentally demonstrate that deep networks trained using our PH-based loss function yield reconstructions of road networks and neuronal processes that reflect ground-truth connectivity better than networks trained with existing loss functions based on PH.

5.
Nat Methods ; 20(6): 824-835, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37069271

ABSTRACT

BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report gold standard manual annotations generated for a subset of the available imaging datasets and the tracing quality quantified for 35 automatic tracing algorithms. The goal of generating such a hand-curated diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data in an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms in user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information to obtain accurate results and developed a method to iteratively combine methods and generate consensus reconstructions. The consensus trees obtained provide estimates of the neuron structure ground truth that typically outperform single algorithms in noisy datasets. However, specific algorithms may outperform the consensus tree strategy in specific imaging conditions. Finally, to aid users in predicting the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings.


Subjects
Benchmarking , Microscopy , Microscopy/methods , Imaging, Three-Dimensional/methods , Neurons/physiology , Algorithms
6.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6415-6427, 2023 05.
Article in English | MEDLINE | ID: mdl-36251908

ABSTRACT

In this article we propose an unsupervised feature extraction method to capture temporal information on monocular videos, where we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying contrastive loss only to the time-variant features and encouraging a gradual transition on them between nearby and distant frames, while also reconstructing the input, extracts rich temporal features that are well-suited for human pose estimation. Our approach reduces error by about 50% compared to the standard CSS strategies, outperforms other unsupervised single-view methods and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve the 3D pose estimation accuracy, outperforming other state-of-the-art weakly supervised methods.


Subjects
Algorithms , Learning , Humans , Videotape Recording
7.
IEEE Trans Med Imaging ; 41(12): 3675-3685, 2022 Dec.
Article in English | MEDLINE | ID: mdl-35862340

ABSTRACT

Deep learning-based approaches to delineating 3D structure depend on accurate annotations to train the networks. Yet in practice, people, no matter how conscientious, have trouble precisely delineating in 3D and on a large scale, in part because the data is often hard to interpret visually and in part because the 3D interfaces are awkward to use. In this paper, we introduce a method that explicitly accounts for annotation inaccuracies. To this end, we treat the annotations as active contour models that can deform themselves while preserving their topology. This enables us to jointly train the network and correct potential errors in the original annotations. The result is an approach that boosts performance of deep networks trained with potentially inaccurate annotations.

8.
Ultramicroscopy ; 234: 113460, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35121280

ABSTRACT

Curvilinear structures frequently appear in microscopy imaging as the object of interest. Crystallographic defects, i.e., dislocations, are among the curvilinear structures that have been repeatedly investigated under transmission electron microscopy (TEM), and their 3D structural information is of great importance for understanding the properties of materials. 3D information of dislocations is often obtained by tomography, which is a cumbersome process since it requires acquiring many images with different tilt angles and similar imaging conditions. Although alternative stereoscopy methods lower the number of required images to two, they still require human intervention and shape priors for accurate 3D estimation. We propose a fully automated pipeline for both detection and matching of curvilinear structures in stereo pairs by utilizing deep convolutional neural networks (CNNs) without making any prior assumption on 3D shapes. In this work, we mainly focus on 3D reconstruction of dislocations from stereo pairs of TEM images.

9.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 181-195, 2022 01.
Article in English | MEDLINE | ID: mdl-32750825

ABSTRACT

In this paper, we tackle the problem of static 3D cloth draping on virtual human bodies. We introduce a two-stream deep network model that produces a visually plausible draping of a template cloth on virtual 3D bodies by extracting features from both the body and garment shapes. Our network learns to mimic a physics-based simulation (PBS) method while requiring two orders of magnitude less computation time. To train the network, we introduce loss terms inspired by PBS to produce plausible results and make the model collision-aware. To increase the level of detail of the draped garment, we introduce two loss functions that penalize the difference between the curvature of the predicted cloth and that of PBS. In particular, we study the impact of the mean curvature normal and of a novel detail-preserving loss both qualitatively and quantitatively. Our new curvature loss computes the local covariance matrices of the 3D points, and compares the Rayleigh quotients of the prediction and PBS. This leads to more details while performing favorably or comparably against the loss that considers mean curvature normal vectors in the 3D triangulated meshes. We validate our framework on four garment types for various body shapes and poses. Finally, we achieve superior performance against a recently proposed data-driven method.


Subjects
Algorithms , Computer Simulation , Humans
10.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5472-5487, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33844626

ABSTRACT

Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. This makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this using SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying in practice on an iterative process and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.
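
For readers unfamiliar with the Power Iteration (PI) method mentioned above, here is a minimal textbook sketch for a small symmetric matrix. It is not the authors' implementation and does not compute gradients; the function name, the starting vector, and the fixed iteration count are assumptions made for illustration.

```python
import math


def power_iteration(matrix, num_iters=100):
    """Approximate the dominant eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)
    # Starting vector; assumed not to be orthogonal to the dominant eigenvector.
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(num_iters):
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    # Rayleigh quotient of the unit-length estimate gives the eigenvalue.
    product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vec[i] * product[i] for i in range(n))
    return eigenvalue, vec
```

With well-separated eigenvalues, as in `[[2.0, 0.0], [0.0, 1.0]]`, the estimate converges quickly. With nearly equal eigenvalues, as in `[[1.01, 0.0], [0.0, 1.0]]`, the same number of iterations leaves a large component along the second eigenvector, which illustrates why a small gap between eigenvalues is the difficult regime.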

11.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5401-5413, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33881988

ABSTRACT

We propose a novel, connectivity-oriented loss function for training deep convolutional networks to reconstruct network-like structures, like roads and irrigation canals, from aerial images. The main idea behind our loss is to express the connectivity of roads, or canals, in terms of disconnections that they create between background regions of the image. In simple terms, a gap in the predicted road causes two background regions that lie on opposite sides of a ground truth road to touch in the prediction. Our loss function is designed to prevent such unwanted connections between background regions, and therefore close the gaps in predicted roads. It also prevents predicting false positive roads and canals by penalizing unwarranted disconnections of background regions. In order to capture even short, dead-ending road segments, we evaluate the loss in small image crops. We show, in experiments on two standard road benchmarks and a new data set of irrigation canals, that convnets trained with our loss function recover road connectivity so well that it suffices to skeletonize their output to produce state-of-the-art maps. A distinct advantage of our approach is that the loss can be plugged into any existing training setup without further modifications.
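
The observation behind this loss, that a gap in a road lets two background regions touch, can be illustrated by counting background regions in a binary mask. The sketch below is not the differentiable loss from the paper; it only demonstrates the underlying observation on a toy grid. The function name and the choice of 4-connectivity are assumptions.

```python
def count_background_regions(road_mask):
    """Count 4-connected regions of background (0) pixels in a binary road mask."""
    rows, cols = len(road_mask), len(road_mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if road_mask[r][c] != 0 or seen[r][c]:
                continue
            # New background region: flood-fill it with an explicit stack.
            regions += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and road_mask[nr][nc] == 0 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
    return regions
```

A vertical road that spans a crop splits the background into two regions; if the predicted road has a one-pixel gap, the count drops to one, which signals the missing connection.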

12.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 8151-8166, 2022 11.
Article in English | MEDLINE | ID: mdl-34351854

ABSTRACT

Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, only very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with far fewer annotations. This significantly reduces the annotation cost while still leading to similar performance to the full supervision case.


Subjects
Algorithms , Crowding , Humans
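
The conservation constraint described in the abstract above can be sketched on a one-dimensional row of image cells. In this toy version, `flows[i][j]` is the (possibly fractional) number of people moving from cell `i` in the previous frame to cell `j` in the current frame, and nobody enters or leaves the image. That closed-scene assumption, the data layout, and the function name are illustrative choices, not details from the paper.

```python
def densities_from_flows(flows):
    """Derive per-cell people counts in two consecutive frames from a flow matrix."""
    n = len(flows)
    # Everyone who was in cell i must flow somewhere (possibly staying in i).
    density_prev = [sum(flows[i]) for i in range(n)]
    # Everyone who is now in cell j must have come from somewhere.
    density_next = [sum(flows[i][j] for i in range(n)) for j in range(n)]
    return density_prev, density_next
```

Because both densities are sums over the same flow entries, the total number of people is conserved by construction, which is a stronger constraint than smoothing two independently regressed density maps.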
13.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9574-9588, 2022 12.
Article in English | MEDLINE | ID: mdl-34714741

ABSTRACT

While supervised object detection and segmentation methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this when annotating data is prohibitively expensive, we introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. At the heart of our approach lies the observation that object segmentation and background reconstruction are linked tasks, and that, for structured scenes, background regions can be re-synthesized from their surroundings, whereas regions depicting the moving object cannot. We encode this intuition into a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of the proposals, we develop a Monte Carlo-based training strategy that allows the algorithm to explore the large space of object proposals. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.


Subjects
Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Image Processing, Computer-Assisted/methods , Algorithms
14.
Nat Methods ; 18(8): 975-981, 2021 08.
Article in English | MEDLINE | ID: mdl-34354294

ABSTRACT

Markerless three-dimensional (3D) pose estimation has become an indispensable tool for kinematic studies of laboratory animals. Most current methods recover 3D poses by multi-view triangulation of deep network-based two-dimensional (2D) pose estimates. However, triangulation requires multiple synchronized cameras and elaborate calibration protocols that hinder its widespread adoption in laboratory studies. Here we describe LiftPose3D, a deep network-based method that overcomes these barriers by reconstructing 3D poses from a single 2D camera view. We illustrate LiftPose3D's versatility by applying it to multiple experimental systems using flies, mice, rats and macaques, and in circumstances where 3D triangulation is impractical or impossible. Our framework achieves accurate lifting for stereotypical and nonstereotypical behaviors from different camera angles. Thus, LiftPose3D permits high-quality 3D pose estimation in the absence of complex camera arrays and tedious calibration procedures and despite occluded body parts in freely behaving animals.


Subjects
Algorithms , Animals, Laboratory/physiology , Deep Learning , Imaging, Three-Dimensional/methods , Posture/physiology , Animals , Calibration , Drosophila melanogaster , Female , Macaca , Mice , Rats
15.
IEEE Trans Pattern Anal Mach Intell ; 43(2): 745-752, 2021 02.
Article in English | MEDLINE | ID: mdl-31425018

ABSTRACT

In this paper, we propose a novel unsupervised approach for sequence matching by explicitly accounting for the locality properties in the sequences. In contrast to conventional approaches that rely on frame-to-frame matching, we conduct matching using sequencelet or seqlet, a sub-sequence wherein the frames share strong similarities and are thus grouped together. The optimal seqlets and matching between them are learned jointly, without any supervision from users. The learned seqlets preserve the locality information at the scale of interest and resolve the ambiguities during matching, which are omitted by frame-based matching methods. We show that our proposed approach outperforms the state-of-the-art ones on datasets of different domains including human actions, facial expressions, speech, and character strokes.

16.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 3167-3182, 2021 Sep.
Article in English | MEDLINE | ID: mdl-32149625

ABSTRACT

Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be tackled by solving a linear least-squares problem, which can be done by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. While theoretically doable, this introduces numerical instability in the optimization process in practice. In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate that our approach is much more robust than explicit differentiation of the eigendecomposition using two general tasks, outlier rejection and denoising, with several practical examples including wide-baseline stereo, the perspective-n-point problem, and ellipse fitting. Empirically, our method has better convergence properties and yields state-of-the-art results.
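
One way to see why an eigendecomposition can be avoided: if a known target vector e is an eigenvector of a matrix A with eigenvalue zero, then A e = 0, so the squared norm of A e measures the mismatch using only a matrix-vector product. The sketch below shows that quantity only. It is an assumption that the paper's loss builds on a term of this kind; the abstract does not give the formulation, and the function name is hypothetical.

```python
def zero_eigenvector_residual(matrix, target):
    """Squared norm of matrix @ target; zero exactly when target is in the null space."""
    product = [sum(a * x for a, x in zip(row, target)) for row in matrix]
    return sum(p * p for p in product)
```

Note that a residual of this form is also zero for the all-zero matrix, so any training loss built on it needs an additional safeguard against that trivial solution; the abstract does not say how this is handled.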

17.
IEEE Trans Pattern Anal Mach Intell ; 42(6): 1515-1521, 2020 06.
Article in English | MEDLINE | ID: mdl-31180837

ABSTRACT

Detection of curvilinear structures in images has long been of interest. One of the most challenging aspects of this problem is inferring the graph representation of the curvilinear network. Most existing delineation approaches first perform binary segmentation of the image and then refine it using either a set of hand-designed heuristics or a separate classifier that assigns likelihood to paths extracted from the pixel-wise prediction. In our work, we bridge the gap between segmentation and path classification by training a deep network that performs those two tasks simultaneously. We show that this approach is beneficial because it enforces consistency across the whole processing pipeline. We apply our approach to road and neuron datasets.

18.
Med Image Anal ; 60: 101590, 2020 02.
Article in English | MEDLINE | ID: mdl-31841949

ABSTRACT

The difficulty of obtaining annotations to build training databases still slows down the adoption of recent deep learning approaches for biomedical image analysis. In this paper, we show that we can train a Deep Net to perform 3D volumetric delineation given only 2D annotations in Maximum Intensity Projections (MIP) of the training volumes. This significantly reduces the annotation time: We conducted a user study that suggests that annotating 2D projections is on average twice as fast as annotating the original 3D volumes. Our technical contribution is a loss function that evaluates a 3D prediction against annotations of 2D projections. It is inspired by space carving, a classical approach to reconstructing complex 3D shapes from arbitrarily-positioned cameras. It can be used to train any deep network with volumetric output, without the need to change the network's architecture. Substituting the loss is all it takes to enable 2D annotations in an existing training setup. In extensive experiments on 3D light microscopy images of neurons and retinal blood vessels, and on Magnetic Resonance Angiography (MRA) brain scans, we show that, when trained on projection annotations, deep delineation networks perform as well as when they are trained using costlier 3D annotations.


Subjects
Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Magnetic Resonance Angiography , Neural Networks, Computer , Brain/blood supply , Brain/diagnostic imaging , Datasets as Topic , Deep Learning , Humans , Retinal Vessels/diagnostic imaging
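
To make the idea of evaluating a 3D prediction against 2D projection annotations concrete, the sketch below computes a Maximum Intensity Projection (MIP) of a small volume stored as nested lists indexed `volume[z][y][x]`, then compares it with a 2D annotation. The mean squared error used here is a simple stand-in and is not the space-carving-inspired loss from the paper; function names and the data layout are assumptions.

```python
def max_intensity_projection(volume, axis):
    """Project volume[z][y][x] along one axis by taking the maximum along that axis."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:  # collapse z, result indexed [y][x]
        return [[max(volume[z][y][x] for z in range(depth)) for x in range(width)]
                for y in range(height)]
    if axis == 1:  # collapse y, result indexed [z][x]
        return [[max(volume[z][y][x] for y in range(height)) for x in range(width)]
                for z in range(depth)]
    if axis == 2:  # collapse x, result indexed [z][y]
        return [[max(volume[z][y][x] for x in range(width)) for y in range(height)]
                for z in range(depth)]
    raise ValueError("axis must be 0, 1 or 2")


def projection_loss(prediction, annotation, axis):
    """Mean squared error between the MIP of a predicted volume and a 2D annotation."""
    projected = max_intensity_projection(prediction, axis)
    errors = [(p - a) ** 2
              for proj_row, ann_row in zip(projected, annotation)
              for p, a in zip(proj_row, ann_row)]
    return sum(errors) / len(errors)
```

Because only the projection is compared, the annotator never has to label individual voxels, which is the source of the annotation-time savings reported in the abstract.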
19.
IEEE Trans Med Imaging ; 39(4): 1256-1267, 2020 04.
Article in English | MEDLINE | ID: mdl-31603817

ABSTRACT

We present an Unsupervised Domain Adaptation strategy to compensate for domain shifts on Electron Microscopy volumes. Our method aggregates visual correspondences, i.e., motifs that are visually similar across different acquisitions, to infer changes in the parameters of pretrained models and enable them to operate on new data. In particular, we examine the annotations of an existing acquisition to determine pivot locations that characterize the reference segmentation, and use a patch matching algorithm to find their candidate visual correspondences in a new volume. We aggregate all the candidate correspondences by a voting scheme and we use them to construct a consensus heatmap: a map of how frequently locations on the new volume are matched to relevant locations from the original acquisition. This information allows us to perform model adaptations in two different ways: either by a) optimizing model parameters under a Multiple Instance Learning formulation, so that predictions between reference locations and their sets of correspondences agree, or by b) using high-scoring regions of the heatmap as soft labels to be incorporated in other domain adaptation pipelines, including deep learning ones. We show that these unsupervised techniques allow us to obtain high-quality segmentations on unannotated volumes, qualitatively consistent with results obtained under full supervision, for both mitochondria and synapses, with no need for new annotation effort.


Subjects
Image Processing, Computer-Assisted/methods , Microscopy, Electron/methods , Unsupervised Machine Learning , Algorithms , Animals , Brain/cytology , Brain/diagnostic imaging , Brain/ultrastructure , Mice , Mitochondria/ultrastructure
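
The voting step that produces the consensus heatmap in the abstract above can be sketched as a simple tally. In this minimal 2D version, each pivot location contributes a list of candidate matches in the new image, and the heatmap stores the fraction of pivots that voted for each location. The data layout, the normalization by the number of pivots, and the function name are assumptions for illustration; the patch matching itself is not shown.

```python
from collections import Counter


def consensus_heatmap(candidate_matches, shape):
    """Fraction of pivots whose candidate correspondences include each (row, col) location."""
    if not candidate_matches:
        raise ValueError("at least one pivot is required")
    votes = Counter()
    for candidates in candidate_matches:
        # A pivot votes at most once per location, even if it is listed twice.
        for location in set(candidates):
            votes[location] += 1
    rows, cols = shape
    total = len(candidate_matches)
    return [[votes[(r, c)] / total for c in range(cols)] for r in range(rows)]
```

High values mark locations that many pivots agree on, which is what allows them to serve as soft labels in the second adaptation strategy the abstract mentions.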
20.
Elife ; 8, 2019 10 04.
Article in English | MEDLINE | ID: mdl-31584428

ABSTRACT

Studying how neural circuits orchestrate limbed behaviors requires the precise measurement of the positions of each appendage in three-dimensional (3D) space. Deep neural networks can estimate two-dimensional (2D) pose in freely behaving and tethered animals. However, the unique challenges associated with transforming these 2D measurements into reliable and precise 3D poses have not been addressed for small animals including the fly, Drosophila melanogaster. Here, we present DeepFly3D, software that infers the 3D pose of tethered, adult Drosophila using multiple camera images. DeepFly3D does not require manual calibration, uses pictorial structures to automatically detect and correct pose estimation errors, and uses active learning to iteratively improve performance. We demonstrate more accurate unsupervised behavioral embedding using 3D joint angles rather than commonly used 2D pose data. Thus, DeepFly3D enables the automated acquisition of Drosophila behavioral measurements at an unprecedented level of detail for a variety of biological applications.


Subjects
Drosophila/physiology , Extremities/physiology , Imaging, Three-Dimensional/methods , Movement , Optical Imaging/methods , Software , Animals , Behavior, Animal , Deep Learning