Results 1 - 7 of 7

1.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 10603-10614, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37195850

ABSTRACT

Image editing and compositing have become ubiquitous in entertainment, from digital art to AR and VR experiences. To produce beautiful composites, the camera needs to be geometrically calibrated, which can be tedious and requires a physical calibration target. In place of the traditional multi-image calibration process, we propose to infer the camera calibration parameters such as pitch, roll, field of view, and lens distortion directly from a single image using a deep convolutional neural network. We train this network using automatically generated samples from a large-scale panorama dataset, yielding competitive accuracy in terms of the standard L2 error. However, we argue that minimizing such standard error metrics might not be optimal for many applications. In this work, we investigate human sensitivity to inaccuracies in geometric camera calibration. To this end, we conduct a large-scale human perception study in which we ask participants to judge the realism of 3D objects composited with correct and biased camera calibration parameters. Based on this study, we develop a new perceptual measure for camera calibration and demonstrate that our deep calibration network outperforms previous single-image-based calibration methods both on standard metrics and on this novel perceptual measure. Finally, we demonstrate the use of our calibration network for several applications, including virtual object insertion, image retrieval, and compositing.
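To make the single-image formulation concrete, the toy sketch below shows a CNN regressor that maps an RGB image to four calibration scalars (pitch, roll, vertical field of view, and one distortion coefficient). It assumes a PyTorch setup; the backbone, layer sizes, and the class name CalibNet are illustrative placeholders, not the authors' released architecture.

```python
# Illustrative sketch only: a small CNN that regresses calibration
# parameters from a single image, assuming PyTorch. The architecture
# is a placeholder, not the network described in the paper.
import torch
import torch.nn as nn

class CalibNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Four scalar outputs: pitch, roll, vertical FoV, distortion k1.
        self.head = nn.Linear(128, 4)

    def forward(self, img):
        feat = self.backbone(img).flatten(1)
        return self.head(feat)

if __name__ == "__main__":
    net = CalibNet()
    batch = torch.randn(2, 3, 224, 224)          # two dummy RGB images
    pitch, roll, vfov, k1 = net(batch).unbind(dim=1)
    print(pitch.shape, roll.shape, vfov.shape, k1.shape)
```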

2.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 13035-13053, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37186524

ABSTRACT

Manhattan and Atlanta worlds hold for structured scenes with only vertical and horizontal dominant directions (DDs). To describe scenes with additional sloping DDs, a mixture of independent Manhattan worlds seems plausible, but may lead to unaligned and unrelated DDs. By contrast, we propose a novel structural model called Hong Kong world. It is more general than Manhattan and Atlanta worlds since it can represent environments with slopes, e.g., a city with hilly terrain, a house with a sloping roof, and a loft apartment with a staircase. Moreover, it is more compact and accurate than a mixture of independent Manhattan worlds because it enforces orthogonality constraints not only between vertical and horizontal DDs, but also between horizontal and sloping DDs. We further leverage the structural regularity of Hong Kong world for line-based SLAM. Our SLAM method is reliable thanks to three technical novelties. First, we estimate the DDs/vanishing points in Hong Kong world in a semi-searching way, using a new consensus voting strategy instead of traditional branch and bound. This method is the first that can simultaneously determine the number of DDs and achieve quasi-global optimality in terms of the number of inliers. Second, we compute the camera pose by exploiting the spatial relations between DDs in Hong Kong world. This method generates concise polynomials and is thus more accurate and efficient than existing approaches designed for unstructured scenes. Third, we refine the estimated DDs in Hong Kong world with a novel filter-based method, and then use these refined DDs to optimize the camera poses and 3D lines, leading to higher accuracy and robustness than existing optimization algorithms. In addition, we establish the first dataset of sequential images in Hong Kong world. Experiments show that our approach outperforms state-of-the-art methods in terms of accuracy and/or efficiency.
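As a rough illustration of consensus voting for dominant directions, the sketch below greedily accepts candidate directions that gather enough votes from observed line directions within an angular threshold. The function name, thresholds, and greedy acceptance rule are assumptions for illustration; the paper's semi-searching strategy and its optimality guarantee are not reproduced here.

```python
# Hedged numpy sketch of consensus voting for dominant directions (DDs);
# greedy and simplified, not the paper's algorithm.
import numpy as np

def consensus_vote_dds(line_dirs, candidates, ang_thresh_deg=2.0, min_votes=20):
    """line_dirs: (N, 3) unit vectors from detected lines.
    candidates: (M, 3) unit candidate DDs. Returns the accepted DDs."""
    cos_t = np.cos(np.radians(ang_thresh_deg))
    unclaimed = np.ones(len(line_dirs), dtype=bool)
    dds = []
    while True:
        # Votes per candidate, counting only line directions not yet claimed.
        votes = (np.abs(candidates @ line_dirs.T) > cos_t) & unclaimed
        counts = votes.sum(axis=1)
        best = counts.argmax()
        if counts[best] < min_votes:      # too few votes left: the number of DDs is fixed
            break
        dds.append(candidates[best])
        unclaimed &= ~votes[best]         # remove the inliers of the accepted DD
    return np.array(dds)
```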

3.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1503-1518, 2022 Mar.
Article in English | MEDLINE | ID: mdl-32915727

ABSTRACT

Image lines projected from parallel 3D lines intersect at a common point called the vanishing point (VP). Manhattan world holds for scenes with three orthogonal VPs. In Manhattan world, given several lines in a calibrated image, we aim to cluster them according to three unknown-but-sought VPs. VP estimation can be reformulated as computing the rotation between the Manhattan frame and the camera frame. To estimate the three degrees of freedom (DOF) of this rotation, state-of-the-art methods rely on either data sampling or parameter search. However, they fail to guarantee high accuracy and efficiency simultaneously. In contrast, we propose a set of approaches that hybridize these two strategies. We first constrain two or one DOF of the rotation with two or one sampled image line. Then we search for the remaining one or two DOF by branch and bound. Our sampling accelerates our search by reducing the search space and simplifying the bound computation. Our search achieves quasi-global optimality: it is guaranteed to retrieve the maximum number of inliers on the condition that two or one DOF is constrained. Our hybridization of two-line sampling and one-DOF search can estimate VPs in real time, while our hybridization of one-line sampling and two-DOF search can estimate VPs in near real time. Experiments on both synthetic and real-world datasets demonstrate that our approaches outperform state-of-the-art methods in terms of accuracy and/or efficiency. (A simplified sketch of this constrained search appears after the subject terms below.)


Subjects
Algorithms, Rotation
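The sketch referenced in the abstract above illustrates the hybrid idea in a heavily simplified form: once sampling has fixed part of the rotation (here assumed to pin down the vertical direction), the remaining yaw angle is scanned and the rotation with the most line inliers is kept. The brute-force scan stands in for the paper's branch-and-bound search, and all names and thresholds are illustrative assumptions.

```python
# Simplified illustration: fix two rotation DOF (assumed known from a
# sampled vertical line), then search the remaining yaw and count line
# inliers. The paper uses branch and bound instead of this brute scan.
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def search_yaw(normals, R_fixed, n_steps=360, thresh_deg=2.0):
    """normals: (N, 3) unit normals of the lines' interpretation planes.
    A line is an inlier of a VP direction d when |n . d| is near zero."""
    best_count, best_R = -1, None
    for theta in np.linspace(0.0, np.pi / 2, n_steps):
        R = R_fixed @ rot_z(theta)              # columns = the three Manhattan directions
        resid = np.abs(normals @ R)             # (N, 3): sine of angle to each VP plane
        count = int((resid.min(axis=1) < np.sin(np.radians(thresh_deg))).sum())
        if count > best_count:
            best_count, best_R = count, R
    return best_R, best_count
```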
4.
Article in English | MEDLINE | ID: mdl-32396090

ABSTRACT

Estimating the absolute camera pose requires 3D-to-2D correspondences of points and/or lines. In practice, however, these correspondences are inevitably corrupted by outliers, which degrades the pose estimation. Existing outlier removal strategies for robust pose estimation have limitations: they are only applicable to points, rely on prior pose information, or fail to handle high outlier ratios. By contrast, we propose a general and accurate outlier removal strategy. It can be integrated with various existing pose estimation methods that are originally vulnerable to outliers, and it is applicable to points, lines, and combinations of both. Moreover, it does not rely on any prior pose information. Our strategy has a nested structure composed of an outer and an inner module. First, the outer module leverages our intersection constraint, i.e., that the projection rays or planes defined by inliers intersect at the camera center. It alternates between computing the inlier probabilities of the correspondences and estimating the camera pose, and runs reliably and efficiently under high outlier ratios. Second, the inner module exploits our flow consensus: the 2D displacement vectors or 3D directed arcs generated by inliers exhibit a common directional regularity, i.e., they follow a dominant flow trend. The inner module refines the inlier probabilities obtained at each iteration of the outer module; this refinement improves accuracy and facilitates the convergence of the outer module. Experiments on both synthetic data and real-world images show that our method outperforms state-of-the-art approaches in terms of accuracy and robustness.
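A generic skeleton of the outer alternation is sketched below: estimate the pose from soft inlier weights, then refresh the weights from the new residuals. estimate_pose_weighted and reprojection_residuals are hypothetical placeholders for any weighted PnP/PnL solver and its residual function, and the Gaussian weight update merely marks where the paper's inner flow-consensus refinement would plug in.

```python
# Hedged skeleton of an alternating robust pose loop; the solver and
# residual callbacks are hypothetical placeholders, not the paper's code.
import numpy as np

def robust_pose(points3d, points2d, estimate_pose_weighted,
                reprojection_residuals, sigma=2.0, n_iters=20):
    weights = np.ones(len(points3d))            # start from uniform inlier probabilities
    pose = None
    for _ in range(n_iters):
        pose = estimate_pose_weighted(points3d, points2d, weights)
        residuals = reprojection_residuals(pose, points3d, points2d)
        # Soft inlier probabilities from residuals; the paper refines these
        # further with its flow-consensus inner module.
        weights = np.exp(-0.5 * (residuals / sigma) ** 2)
    return pose, weights
```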

5.
IEEE Trans Pattern Anal Mach Intell ; 42(10): 2656-2669, 2020 Oct.
Article in English | MEDLINE | ID: mdl-30969915

ABSTRACT

In this work, we describe man-made structures via an appropriate structural assumption, the Atlanta world assumption, which comprises a vertical direction (typically the gravity direction) and a set of horizontal directions orthogonal to it. Contrary to the commonly used Manhattan world assumption, the horizontal directions in Atlanta world are not necessarily orthogonal to each other. While Atlanta world can encompass a wider range of scenes, this generality makes the search space much larger and the problem more challenging. Our input data are surface normals, acquired for example from RGB-D cameras or 3D laser scanners, as well as lines from calibrated images. Given this input, we propose the first globally optimal method of inlier set maximization for Atlanta direction estimation. We define a novel search space for Atlanta world, together with its parametrization, and solve this challenging problem within a branch-and-bound (BnB) framework. To alleviate the computational bottleneck of BnB, namely the bound computation, we present two bound computation strategies, a rectangular bound and a slice bound, defined in an efficient measurement domain, the extended Gaussian image (EGI). In addition, we propose an efficient two-stage method that automatically estimates the number of horizontal directions of a scene. Experimental results on synthetic and real-world datasets confirm the validity of our approach.
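The quantity being maximized can be pictured with the small sketch below: given a candidate vertical direction and a set of horizontal directions, count the surface normals that agree with one of them up to an angular threshold. The function name and threshold are illustrative; the rectangular and slice bounds that make the BnB search tractable are not shown.

```python
# Illustrative inlier-counting objective for Atlanta world; numpy only.
import numpy as np

def atlanta_inliers(normals, vertical, horizontals, thresh_deg=5.0):
    """normals: (N, 3); vertical: (3,); horizontals: (K, 3); all unit vectors."""
    cos_t = np.cos(np.radians(thresh_deg))
    dirs = np.vstack([vertical, horizontals])       # (1 + K, 3) candidate directions
    # A normal is an inlier if it is (anti-)parallel to some candidate direction.
    agreement = np.abs(normals @ dirs.T) > cos_t    # (N, 1 + K)
    return int(agreement.any(axis=1).sum())
```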

6.
IEEE Trans Pattern Anal Mach Intell ; 38(4): 744-758, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26353362

ABSTRACT

Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering the underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only is it known that the underlying structure of the clean data is low-rank, but the exact rank of the clean data is also known. Yet, when conventional rank minimization is applied to those problems, the objective function is formulated in a way that does not fully utilize this a priori target rank information. This observation motivates us to investigate whether there is a better alternative when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values, which implicitly encourages the target rank constraint. Our experimental analyses show that, when the number of samples is deficient, our approach leads to a higher success rate than conventional rank minimization, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g., high dynamic range imaging, motion edge detection, photometric stereo, and image alignment and recovery, and show that our results outperform those obtained by the conventional nuclear-norm rank minimization method.
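The proximal step behind this idea can be sketched in a few lines: keep the largest target-rank singular values untouched and soft-threshold only the remaining ones, in contrast to nuclear-norm minimization, which shrinks them all. The sketch below is a plain numpy illustration of that operator, not the authors' full solver.

```python
# Partial-sum singular value thresholding: shrink only the singular
# values beyond the known target rank (illustrative sketch).
import numpy as np

def partial_svt(X, target_rank, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = s.copy()
    s_shrunk[target_rank:] = np.maximum(s[target_rank:] - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

if __name__ == "__main__":
    X = np.random.randn(50, 3) @ np.random.randn(3, 40)    # rank-3 clean data
    X_noisy = X + 0.1 * np.random.randn(50, 40)
    # Shrinking the tail singular values recovers a rank-3 matrix.
    print(np.linalg.matrix_rank(partial_svt(X_noisy, target_rank=3, tau=5.0)))
```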

7.
IEEE Trans Pattern Anal Mach Intell ; 35(7): 1565-1576, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23681987

ABSTRACT

Data correspondence/grouping under an unknown parametric model is a fundamental topic in computer vision. Finding feature correspondences between two images is probably the most popular application of this research field and is the main motivation of our work. It is a key ingredient for a wide range of vision tasks, including three-dimensional reconstruction and object recognition. Existing feature correspondence methods are based on either local appearance similarity, global geometric consistency, or a combination of both in some heuristic manner. None of these methods is fully satisfactory, especially in the presence of repetitive image textures or mismatches. In this paper, we present a new algorithm that combines the benefits of both appearance-based and geometry-based methods and mathematically guarantees a globally optimal solution. Our algorithm accepts two sets of features extracted from two images as input, and outputs the feature correspondences with the largest number of inliers, which satisfy both the appearance similarity and the geometric constraints. Specifically, we formulate the problem as a mixed integer program and solve it efficiently by a series of linear programs via a branch-and-bound procedure. We subsequently generalize our framework to the broader context of data correspondence/grouping under an unknown parametric model and show that it can be applied to certain classes of computer vision problems. Our algorithm has been validated successfully on synthesized data and challenging real images.
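A toy version of the mixed-integer formulation is sketched below: maximize total appearance similarity subject to each feature matching at most once, with binary indicator variables. The paper additionally encodes global geometric consistency as linear constraints and solves the program by branch and bound over linear programs; here the bare assignment MIP is simply handed to SciPy's generic milp solver as an illustration.

```python
# Toy MIP for one-to-one feature matching (appearance term only);
# the geometric-consistency constraints from the paper are omitted.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def match_features(similarity):
    m, n = similarity.shape
    c = -similarity.ravel()                       # milp minimizes, so negate the score
    row_sum = np.kron(np.eye(m), np.ones(n))      # each left feature matched at most once
    col_sum = np.kron(np.ones(m), np.eye(n))      # each right feature matched at most once
    res = milp(
        c,
        constraints=[LinearConstraint(row_sum, ub=np.ones(m)),
                     LinearConstraint(col_sum, ub=np.ones(n))],
        integrality=np.ones(m * n),               # all indicator variables are integer
        bounds=Bounds(0, 1),
    )
    return res.x.reshape(m, n).round().astype(int)

if __name__ == "__main__":
    sim = np.array([[0.9, 0.1],
                    [0.2, 0.8],
                    [0.3, 0.4]])
    print(match_features(sim))                    # expect matches (0, 0) and (1, 1)
```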
