1.
IEEE Trans Image Process ; 32: 6413-6425, 2023.
Article in English | MEDLINE | ID: mdl-37906473

ABSTRACT

Objects in aerial images show greater variations in scale and orientation than in other images, making them harder to detect using vanilla deep convolutional neural networks. Networks with sampling equivariance can adapt sampling from input feature maps to object transformation, allowing a convolutional kernel to extract effective object features under different transformations. However, methods such as deformable convolutional networks can only provide sampling equivariance under certain circumstances, as they sample by location. We propose sampling equivariant self-attention networks, which treat self-attention restricted to a local image patch as convolution sampling by masks instead of locations, and a transformation embedding module to improve the equivariant sampling further. We further propose a novel randomized normalization module to enhance network generalization and a quantitative evaluation metric to fairly evaluate the ability of sampling equivariance of different models. Experiments show that our model provides significantly better sampling equivariance than existing methods without additional supervision and can thus extract more effective image features. Our model achieves state-of-the-art results on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets without additional computations or parameters.

2.
Article in English | MEDLINE | ID: mdl-37028344

ABSTRACT

Deep neural networks (DNNs) have been widely used for mesh processing in recent years. However, current DNNs cannot process arbitrary meshes efficiently. On the one hand, most DNNs expect 2-manifold, watertight meshes, but many meshes, whether manually designed or automatically generated, may have gaps, non-manifold geometry, or other defects. On the other hand, the irregular structure of meshes makes it challenging to build hierarchical structures and aggregate local geometric information, both of which are critical for DNNs. In this paper, we present DGNet, an efficient, effective and generic deep neural mesh processing network based on dual graph pyramids; it can handle arbitrary meshes. Firstly, we construct dual graph pyramids for meshes to guide feature propagation between hierarchical levels for both downsampling and upsampling. Secondly, we propose a novel convolution to aggregate local features on the proposed hierarchical graphs. By utilizing both geodesic neighbors and Euclidean neighbors, the network enables feature aggregation both within local surface patches and between isolated mesh components. Experimental results demonstrate that DGNet can be applied to both shape analysis and large-scale scene understanding. Furthermore, it achieves superior performance on various benchmarks, including ShapeNetCore, HumanBody, ScanNet and Matterport3D. Code and models will be available at https://github.com/li-xl/DGNet.

3.
IEEE Trans Vis Comput Graph ; 28(4): 1745-1757, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33001804

ABSTRACT

Accurate camera pose estimation is essential and challenging for real world dynamic 3D reconstruction and augmented reality applications. In this article, we present a novel RGB-D SLAM approach for accurate camera pose tracking in dynamic environments. Previous methods detect dynamic components only across a short time-span of consecutive frames. Instead, we provide a more accurate dynamic 3D landmark detection method, followed by the use of long-term consistency via conditional random fields, which leverages long-term observations from multiple frames. Specifically, we first introduce an efficient initial camera pose estimation method based on distinguishing dynamic from static points using graph-cut RANSAC. These static/dynamic labels are used as priors for the unary potential in the conditional random fields, which further improves the accuracy of dynamic 3D landmark detection. Evaluation using the TUM and Bonn RGB-D dynamic datasets shows that our approach significantly outperforms state-of-the-art methods, providing much more accurate camera trajectory estimation in a variety of highly dynamic environments. We also show that dynamic 3D reconstruction can benefit from the camera poses estimated by our RGB-D SLAM approach.

4.
IEEE Trans Image Process ; 28(9): 4413-4428, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31071029

ABSTRACT

A quick-response code (QR code) is a two-dimensional code akin to a barcode that encodes a message of limited length. In this paper, we present a variant of QR code, a two-layer QR code. Its two-layer structure can display two alternative messages when scanned from two different directions. We propose a method to generate such two-layer QR codes encoding two given messages in a few seconds. We also demonstrate the robustness of our method on both synthetic and fabricated examples. All source code will be made publicly available (https://github.com/yuantailing/two-layer-qrcode).

5.
IEEE Trans Image Process ; 27(6): 2952-2965, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29993600

ABSTRACT

Unlike image blending algorithms, video blending algorithms have been little studied. In this paper, we investigate 6 popular blending algorithms: feather blending, multi-band blending, modified Poisson blending, mean value coordinate blending, multi-spline blending and convolution pyramid blending. We consider their application to blending realtime panoramic videos, a key problem in various virtual reality tasks. To evaluate the performance and suitability of the 6 algorithms for this problem, we have created a video benchmark with several videos captured under various conditions. We analyze the time and memory needed by the above 6 algorithms, for both CPU and GPU implementations (where readily parallelizable). The visual quality provided by these algorithms is also evaluated both objectively and subjectively. The video benchmark and algorithm implementations are publicly available.
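
Of the methods listed above, feather blending is the simplest to illustrate: inside the overlap region, each output pixel is a weighted average of the two sources, with the weight ramping across the overlap. The Python sketch below shows this for a single 1-D row of grayscale values. The function name `feather_blend_row` and the linear ramp are illustrative assumptions, not the benchmark's implementation.

```python
def feather_blend_row(left, right, overlap):
    """Blend two 1-D pixel rows whose last/first `overlap` samples coincide.

    Hypothetical helper for illustration; a real panorama blender works on
    2-D images and derives the weights from the distance to the seam.
    """
    if overlap <= 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter row length")
    blended = []
    for i in range(overlap):
        # Weight of the right image ramps linearly across the overlap.
        w = (i + 1) / (overlap + 1)
        a = left[len(left) - overlap + i]
        b = right[i]
        blended.append((1.0 - w) * a + w * b)
    # Non-overlapping parts are copied through unchanged.
    return list(left[:len(left) - overlap]) + blended + list(right[overlap:])
```

With two constant rows of brightness 10 and 20 and an overlap of three samples, the overlap is filled with the ramp 12.5, 15.0, 17.5, so no hard seam is visible.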

6.
IEEE Trans Vis Comput Graph ; 23(10): 2235-2247, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28541209

ABSTRACT

In this paper, we present a novel pairwise-force smoothed particle hydrodynamics (PF-SPH) model to enable simulation of various interactions at interfaces in real time. Realistic capture of interactions at interfaces is a challenging problem for SPH-based simulations, especially for scenarios involving multiple interactions at different interfaces. Our PF-SPH model can readily handle multiple types of interactions simultaneously in a single simulation; its basis is to use a larger support radius than that used in standard SPH. We adopt a novel anisotropic filtering term to further improve the performance of interaction forces. The proposed model is stable; furthermore, it avoids the particle clustering problem which commonly occurs at the free surface. We show how our model can be used to capture various interactions. We also consider the close connection between droplets and bubbles, and show how to animate bubbles rising in liquid as well as bubbles in air. Our method is versatile, physically plausible and easy-to-implement. Examples are provided to demonstrate the capabilities and effectiveness of our approach.

7.
IEEE Trans Image Process ; 25(3): 1152-62, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26731765

ABSTRACT

In this paper, we present a novel algorithm to simultaneously accomplish color quantization and dithering of images. This is achieved by minimizing a perception-based cost function, which considers pixel-wise differences between filtered versions of the quantized image and the input image. We use edge-aware filters in defining the cost function to avoid mixing colors on opposite sides of an edge. The importance of each pixel is weighted according to its saliency. To rapidly minimize the cost function, we use a modified multi-scale iterative conditional mode (ICM) algorithm, which updates one pixel at a time while keeping other pixels unchanged. As ICM is a local method, careful initialization is required to prevent termination at a local minimum far from the global one. To address this problem, we initialize ICM with a palette generated by a modified median-cut method. Compared with previous approaches, our method produces high-quality results with fewer visual artifacts and also requires significantly less computational effort.
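
The palette initialization mentioned above can be pictured with the textbook median-cut procedure: repeatedly split the box of colors with the widest channel extent at its median, then average each box. The abstract does not say how the authors modify median cut, so the Python sketch below is the unmodified version, and the names `median_cut` and `_extent` are hypothetical.

```python
def _extent(box, c):
    """Range of channel `c` over the pixels in `box`."""
    values = [p[c] for p in box]
    return max(values) - min(values)


def median_cut(pixels, n_colors):
    """Return up to `n_colors` palette entries from a list of (r, g, b) tuples."""
    if not pixels or n_colors < 1:
        return []
    boxes = [list(pixels)]
    while len(boxes) < n_colors:
        splittable = [b for b in boxes if len(b) > 1]
        if not splittable:
            break  # every box holds a single pixel; nothing left to split
        # Split the box with the widest extent along any single channel.
        box = max(splittable, key=lambda b: max(_extent(b, c) for c in range(3)))
        channel = max(range(3), key=lambda c: _extent(box, c))
        ordered = sorted(box, key=lambda p: p[channel])
        mid = len(ordered) // 2
        boxes.remove(box)
        boxes.append(ordered[:mid])
        boxes.append(ordered[mid:])
    # Each palette color is the per-channel mean of its box.
    return [tuple(sum(p[c] for p in box) / len(box) for c in range(3))
            for box in boxes]
```

A palette produced this way would then serve only as a starting point; the per-pixel ICM refinement described in the abstract is not shown here.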

8.
IEEE Trans Vis Comput Graph ; 22(8): 2000-11, 2016 Aug.
Article in English | MEDLINE | ID: mdl-26390493

ABSTRACT

We present a method for realtime reconstruction of an animating human body, which produces a sequence of deforming meshes representing a given performance captured by a single commodity depth camera. We achieve realtime single-view mesh completion by enhancing the parameterized SCAPE model. Our method, which we call Realtime SCAPE, performs full-body reconstruction without the use of markers. In Realtime SCAPE, estimations of body shape parameters and pose parameters, needed for reconstruction, are decoupled. Intrinsic body shape is first precomputed for a given subject, by determining shape parameters with the aid of a body shape database. Subsequently, per-frame pose parameter estimation is performed by means of linear blending skinning (LBS); the problem is decomposed into separately finding skinning weights and transformations. The skinning weights are also determined offline from the body shape database, reducing online reconstruction to simply finding the transformations in LBS. Doing so is formulated as a linear variational problem; carefully designed constraints are used to impose temporal coherence and alleviate artifacts. Experiments demonstrate that our method can produce full-body mesh sequences with high fidelity.


Subjects
Computer Graphics; Human Body; Humans
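
For readers unfamiliar with LBS as used in entry 8: each deformed vertex is a weighted sum of that vertex transformed by every bone, so once the weights are fixed the result is linear in the transformations. The Python sketch below shows the forward evaluation for one vertex. It is a minimal illustration with a hypothetical `skin_vertex` helper and 3x4 `[R | t]` matrices, not the paper's solver, which estimates the transformations rather than applying known ones.

```python
def skin_vertex(vertex, weights, transforms):
    """Linear blend skinning for one vertex.

    vertex:     (x, y, z) rest-pose position
    weights:    per-bone skinning weights, assumed to sum to 1
    transforms: per-bone 3x4 row-major matrices [R | t]
    """
    out = [0.0, 0.0, 0.0]
    for w, m in zip(weights, transforms):
        for row in range(3):
            # Apply the bone's rotation and translation to the vertex.
            transformed = (m[row][0] * vertex[0] + m[row][1] * vertex[1]
                           + m[row][2] * vertex[2] + m[row][3])
            out[row] += w * transformed
    return tuple(out)
```

For example, a vertex influenced equally by a stationary bone and a bone translated by 2 units along x moves by 1 unit along x.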
9.
IEEE Trans Image Process ; 24(12): 5982-94, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26513791

ABSTRACT

A major difference between amateur and professional video lies in the quality of camera paths. Previous work on video stabilization has considered how to improve amateur video by smoothing the camera path. In this paper, we show that additional changes to the camera path can further improve video aesthetics. Our new optimization method achieves multiple simultaneous goals: 1) stabilizing video content over short time scales; 2) ensuring simple and consistent camera paths over longer time scales; and 3) improving scene composition by automatically removing distractions, a common occurrence in amateur video. Our approach uses an L1 camera path optimization framework, extended to handle multiple constraints. Two passes of optimization are used to address both low-level and high-level constraints on the camera path. The experimental and user study results show that our approach outputs video that is perceptually better than the input, or the results of using stabilization only.

10.
IEEE Trans Vis Comput Graph ; 21(9): 1058-71, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26357287

ABSTRACT

Feature extraction and matching (FEM) for 3D shapes finds numerous applications in computer graphics and vision for object modeling, retrieval, morphing, and recognition. However, unavoidable incorrect matches lead to inaccurate estimation of the transformation relating different datasets. Inspired by AdaBoost, this paper proposes a novel iterative re-weighting method to tackle the challenging problem of evaluating point matches established by typical FEM methods. Weights are used to indicate the degree of belief that each point match is correct. Our method has three key steps: (i) estimation of the underlying transformation using weighted least squares, (ii) penalty parameter estimation via minimization of the weighted variance of the matching errors, and (iii) weight re-estimation taking into account both matching errors and information learnt in previous iterations. A comparative study, based on real shapes captured by two laser scanners, shows that the proposed method outperforms four other state-of-the-art methods in terms of evaluating point matches between overlapping shapes established by two typical FEM methods, resulting in more accurate estimates of the underlying transformation. This improved transformation can be used to better initialize the iterative closest point algorithm and its variants, making 3D shape registration more likely to succeed.
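
The alternation between a weighted least-squares fit and weight re-estimation can be pictured with a generic iteratively re-weighted estimator. The Python sketch below recovers a 2-D translation from point matches that include an outlier. The Cauchy-style weight `1 / (1 + e^2)` with a fixed unit scale is an assumption chosen for brevity; it stands in for, and does not reproduce, the paper's AdaBoost-inspired weighting and penalty parameter estimation.

```python
def estimate_translation(src, dst, iterations=10):
    """Estimate a 2-D translation from matched points by re-weighted least squares.

    src, dst: equally long lists of (x, y) points; dst[i] is the match of src[i].
    Returns ((tx, ty), weights), where a weight near 0 marks a likely wrong match.
    """
    weights = [1.0] * len(src)
    tx = ty = 0.0
    for _ in range(iterations):
        total = sum(weights)
        # Weighted least-squares translation: the weighted mean displacement.
        tx = sum(w * (q[0] - p[0]) for w, p, q in zip(weights, src, dst)) / total
        ty = sum(w * (q[1] - p[1]) for w, p, q in zip(weights, src, dst)) / total
        # Re-estimate weights from squared residuals (assumed unit scale).
        weights = [
            1.0 / (1.0 + (q[0] - p[0] - tx) ** 2 + (q[1] - p[1] - ty) ** 2)
            for p, q in zip(src, dst)
        ]
    return (tx, ty), weights
```

With five matches displaced by (1, 2) and one gross outlier, the plain mean displacement is pulled far from the true translation, whereas the re-weighted estimate settles close to (1, 2) and assigns the outlier a weight near zero.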

11.
IEEE Trans Vis Comput Graph ; 19(11): 1885-94, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24029908

ABSTRACT

Feature matching is a challenging problem at the heart of numerous computer graphics and computer vision applications. We present the SuperMatching algorithm for finding correspondences between two sets of features. It does so by considering triples or higher order tuples of points, going beyond the pointwise and pairwise approaches typically used. SuperMatching is formulated using a supersymmetric tensor representing an affinity metric that takes into account feature similarity and geometric constraints between features: Feature matching is cast as a higher order graph matching problem. SuperMatching takes advantage of supersymmetry to devise an efficient sampling strategy to estimate the affinity tensor, as well as to store the estimated tensor compactly. Matching is performed by an efficient higher order power iteration approach that takes advantage of this compact representation. Experiments on both synthetic and real data show that SuperMatching provides more accurate feature matching than other state-of-the-art approaches for a wide range of 2D and 3D features, with competitive computational cost.

12.
IEEE Trans Vis Comput Graph ; 19(8): 1288-97, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23744259

ABSTRACT

Traditional image editing techniques cannot be directly used to edit stereoscopic ("3D") media, as extra constraints are needed to ensure consistent changes are made to both left and right images. Here, we consider manipulating perspective in stereoscopic pairs. A straightforward approach based on depth recovery is unsatisfactory; instead, we use feature correspondences between stereoscopic image pairs. Given a new, user-specified perspective, we determine correspondence constraints under this perspective and optimize a 2D warp for each image that preserves straight lines and guarantees proper stereopsis relative to the new camera. Experiments verify that our method generates new stereoscopic views that correspond well to expected projections, for a wide range of specified perspectives. Various advanced camera effects, such as dolly zoom and wide angle effects, can also be readily generated for stereoscopic image pairs using our method.

13.
IEEE Trans Vis Comput Graph ; 19(7): 1143-57, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23661009

ABSTRACT

We introduce a novel stratified sampling technique for mesh surfaces that gives the user control over sampling density and anisotropy via a tensor field. Our approach is based on sampling space-filling curves mapped onto mesh segments via parametrizations aligned with the tensor field. After a short preprocessing step, samples can be generated in real time. Along with visual examples, we provide rigorous spectral analysis and differential domain analysis of our sampling. The sample distributions are of high quality: they fulfil the blue noise criterion, so have minimal artifacts due to regularity of sampling patterns, and they accurately represent isotropic and anisotropic densities on the plane and on mesh surfaces. They also have low discrepancy, ensuring that the surface is evenly covered.

14.
IEEE Trans Vis Comput Graph ; 19(7): 1199-217, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23661012

ABSTRACT

Three-dimensional surface registration transforms multiple three-dimensional data sets into the same coordinate system so as to align overlapping components of these sets. Recent surveys have covered different aspects of either rigid or nonrigid registration, but seldom discuss them as a whole. Our study serves two purposes: 1) to give a comprehensive survey of both types of registration, focusing on three-dimensional point clouds and meshes, and 2) to provide a better understanding of registration from the perspective of data fitting. Registration is closely related to data fitting, in that it comprises three core interwoven components: model selection, correspondences and constraints, and optimization. Study of these components 1) provides a basis for comparison of the novelties of different techniques, 2) reveals the similarity of rigid and nonrigid registration in terms of problem representations, and 3) shows how overfitting arises in nonrigid registration and the reasons for increasing interest in intrinsic techniques. We further summarize some practical issues of registration, which include initializations and evaluations, and discuss some of our own observations, insights and foreseeable research trends.

15.
IEEE Trans Image Process ; 22(5): 1915-25, 2013 May.
Article in English | MEDLINE | ID: mdl-23303694

ABSTRACT

This paper presents a novel approach to edge-aware image manipulation. Our method processes a Gaussian pyramid from coarse to fine, and at each level, applies a nonlinear filter bank to the neighborhood of each pixel. Outputs of these spatially-varying filters are merged using global optimization. The optimization problem is solved using an explicit mixed-domain (real space and DCT transform space) solution, which is efficient, accurate, and easy-to-implement. We demonstrate applications of our method to a set of problems, including detail and contrast manipulation, HDR compression, nonphotorealistic rendering, and haze removal.

16.
IEEE Trans Vis Comput Graph ; 19(3): 460-9, 2013 Mar.
Article in English | MEDLINE | ID: mdl-22614330

ABSTRACT

Solid textures, comprising 3D particles embedded in a matrix in a regular or semiregular pattern, are common in natural and man-made materials, such as brickwork, stone walls, plant cells in a leaf, etc. We present a novel technique for synthesizing such textures, starting from 2D image exemplars which provide cross-sections of the desired volume texture. The shapes and colors of typical particles embedded in the structure are estimated from their 2D cross-sections. Particle positions in the texture images are also used to guide spatial placement of the 3D particles during synthesis of the 3D texture. Our experiments demonstrate that our algorithm can produce higher quality structures than previous approaches; they are both compatible with the input images, and have a plausible 3D nature.


Subjects
Algorithms; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; User-Computer Interface
17.
IEEE Trans Vis Comput Graph ; 19(7): 1218-27, 2013 Jul.
Article in English | MEDLINE | ID: mdl-22732683

ABSTRACT

We present a video editing technique based on changing the timelines of individual objects in video, which leaves them in their original places but puts them at different times. This allows the production of object-level slow motion effects, fast motion effects, or even time reversal. This is more flexible than simply applying such effects to whole frames, as new relationships between objects can be created. As we restrict object interactions to the same spatial locations as in the original video, our approach can produce high-quality results using only coarse matting of video objects. Coarse matting can be done efficiently using automatic video object segmentation, avoiding tedious manual matting. To design the output, the user interactively indicates the desired new life spans of objects, and may also change the overall running time of the video. Our method rearranges the timelines of objects in the video whilst applying appropriate object interaction constraints. We demonstrate that, while this editing technique is somewhat restrictive, it still allows many interesting results.

18.
IEEE Trans Vis Comput Graph ; 18(10): 1771-83, 2012 Oct.
Article in English | MEDLINE | ID: mdl-21788670

ABSTRACT

Various types of video can be captured with fisheye lenses; their wide field of view is particularly suited to surveillance video. However, fisheye lenses introduce distortion, and this changes as objects in the scene move, making fisheye video difficult to interpret. Current still fisheye image correction methods are either limited to small angles of view, or are strongly content dependent, and therefore unsuitable for processing video streams. We present an efficient and robust scheme for fisheye video correction, which minimizes time-varying distortion and preserves salient content in a coherent manner. Our optimization process is controlled by user annotation, and takes into account a wide set of measures addressing different aspects of natural scene appearance. Each is represented as a quadratic term in an energy minimization problem, leading to a closed-form solution via a sparse linear system. We illustrate our method with a range of examples, demonstrating coherent natural-looking video output. The visual quality of individual frames is comparable to those produced by state-of-the-art methods for fisheye still photograph correction.
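
The statement that quadratic energy terms lead to a closed-form solution via a sparse linear system can be seen in a much smaller setting. The Python sketch below minimizes a 1-D energy with a data term and a smoothness term; setting its gradient to zero gives a tridiagonal system, solved here by forward elimination and back substitution. The energy, the name `smooth`, and the parameter `lam` are illustrative assumptions and are unrelated to the actual terms used for fisheye correction.

```python
def smooth(data, lam):
    """Minimize sum_i (x_i - d_i)^2 + lam * sum_i (x_{i+1} - x_i)^2.

    The normal equations are tridiagonal:
        (1 + lam * deg_i) * x_i - lam * x_{i-1} - lam * x_{i+1} = d_i
    where deg_i is the number of neighbors of sample i (1 at the ends, else 2).
    """
    n = len(data)
    if n == 0:
        return []
    off = -float(lam)  # value of every sub- and super-diagonal entry
    diag = [1.0 + lam * ((i > 0) + (i < n - 1)) for i in range(n)]
    rhs = [float(d) for d in data]
    # Forward elimination (Thomas algorithm for tridiagonal systems).
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - off * x[i + 1]) / diag[i]
    return x
```

Because every term is quadratic, no iteration is needed: one linear solve gives the minimizer, and the cost grows only linearly with the number of unknowns in this 1-D case.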

19.
IEEE Trans Syst Man Cybern B Cybern ; 41(3): 749-60, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21134817

ABSTRACT

Cellular automata (CA) with given evolution rules have been widely investigated, but the inverse problem of extracting CA rules from observed data is less studied. Current CA rule extraction approaches are both time consuming and inefficient when selecting neighborhoods. We give a novel approach to identifying CA rules from observed data and selecting CA neighborhoods based on the identified CA model. Our identification algorithm uses a model linear in its parameters and gives a unified framework for representing the identification problem for both deterministic and probabilistic CA. Parameters are estimated based on a minimum variance criterion. An incremental procedure is applied during CA identification to select an initial coarse neighborhood. Redundant cells in the neighborhood are then removed based on parameter estimates, and the neighborhood size is determined using the Bayesian information criterion. Experimental results show the effectiveness of our algorithm and that it outperforms other leading CA identification algorithms.


Subjects
Algorithms; Artificial Intelligence; Decision Support Techniques; Models, Theoretical; Pattern Recognition, Automated/methods; Computer Simulation
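
The inverse problem in entry 19, recovering a CA rule from observed states, can be illustrated in its simplest form: a deterministic, one-dimensional, two-state automaton whose three-cell neighborhood is already known. The Python sketch below just tabulates which next state follows each observed neighborhood. This lookup approach is an assumption made for illustration; the paper instead estimates parameters of a linear-in-parameters model and also selects the neighborhood, neither of which is shown. `step` and `identify_rule` are hypothetical names.

```python
def step(cells, rule):
    """Advance an elementary CA one step (periodic boundary, Wolfram rule number)."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def identify_rule(history):
    """Map each observed (left, center, right) neighborhood to its next state.

    history: list of successive cell-state lists of equal length.
    Neighborhoods that never occur in the data are absent from the result.
    """
    table = {}
    for before, after in zip(history, history[1:]):
        n = len(before)
        for i in range(n):
            key = (before[(i - 1) % n], before[i], before[(i + 1) % n])
            if table.setdefault(key, after[i]) != after[i]:
                raise ValueError("observations are not consistent with a deterministic rule")
    return table
```

Running `identify_rule` on states generated by `step` with rule 110 yields a table whose entries agree with rule 110 for every neighborhood that actually occurred.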
20.
IEEE Trans Vis Comput Graph ; 15(4): 642-53, 2009.
Article in English | MEDLINE | ID: mdl-19423888

ABSTRACT

An algorithm is presented to automatically generate bas-reliefs based on adaptive histogram equalization (AHE), starting from an input height field. A mesh model may alternatively be provided, in which case a height field is first created via orthogonal or perspective projection. The height field is regularly gridded and treated as an image, enabling a modified AHE method to be used to generate a bas-relief with a user-chosen height range. We modify the original image-contrast-enhancement AHE method to use gradient weights also to enhance the shape features of the bas-relief. To effectively compress the height field, we limit the height-dependent scaling factors used to compute relative height variations in the output from height variations in the input; this prevents any height differences from having too great an effect. Results of AHE over different neighborhood sizes are averaged to preserve information at different scales in the resulting bas-relief. Compared to previous approaches, the proposed algorithm is simple and yet largely preserves original shape features. Experiments show that our results are, in general, comparable to and in some cases better than the best previously published methods.
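
Treating the height field as an image means each height can be remapped through the cumulative histogram of heights. The Python sketch below does this globally and scales the result to a chosen output range. It is a deliberately simplified stand-in: the abstract's method is adaptive (per-neighborhood), gradient-weighted, limits the scaling factors, and averages over several neighborhood sizes, none of which is attempted here. The name `equalize_heights` and the `bins` parameter are assumptions.

```python
def equalize_heights(heights, out_range, bins=64):
    """Globally histogram-equalize a gridded height field.

    heights:   list of rows of height values
    out_range: maximum height of the compressed relief
    Returns a grid of the same shape with values in (0, out_range].
    """
    flat = [h for row in heights for h in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0] * len(row) for row in heights]  # flat input stays flat

    def bin_of(h):
        # Clamp so that the maximum height falls in the last bin.
        return min(int((h - lo) / (hi - lo) * bins), bins - 1)

    hist = [0] * bins
    for h in flat:
        hist[bin_of(h)] += 1
    # Cumulative distribution: fraction of samples at or below each bin.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / len(flat))
    return [[out_range * cdf[bin_of(h)] for h in row] for row in heights]
```

The mapping is monotone, so height ordering is preserved while large, sparsely populated height gaps are compressed, which is the basic property a bas-relief needs.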
