ABSTRACT
Deep neural networks (DNNs) have been widely used for mesh processing in recent years. However, current DNNs cannot process arbitrary meshes efficiently. On the one hand, most DNNs expect 2-manifold, watertight meshes, but many meshes, whether manually designed or automatically generated, may have gaps, non-manifold geometry, or other defects. On the other hand, the irregular structure of meshes also brings challenges to building hierarchical structures and aggregating local geometric information, which is critical for applying DNNs. In this paper, we present DGNet, an efficient, effective and generic deep neural mesh processing network based on dual graph pyramids; it can handle arbitrary meshes. Firstly, we construct dual graph pyramids for meshes to guide feature propagation between hierarchical levels for both downsampling and upsampling. Secondly, we propose a novel convolution to aggregate local features on the proposed hierarchical graphs. By utilizing both geodesic neighbors and Euclidean neighbors, the network enables feature aggregation both within local surface patches and between isolated mesh components. Experimental results demonstrate that DGNet can be applied to both shape analysis and large-scale scene understanding. Furthermore, it achieves superior performance on various benchmarks, including ShapeNetCore, HumanBody, ScanNet and Matterport3D. Code and models will be available at https://github.com/li-xl/DGNet.
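To illustrate the dual-neighborhood idea, here is a minimal sketch of feature aggregation over both Euclidean and geodesic neighbors, assuming a simplified setting: numpy only, brute-force k-nearest neighbors, a precomputed mesh-adjacency list standing in for geodesic neighbors, and fixed max-pooling in place of the paper's learned convolution. All function names are illustrative, not DGNet's actual API.

import numpy as np

def aggregate_dual_neighbors(verts, feats, geodesic_nbrs, k=3):
    # Pool features over Euclidean k-NN and mesh-adjacency (geodesic) neighbors.
    n = len(verts)
    # Brute-force Euclidean k-NN (O(n^2); a spatial index would be used in practice).
    d2 = ((verts[:, None, :] - verts[None, :, :]) ** 2).sum(-1)
    knn = np.argsort(d2, axis=1)[:, 1:k + 1]             # skip self at index 0
    out = np.empty_like(feats)
    for i in range(n):
        euclid = feats[knn[i]].max(axis=0)               # can bridge isolated components
        geo_ids = geodesic_nbrs[i] if geodesic_nbrs[i] else [i]
        geo = feats[geo_ids].max(axis=0)                 # stays on the surface patch
        out[i] = 0.5 * (euclid + geo)                    # fuse both neighborhoods
    return out

# Toy usage: two disconnected components of three vertices each, 4-D features.
verts = np.random.rand(6, 3)
feats = np.random.rand(6, 4)
geodesic_nbrs = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
print(aggregate_dual_neighbors(verts, feats, geodesic_nbrs).shape)  # (6, 4)

In the full method, an aggregation of this kind would run at each level of the dual graph pyramid, with learned weights rather than the fixed pooling above.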
ABSTRACT
Objects in aerial images show greater variations in scale and orientation than objects in other images, making them harder to detect using vanilla deep convolutional neural networks. Networks with sampling equivariance can adapt sampling from input feature maps to object transformation, allowing a convolutional kernel to extract effective object features under different transformations. However, methods such as deformable convolutional networks can only provide sampling equivariance under certain circumstances, as they sample by location. We propose sampling equivariant self-attention networks, which treat self-attention restricted to a local image patch as convolution sampling by masks instead of locations, and a transformation embedding module to further improve equivariant sampling. We also propose a novel randomized normalization module to enhance network generalization and a quantitative metric to fairly evaluate the sampling equivariance of different models. Experiments show that our model provides significantly better sampling equivariance than existing methods without additional supervision and can thus extract more effective image features. Our model achieves state-of-the-art results on the DOTA-v1.0, DOTA-v1.5, and HRSC2016 datasets without additional computations or parameters.
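To make the "sampling by masks" view concrete, here is a minimal sketch of self-attention restricted to a local patch: each output position attends over a small window, and the attention weights act as an input-dependent sampling mask over that window, in contrast to deformable convolution's sampling by offset locations. This is a single-head toy version in numpy without learned projections; all names are illustrative.

import numpy as np

def local_window_attention(x, win=3):
    # x: (H, W, C) feature map; attend within a win x win patch around each pixel.
    H, W, C = x.shape
    r = win // 2
    pad = np.pad(x, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + win, j:j + win].reshape(-1, C)  # the sampled window
            q = x[i, j]                                       # query = center pixel
            logits = patch @ q / np.sqrt(C)
            w = np.exp(logits - logits.max())
            w /= w.sum()                                      # softmax sampling mask
            out[i, j] = w @ patch                             # mask-weighted sampling
    return out

print(local_window_attention(np.random.rand(8, 8, 16)).shape)  # (8, 8, 16)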
ABSTRACT
Accurate camera pose estimation is essential and challenging for real-world dynamic 3D reconstruction and augmented reality applications. In this article, we present a novel RGB-D SLAM approach for accurate camera pose tracking in dynamic environments. Previous methods detect dynamic components only across a short time-span of consecutive frames. Instead, we provide a more accurate dynamic 3D landmark detection method, followed by the use of long-term consistency via conditional random fields, which leverages long-term observations from multiple frames. Specifically, we first introduce an efficient initial camera pose estimation method based on distinguishing dynamic from static points using graph-cut RANSAC. These static/dynamic labels are used as priors for the unary potential in the conditional random fields, which further improves the accuracy of dynamic 3D landmark detection. Evaluation using the TUM and Bonn RGB-D dynamic datasets shows that our approach significantly outperforms state-of-the-art methods, providing much more accurate camera trajectory estimation in a variety of highly dynamic environments. We also show that dynamic 3D reconstruction can benefit from the camera poses estimated by our RGB-D SLAM approach.
ABSTRACT
An algorithm is presented to automatically generate bas-reliefs based on adaptive histogram equalization (AHE), starting from an input height field. A mesh model may alternatively be provided, in which case a height field is first created via orthogonal or perspective projection. The height field is regularly gridded and treated as an image, enabling a modified AHE method to be used to generate a bas-relief with a user-chosen height range. We modify the original image-contrast-enhancement AHE method to also use gradient weights, enhancing the shape features of the bas-relief. To effectively compress the height field, we limit the height-dependent scaling factors used to compute relative height variations in the output from height variations in the input; this prevents any height differences from having too great an effect. Results of AHE over different neighborhood sizes are averaged to preserve information at different scales in the resulting bas-relief. Compared to previous approaches, the proposed algorithm is simple yet largely preserves the original shape features. Experiments show that our results are, in general, comparable to, and in some cases better than, those of the best previously published methods.
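A minimal sketch of the equalization step may help. The toy version below performs rank-based local histogram equalization of a height field and averages the results over two neighborhood sizes; the gradient weighting and the limiting of height-dependent scaling factors described above are omitted for brevity, and all parameters are illustrative.

import numpy as np

def local_equalize(h, win):
    # Rank-based local histogram equalization of a height field h: (H, W).
    H, W = h.shape
    r = win // 2
    pad = np.pad(h, r, mode="reflect")
    out = np.empty_like(h, dtype=float)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + win, j:j + win]
            out[i, j] = (patch < h[i, j]).mean()  # local rank, in [0, 1)
    return out

def bas_relief(h, max_height=1.0, wins=(9, 17)):
    # Average AHE results over several neighborhood sizes, then rescale.
    eq = np.mean([local_equalize(h, w) for w in wins], axis=0)
    return eq * max_height

h = np.random.rand(32, 32)
print(bas_relief(h).shape)  # (32, 32), values in [0, max_height)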
ABSTRACT
We present a system for vectorizing 2D raster format cartoon animations. The output animations are visually flicker free, smaller in file size, and easy to edit. We identify decorative lines separately from colored regions. We use an accurate and semantically meaningful image decomposition algorithm, supporting an arbitrary color model for each region. To ensure temporal coherence in the output, we reconstruct a universal background for all frames and separately extract foreground regions. Simple user assistance is required to complete the background. Each region and decorative line is vectorized and stored together with its motion from frame to frame. The contributions of this paper are: 1) the new trapped-ball segmentation method, which is fast, supports nonuniformly colored regions, and allows robust region segmentation even in the presence of imperfectly linked region edges; 2) the separate handling of decorative lines as special objects during image decomposition, avoiding results containing multiple short, thin oversegmented regions; and 3) extraction of a single patch-based background for all frames, which provides a basis for consistent, flicker-free animations.
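A minimal sketch of the trapped-ball idea, assuming scipy is available: a ball of radius r cannot leak through gaps in the line art narrower than its diameter, so eroding the non-line mask by a disk and labeling the result yields robust region seeds even when region edges are imperfectly linked. This simplified version uses a single radius; the full method grows the seeds back by flood fill and iterates over decreasing radii.

import numpy as np
from scipy import ndimage

def trapped_ball_seeds(line_mask, radius=3):
    # line_mask: bool (H, W), True on decorative line pixels.
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    ball = x * x + y * y <= radius * radius                    # disk structuring element
    core = ndimage.binary_erosion(~line_mask, structure=ball)  # where the ball fits
    labels, n = ndimage.label(core)                            # one label per trapped region
    # The full method then flood-fills each seed outward (never crossing lines)
    # and repeats with smaller radii, so small gaps cannot merge adjacent regions.
    return labels, n

mask = np.zeros((64, 64), dtype=bool)
mask[32, :30] = True
mask[32, 33:] = True                      # a line with a 3-pixel gap
print(trapped_ball_seeds(mask)[1])        # 2: the ball cannot pass through the gap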
ABSTRACT
A quick-response code (QR code) is a two-dimensional code, akin to a barcode, that encodes a message of limited length. In this paper, we present a variant of the QR code: a two-layer QR code. Its two-layer structure can display two alternative messages when scanned from two different directions. We propose a method to generate such two-layer QR codes encoding two given messages in a few seconds. We also demonstrate the robustness of our method on both synthetic and fabricated examples. All source code will be made publicly available (https://github.com/yuantailing/two-layer-qrcode).
ABSTRACT
In this paper, we consider delayed bidirectional associative memory (BAM) neural networks (NNs) with Lipschitz continuous activation functions. By applying Young's inequality and Hölder's inequality together with the properties of monotonic continuous functions, global exponential stability criteria are established for BAM NNs with time delays. This is done through the use of a new Lyapunov functional and an M-matrix. The results obtained in this paper extend and improve previous results.
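For concreteness, the delayed BAM model studied in this line of work is usually written as follows (a standard formulation; the paper's exact notation may differ):

\begin{aligned}
\dot{x}_i(t) &= -a_i x_i(t) + \sum_{j=1}^{m} w_{ji}\, f_j\big(y_j(t-\tau_{ji})\big) + I_i, & i &= 1,\dots,n,\\
\dot{y}_j(t) &= -b_j y_j(t) + \sum_{i=1}^{n} v_{ij}\, g_i\big(x_i(t-\sigma_{ij})\big) + J_j, & j &= 1,\dots,m,
\end{aligned}

where the activations satisfy Lipschitz conditions $|f_j(u)-f_j(v)| \le L_j^f |u-v|$ and $|g_i(u)-g_i(v)| \le L_i^g |u-v|$. Global exponential stability asserts that every solution converges to the unique equilibrium at an exponential rate.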
Subjects
Memory; Neural Networks, Computer; Nonlinear Dynamics; Algorithms; Humans; Time Factors
ABSTRACT
This paper presents a novel skeleton-based method for deforming meshes (using an approximate skeleton, rather than a precise medial axis). The significant difference from previous skeleton-based methods is that the latter use the skeleton to control movement of vertices whereas we use it to control the simplices defining the model. By doing so, errors that occur near joints in other methods can be spread over the whole mesh, using an optimization process, resulting in smooth transitions near joints of the skeleton. By controlling simplices, our method has the advantage that no vertex weights need to be defined on the bones, which is a tedious requirement in previous skeleton-based methods. Our method can also easily be extended to control deformation by moving a few chosen line segments or vertices embedded in the object, rather than a skeleton.
Subjects
Algorithms; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Motion; Numerical Analysis, Computer-Assisted
ABSTRACT
Unlike image blending algorithms, video blending algorithms have been little studied. In this paper, we investigate six popular blending algorithms: feather blending, multi-band blending, modified Poisson blending, mean value coordinate blending, multi-spline blending and convolution pyramid blending. We consider their application to blending realtime panoramic videos, a key problem in various virtual reality tasks. To evaluate the performance and suitability of these six algorithms for this problem, we have created a video benchmark with several videos captured under various conditions. We analyze the time and memory needed by the six algorithms, for both CPU and GPU implementations (where readily parallelizable). The visual quality provided by these algorithms is also evaluated both objectively and subjectively. The video benchmark and algorithm implementations are publicly available.
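As a point of reference, feather blending, the simplest of the six methods, can be sketched in a few lines (numpy and scipy; each pixel is weighted by its distance to its own image's boundary, so the two videos cross-fade smoothly across the overlap; all names are illustrative):

import numpy as np
from scipy import ndimage

def feather_blend(img_a, img_b, mask_a, mask_b):
    # img_a, img_b: (H, W, 3); mask_a, mask_b: bool coverage maps.
    w_a = ndimage.distance_transform_edt(mask_a)   # distance to image boundary
    w_b = ndimage.distance_transform_edt(mask_b)
    total = w_a + w_b
    total[total == 0] = 1.0                        # avoid dividing by zero outside coverage
    w_a, w_b = w_a / total, w_b / total
    return w_a[..., None] * img_a + w_b[..., None] * img_b

a, b = np.random.rand(64, 64, 3), np.random.rand(64, 64, 3)
ma = np.zeros((64, 64), bool); ma[:, :40] = True
mb = np.zeros((64, 64), bool); mb[:, 24:] = True   # 16-column overlap
print(feather_blend(a, b, ma, mb).shape)           # (64, 64, 3)

Per-frame cost is a distance transform and a weighted sum, which is why feathering is typically the fastest of the six, at the price of visible ghosting when frames are misaligned.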
ABSTRACT
In this paper, we present a novel pairwise-force smoothed particle hydrodynamics (PF-SPH) model to enable simulation of various interactions at interfaces in real time. Realistic capture of interactions at interfaces is a challenging problem for SPH-based simulations, especially for scenarios involving multiple interactions at different interfaces. Our PF-SPH model can readily handle multiple types of interactions simultaneously in a single simulation; its basis is to use a larger support radius than that used in standard SPH. We adopt a novel anisotropic filtering term to further improve the computed interaction forces. The proposed model is stable; furthermore, it avoids the particle clustering problem which commonly occurs at the free surface. We show how our model can be used to capture various interactions. We also consider the close connection between droplets and bubbles, and show how to animate bubbles rising in liquid as well as bubbles in air. Our method is versatile, physically plausible and easy to implement. Examples are provided to demonstrate the capabilities and effectiveness of our approach.
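A minimal sketch of the pairwise-force ingredient (numpy): the cosine-shaped pairwise force common in the PF-SPH literature, F(r) = s cos(3πr/2h) for r ≤ h, is repulsive at short range and attractive at longer range within the support radius h, with the per-pair strength s controlling interface behavior. Whether this exact kernel matches the paper's is an assumption; the anisotropic filtering term of the full model is omitted, and all parameters are illustrative.

import numpy as np

def pairwise_forces(pos, s, h):
    # pos: (N, 3) particle positions; s: (N, N) pair strengths; h: support radius.
    diff = pos[:, None, :] - pos[None, :, :]            # x_i - x_j
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)                         # no self-interaction
    near = r <= h                                       # compact support
    mag = np.zeros_like(r)
    # Positive (repulsive) for r < h/3, negative (attractive) for h/3 < r <= h.
    mag[near] = s[near] * np.cos(3.0 * np.pi * r[near] / (2.0 * h))
    return ((mag / r)[..., None] * diff).sum(axis=1)    # net force on each particle

pos = np.random.rand(5, 3)
s = np.full((5, 5), 1e-3)
print(pairwise_forces(pos, s, h=0.5).shape)             # (5, 3)

In the paper's model, the pairwise force acts over a larger support radius than standard SPH kernels, which is what lets a single simulation carry several types of interface interaction at once.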
ABSTRACT
We present a method for realtime reconstruction of an animating human body, which produces a sequence of deforming meshes representing a given performance captured by a single commodity depth camera. We achieve realtime single-view mesh completion by enhancing the parameterized SCAPE model. Our method, which we call Realtime SCAPE, performs full-body reconstruction without the use of markers. In Realtime SCAPE, estimations of body shape parameters and pose parameters, needed for reconstruction, are decoupled. Intrinsic body shape is first precomputed for a given subject, by determining shape parameters with the aid of a body shape database. Subsequently, per-frame pose parameter estimation is performed by means of linear blending skinning (LBS); the problem is decomposed into separately finding skinning weights and transformations. The skinning weights are also determined offline from the body shape database, reducing online reconstruction to simply finding the transformations in LBS. Doing so is formulated as a linear variational problem; carefully designed constraints are used to impose temporal coherence and alleviate artifacts. Experiments demonstrate that our method can produce full-body mesh sequences with high fidelity.
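Since online reconstruction reduces to finding the LBS transformations, a minimal linear blend skinning sketch may be useful (numpy; rest-pose vertices, a per-vertex weight matrix, and per-bone 4x4 transforms, all illustrative):

import numpy as np

def linear_blend_skinning(verts, weights, transforms):
    # verts: (N, 3); weights: (N, B), rows sum to 1; transforms: (B, 4, 4).
    vh = np.concatenate([verts, np.ones((len(verts), 1))], axis=1)  # homogeneous
    blended = np.einsum("nb,bij->nij", weights, transforms)  # per-vertex blended T
    out = np.einsum("nij,nj->ni", blended, vh)
    return out[:, :3]

verts = np.random.rand(10, 3)
weights = np.random.rand(10, 2)
weights /= weights.sum(1, keepdims=True)
T = np.tile(np.eye(4), (2, 1, 1))                        # identity bones: no motion
print(np.allclose(linear_blend_skinning(verts, weights, T), verts))  # True

With the weights fixed offline, each frame only has to solve for the per-bone transforms, which is what keeps the linear variational problem small enough for realtime use.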
Subjects
Computer Graphics; Human Body; Humans
ABSTRACT
In this paper, we present a novel algorithm to simultaneously accomplish color quantization and dithering of images. This is achieved by minimizing a perception-based cost function, which considers pixel-wise differences between filtered versions of the quantized image and the input image. We use edge-aware filters in defining the cost function to avoid mixing colors on the opposite sides of an edge. The importance of each pixel is weighted according to its saliency. To rapidly minimize the cost function, we use a modified multi-scale iterative conditional mode (ICM) algorithm, which updates one pixel at a time while keeping other pixels unchanged. As ICM is a local method, careful initialization is required to prevent termination at a local minimum far from the global one. To address this problem, we initialize ICM with a palette generated by a modified median-cut method. Compared with previous approaches, our method produces high-quality results with fewer visual artifacts while requiring significantly less computational effort.
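A minimal sketch of one ICM sweep under a much-simplified cost (numpy): the edge-aware, saliency-weighted filters of the full method are replaced here by a plain box filter over a small window, but the update rule is the same: try every palette color at a pixel and keep the one minimizing the local filtered difference to the input. All parameters are illustrative.

import numpy as np

def icm_sweep(img, labels, palette, win=3):
    # img: (H, W, 3) in [0, 1]; labels: (H, W) palette indices; palette: (P, 3).
    H, W, _ = img.shape
    r = win // 2
    for i in range(H):
        for j in range(W):
            i0, i1 = max(i - r, 0), min(i + r + 1, H)
            j0, j1 = max(j - r, 0), min(j + r + 1, W)
            target = img[i0:i1, j0:j1].mean(axis=(0, 1))  # filtered input patch
            best, best_cost = labels[i, j], np.inf
            for p in range(len(palette)):
                labels[i, j] = p                          # tentative assignment
                q = palette[labels[i0:i1, j0:j1]].mean(axis=(0, 1))
                cost = ((q - target) ** 2).sum()
                if cost < best_cost:
                    best, best_cost = p, cost
            labels[i, j] = best                           # keep the best candidate
    return labels

img = np.random.rand(16, 16, 3)
palette = np.array([[0., 0., 0.], [1., 1., 1.], [1., 0., 0.]])
labels = np.random.randint(0, 3, (16, 16))
print(icm_sweep(img, labels, palette).shape)              # (16, 16)

Because each update only looks at a local window, sweeps are cheap, but, as noted above, the result depends heavily on the initial palette.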
ABSTRACT
A major difference between amateur and professional video lies in the quality of camera paths. Previous work on video stabilization has considered how to improve amateur video by smoothing the camera path. In this paper, we show that additional changes to the camera path can further improve video aesthetics. Our new optimization method achieves multiple simultaneous goals: 1) stabilizing video content over short time scales; 2) ensuring simple and consistent camera paths over longer time scales; and 3) improving scene composition by automatically removing distractions, a common occurrence in amateur video. Our approach uses an L1 camera path optimization framework, extended to handle multiple constraints. Two passes of optimization are used to address both low-level and high-level constraints on the camera path. The experimental and user study results show that our approach outputs video that is perceptually better than the input and the results of using stabilization alone.
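A toy one-dimensional version of the L1 path objective may clarify the first pass: minimize weighted absolute first, second, and third derivatives of the new path, subject to the path staying within a crop window around the original, solved as a linear program (scipy; weights and window size are illustrative, and the composition-improving second pass is not modeled):

import numpy as np
from scipy.optimize import linprog

def l1_smooth_path(c, delta=8.0, w=(10.0, 1.0, 100.0)):
    # c: (n,) original 1-D camera path. Returns smoothed path p with |p - c| <= delta.
    n = len(c)
    D = [np.diff(np.eye(n), k, axis=0) for k in (1, 2, 3)]  # derivative operators
    m = [d.shape[0] for d in D]
    # Variables: [p (n), e1, e2, e3]; minimize sum_k w[k] * sum(e_k).
    cost = np.concatenate([np.zeros(n)] + [w[k] * np.ones(m[k]) for k in range(3)])
    A, b, off = [], [], n
    for k in range(3):
        Ek = np.zeros((m[k], cost.size)); Ek[:, off:off + m[k]] = np.eye(m[k])
        Dk = np.zeros((m[k], cost.size)); Dk[:, :n] = D[k]
        A += [Dk - Ek, -Dk - Ek]                   # |D_k p| <= e_k, elementwise
        b += [np.zeros(m[k]), np.zeros(m[k])]
        off += m[k]
    bounds = [(ci - delta, ci + delta) for ci in c] + [(0, None)] * (off - n)
    res = linprog(cost, A_ub=np.vstack(A), b_ub=np.concatenate(b),
                  bounds=bounds, method="highs")
    return res.x[:n]

c = np.cumsum(np.random.randn(60))                 # a jittery 1-D path
print(l1_smooth_path(c).shape)                     # (60,)

The L1 penalty drives whole runs of derivatives to exactly zero, so the optimized path decomposes into the static, constant-velocity, and constant-acceleration segments characteristic of professional camera work.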
ABSTRACT
Feature extraction and matching (FEM) for 3D shapes finds numerous applications in computer graphics and vision for object modeling, retrieval, morphing, and recognition. However, unavoidable incorrect matches lead to inaccurate estimation of the transformation relating different datasets. Inspired by AdaBoost, this paper proposes a novel iterative re-weighting method to tackle the challenging problem of evaluating point matches established by typical FEM methods. Weights are used to indicate the degree of belief that each point match is correct. Our method has three key steps: (i) estimation of the underlying transformation using weighted least squares, (ii) penalty parameter estimation via minimization of the weighted variance of the matching errors, and (iii) weight re-estimation taking into account both matching errors and information learnt in previous iterations. A comparative study, based on real shapes captured by two laser scanners, shows that the proposed method outperforms four other state-of-the-art methods in terms of evaluating point matches between overlapping shapes established by two typical FEM methods, resulting in more accurate estimates of the underlying transformation. This improved transformation can be used to better initialize the iterative closest point algorithm and its variants, making 3D shape registration more likely to succeed.
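A minimal sketch of the re-weighting loop (numpy), assuming a rigid underlying transformation: step (i) uses the weighted Kabsch/Procrustes solution, step (iii) re-weights by residual, and step (ii)'s penalty estimation is reduced to a fixed scale parameter for brevity. All parameters are illustrative.

import numpy as np

def weighted_rigid_fit(P, Q, w):
    # Weighted least-squares rigid transform (R, t) mapping P to Q; P, Q: (N, 3).
    w = w / w.sum()
    cp, cq = w @ P, w @ Q                          # weighted centroids
    H = (P - cp).T @ ((Q - cq) * w[:, None])       # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp

def evaluate_matches(P, Q, iters=10, sigma=0.05):
    # Iteratively re-weight matches by residual; returns weights in [0, 1].
    w = np.ones(len(P))
    for _ in range(iters):
        R, t = weighted_rigid_fit(P, Q, w)
        res = np.linalg.norm(P @ R.T + t - Q, axis=1)
        w = np.exp(-(res / sigma) ** 2)            # low weight = likely bad match
    return w

P = np.random.rand(50, 3)
Q = P + 0.1                                        # a pure translation
Q[:5] += np.random.rand(5, 3)                      # five corrupted matches
print(evaluate_matches(P, Q).argsort()[:5])        # lowest weights: the bad matches

The final weights can either prune bad matches outright or directly initialize a weighted variant of the iterative closest point algorithm.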
ABSTRACT
Solid textures, comprising 3D particles embedded in a matrix in a regular or semiregular pattern, are common in natural and man-made materials, such as brickwork, stone walls, and plant cells in a leaf. We present a novel technique for synthesizing such textures, starting from 2D image exemplars which provide cross-sections of the desired volume texture. The shapes and colors of typical particles embedded in the structure are estimated from their 2D cross-sections. Particle positions in the texture images are also used to guide spatial placement of the 3D particles during synthesis of the 3D texture. Our experiments demonstrate that our algorithm can produce higher quality structures than previous approaches; the results are both compatible with the input images and have a plausible 3D nature.
Subjects
Algorithms; Computer Graphics; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; User-Computer Interface
ABSTRACT
Traditional image editing techniques cannot be directly used to edit stereoscopic ("3D") media, as extra constraints are needed to ensure consistent changes are made to both left and right images. Here, we consider manipulating perspective in stereoscopic pairs. A straightforward approach based on depth recovery is unsatisfactory; instead, we use feature correspondences between stereoscopic image pairs. Given a new, user-specified perspective, we determine correspondence constraints under this perspective and optimize a 2D warp for each image that preserves straight lines and guarantees proper stereopsis relative to the new camera. Experiments verify that our method generates new stereoscopic views that correspond well to expected projections, for a wide range of specified perspectives. Various advanced camera effects, such as dolly zoom and wide angle effects, can also be readily generated for stereoscopic image pairs using our method.
ABSTRACT
Feature matching is a challenging problem at the heart of numerous computer graphics and computer vision applications. We present the SuperMatching algorithm for finding correspondences between two sets of features. It does so by considering triples or higher order tuples of points, going beyond the pointwise and pairwise approaches typically used. SuperMatching is formulated using a supersymmetric tensor representing an affinity metric that takes into account feature similarity and geometric constraints between features; feature matching is cast as a higher order graph matching problem. SuperMatching takes advantage of supersymmetry to devise an efficient sampling strategy to estimate the affinity tensor, as well as to store the estimated tensor compactly. Matching is performed by an efficient higher order power iteration approach that takes advantage of this compact representation. Experiments on both synthetic and real data show that SuperMatching provides more accurate feature matching than other state-of-the-art approaches for a wide range of 2D and 3D features, with competitive computational cost.
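A minimal sketch of the higher order power iteration (numpy), assuming a sparse third-order affinity tensor whose entries are stored once per sorted index triple and expanded on the fly by supersymmetry, a toy stand-in for SuperMatching's compact representation:

import numpy as np
from itertools import permutations

def tensor_power_iteration(entries, n, iters=50):
    # entries: list of ((i, j, k), affinity) with i < j < k; n candidate matches.
    x = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(iters):
        v = np.zeros(n)
        for (i, j, k), a in entries:
            for p, q, r in permutations((i, j, k)):  # supersymmetric expansion
                v[p] += a * x[q] * x[r]
        x = v / np.linalg.norm(v)                    # renormalize each iteration
    return x                                         # high entries = consistent matches

# Toy: 4 candidate matches; the triple (0, 1, 2) is strongly mutually consistent.
entries = [((0, 1, 2), 1.0), ((0, 1, 3), 0.2)]
print(tensor_power_iteration(entries, 4).round(3))

The dominant entries of the converged vector indicate the mutually consistent subset of matches, which is then discretized into one-to-one correspondences.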
ABSTRACT
We introduce a novel stratified sampling technique for mesh surfaces that gives the user control over sampling density and anisotropy via a tensor field. Our approach is based on sampling space-filling curves mapped onto mesh segments via parametrizations aligned with the tensor field. After a short preprocessing step, samples can be generated in real time. Along with visual examples, we provide rigorous spectral analysis and differential domain analysis of our sampling. The sample distributions are of high quality: they fulfil the blue noise criterion, so have minimal artifacts due to regularity of sampling patterns, and they accurately represent isotropic and anisotropic densities on the plane and on mesh surfaces. They also have low discrepancy, ensuring that the surface is evenly covered.
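A minimal planar sketch of the idea (numpy): place one jittered sample per equal-length run of cells along a Hilbert curve, which yields stratified samples that cover the domain evenly. The tensor-field-aligned parametrizations and mesh segments of the full method are omitted; this is the isotropic unit-square case only.

import numpy as np

def hilbert_d2xy(order, d):
    # Map index d along a Hilbert curve over a 2^order x 2^order grid to (x, y).
    x = y = 0
    s, t = 1, d
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                                 # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def stratified_curve_samples(order, n, seed=0):
    # n samples in [0, 1]^2: one jittered point per equal run of curve cells.
    rng = np.random.default_rng(seed)
    cells = (1 << order) ** 2
    idx = (np.arange(n) + rng.random(n)) * (cells / n)   # stratify along the curve
    pts = np.array([hilbert_d2xy(order, int(d)) for d in idx], dtype=float)
    return (pts + rng.random((n, 2))) / (1 << order)     # jitter inside each cell

pts = stratified_curve_samples(order=6, n=100)
print(pts.shape, pts.min() >= 0.0, pts.max() < 1.0)      # (100, 2) True True

Because the Hilbert curve preserves locality, equal strata along the curve map to compact, roughly equal-area regions of the plane, which is what keeps the discrepancy low; anisotropy enters in the full method through the parametrization.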
ABSTRACT
We present a video editing technique based on changing the timelines of individual objects in video, which leaves them in their original places but puts them at different times. This allows the production of object-level slow motion effects, fast motion effects, or even time reversal. This is more flexible than simply applying such effects to whole frames, as new relationships between objects can be created. As we restrict object interactions to the same spatial locations as in the original video, our approach can produce high-quality results using only coarse matting of video objects. Coarse matting can be done efficiently using automatic video object segmentation, avoiding tedious manual matting. To design the output, the user interactively indicates the desired new life spans of objects, and may also change the overall running time of the video. Our method rearranges the timelines of objects in the video whilst applying appropriate object interaction constraints. We demonstrate that, while this editing technique is somewhat restrictive, it still allows many interesting results.
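A minimal sketch of per-object retiming (numpy), assuming each object comes with coarse per-frame mattes: every object carries its own timeline map from output frame to source frame, and is composited back over the reconstructed background at its original spatial location. All names are illustrative.

import numpy as np

def retime(background, obj_frames, obj_masks, timeline):
    # background: (H, W, 3); obj_frames: (T, H, W, 3); obj_masks: (T, H, W) bool;
    # timeline: output frame -> source frame index, or None if the object is absent.
    out = []
    for src in timeline:
        frame = background.copy()
        if src is not None:
            m = obj_masks[src][..., None]                # coarse matte is enough
            frame = np.where(m, obj_frames[src], frame)  # object stays in place
        out.append(frame)
    return np.stack(out)

T, H, W = 4, 32, 32
bg = np.zeros((H, W, 3))
frames = np.random.rand(T, H, W, 3)
masks = np.zeros((T, H, W), bool)
masks[:, 8:16, 8:16] = True
slowmo = [0, 0, 1, 1, 2, 2, 3, 3]                        # 2x slow motion for one object
print(retime(bg, frames, masks, slowmo).shape)           # (8, 32, 32, 3)

With several objects, the per-object composites are layered in depth order, and the interaction constraints restrict which timeline maps are admissible; for instance, moments where two objects overlap must keep their original relative timing.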
ABSTRACT
This paper presents a novel approach to edge-aware image manipulation. Our method processes a Gaussian pyramid from coarse to fine, and at each level, applies a nonlinear filter bank to the neighborhood of each pixel. Outputs of these spatially-varying filters are merged using global optimization. The optimization problem is solved using an explicit mixed-domain (real space and DCT transform space) solution, which is efficient, accurate, and easy to implement. We demonstrate applications of our method to a set of problems, including detail and contrast manipulation, HDR compression, nonphotorealistic rendering, and haze removal.
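A one-dimensional toy version of the mixed-domain solve may help: a screened-Poisson objective is solved exactly in the DCT domain, since the DCT diagonalizes the Laplacian under Neumann boundary conditions (scipy; the full method merges spatially varying filter-bank outputs at each pyramid level, which this sketch omits):

import numpy as np
from scipy.fft import dct, idct

def screened_poisson_1d(f, lam):
    # Minimize sum (u - f)^2 + lam * sum (u[i+1] - u[i])^2, solved in DCT space.
    n = len(f)
    F = dct(f, type=2, norm="ortho")
    k = np.arange(n)
    eig = 2.0 - 2.0 * np.cos(np.pi * k / n)     # Laplacian eigenvalues (Neumann)
    return idct(F / (1.0 + lam * eig), type=2, norm="ortho")

f = np.sin(np.linspace(0, 4 * np.pi, 256)) + 0.3 * np.random.randn(256)
u = screened_poisson_1d(f, lam=50.0)            # smoothed signal, same length
print(u.shape)                                  # (256,)

The transform-space division replaces a large sparse linear solve, which is the source of the efficiency claimed above; edge-awareness comes from the spatially varying filters, not from this global step.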