Results 1 - 20 of 47
1.
Article in English | MEDLINE | ID: mdl-39008393

ABSTRACT

Many 3D mesh processing tasks revolve around generating and manipulating curves on surface meshes. While it is intuitive to model these curves explicitly using mesh edges or parametric curves in the ambient space, these methods often suffer from numerical instability or inaccuracy due to the projection operation. Another natural strategy is to adapt spline-based tools; these methods are quite fast but are hard to extend to more versatile constraints and require heavy manual interaction. In this paper, we present an efficient and versatile approach to curve design based on an implicit representation known as the level set. While previous works have explored the use of the level set to generate curves with minimal length, they typically have limitations in accommodating additional conditions for rich and robust control. To address these challenges, we formulate curve editing with constraints such as smoothness, interpolation, and tangent control via a level-set-based variational problem by constraining the values or derivatives of the level set function. However, the widely used gradient flow strategy converges very slowly for this complicated variational problem compared to the classical geodesic one. Thus, we propose to solve it via Newton's method enhanced by local Hessian correction and a trust-region strategy. As a result, our method not only enables versatile control but also excels in terms of performance, thanks to nearly quadratic convergence and almost linear complexity per iteration via narrow-band acceleration. In practice, these advantages benefit various applications, such as interactive curve manipulation, boundary smoothing for surface segmentation, and path planning with obstacles, as demonstrated.
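
To make the optimizer concrete, here is a minimal, hedged sketch (ours, not the paper's code) of one Newton step with local Hessian correction, implemented here as eigenvalue clamping, plus a trust-region bound on the step length; the threshold and radius values are illustrative assumptions. In the paper this step would act on the discretized level-set energy restricted to a narrow band; this version is generic.

```python
# Illustrative sketch of the abstract's optimizer: a Newton step whose
# Hessian is convexified by eigenvalue clamping ("local Hessian correction")
# and whose length is capped by a trust region. Not the paper's implementation.
import numpy as np

def corrected_newton_step(grad, hess, radius=1.0, eps=1e-6):
    w, V = np.linalg.eigh(hess)           # symmetric eigendecomposition
    w = np.maximum(w, eps)                # clamp negative/tiny curvature
    step = -(V @ ((V.T @ grad) / w))      # solve the corrected Newton system
    norm = np.linalg.norm(step)
    if norm > radius:                     # trust-region safeguard on step size
        step *= radius / norm
    return step
```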

2.
Article in English | MEDLINE | ID: mdl-38656855

ABSTRACT

We present a novel framework named NeuralRecon for real-time 3D scene reconstruction from a monocular video. Unlike previous methods that estimate single-view depth maps separately on each key-frame and fuse them later, we propose to directly reconstruct local surfaces, represented as sparse TSDF volumes, for each video fragment sequentially by a neural network. A learning-based TSDF fusion module based on gated recurrent units guides the network to fuse features from previous fragments. This design allows the network to capture the local smoothness prior and global shape prior of 3D surfaces when sequentially reconstructing them, resulting in accurate, coherent, and real-time surface reconstruction. The fused features can also be used to predict semantic labels, allowing our method to reconstruct and segment the 3D scene simultaneously. Furthermore, we propose an efficient self-supervised fine-tuning scheme that refines scene geometry based on input images through differentiable volume rendering. This fine-tuning scheme improves reconstruction quality on the fine-tuned scenes as well as generalization to similar test scenes. Experiments on the ScanNet, 7-Scenes, and Replica datasets show that our system outperforms state-of-the-art methods in terms of both accuracy and speed.
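
A minimal sketch of the fusion idea, assuming a per-voxel hidden state carried by a GRU cell; the module name, feature size, and tanh TSDF head are our guesses, not the released NeuralRecon code.

```python
import torch
import torch.nn as nn

class FragmentFusion(nn.Module):
    """Toy GRU-based fusion: features of the current fragment update a
    per-voxel hidden state before TSDF regression."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, feat_dim)  # learned fusion with history
        self.tsdf_head = nn.Linear(feat_dim, 1)    # per-voxel signed distance

    def forward(self, frag_feats, hidden):
        # frag_feats, hidden: (num_voxels, feat_dim) for voxels in this fragment
        hidden = self.gru(frag_feats, hidden)
        tsdf = torch.tanh(self.tsdf_head(hidden))  # truncated SDF in [-1, 1]
        return tsdf, hidden
```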

3.
IEEE Trans Vis Comput Graph ; 30(5): 2098-2108, 2024 May.
Article in English | MEDLINE | ID: mdl-38437081

ABSTRACT

Visual-inertial SLAM (VI-SLAM) is a key technology for Augmented Reality (AR): it allows an AR device to recover its 6-DoF motion in real time so that virtual content can be rendered with the corresponding pose. Nowadays, smartphones are still the mainstream devices through which ordinary users experience AR. However, the current VI-SLAM methods, although performing well on high-end phones, still face robustness challenges when deployed on the larger stock of mid- and low-end phones. Existing VI-SLAM datasets use either very ideal sensors or only a limited number of devices for data collection, so they cannot reflect the capability gaps that VI-SLAM methods need to bridge when deployed on a large variety of phone models. This work proposes 100-Phones, the first VI-SLAM dataset covering a wide range of mainstream phones on the market. The dataset consists of 350 sequences collected by 100 different models of phones. Through analysis and experiments on the collected data, we conclude that the quality of visual-inertial data varies greatly among mainstream phones, and that current open-source VI-SLAM methods still have serious robustness issues when it comes to mass deployment on mobile phones. We release the dataset to facilitate the robustness improvement of VI-SLAM and to promote the mass popularization of AR. Project page: https://github.com/zju3dv/100-Phones.

4.
Article in English | MEDLINE | ID: mdl-38507384

ABSTRACT

This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Many previous works have shown impressive reconstruction results on textured objects, but they still have difficulty handling low-textured planar regions, which are common in indoor scenes. One approach to this issue is to incorporate planar constraints into the depth map estimation of multi-view stereo-based methods, but the per-view plane estimation and depth optimization lack both efficiency and multi-view consistency. In this work, we show that planar constraints can be conveniently integrated into recent implicit neural representation-based reconstruction methods. Specifically, we use an MLP network to represent the signed distance function as the scene geometry. Based on the Manhattan-world and Atlanta-world assumptions, planar constraints are employed to regularize the geometry in floor and wall regions predicted by a 2D semantic segmentation network. To resolve inaccurate segmentation, we encode the semantics of 3D points with another MLP and design a novel loss that jointly optimizes the scene geometry and semantics in 3D space. Experiments on the ScanNet and 7-Scenes datasets show that the proposed method outperforms previous methods by a large margin in 3D reconstruction quality. The code and supplementary materials are available at https://zju3dv.github.io/manhattan_sdf.
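
As a hedged illustration of such planar regularization: normals derived from the SDF gradient on pixels labeled floor are pulled toward the up axis, and wall normals toward the horizontal plane. The label convention and equal weighting are assumptions; the paper's actual Manhattan/Atlanta losses are more elaborate.

```python
import torch

def planar_loss(normals, labels, up=None):
    # normals: (N, 3) unit normals from the SDF gradient;
    # labels: (N,) semantic labels, assumed 1 = floor, 2 = wall
    if up is None:
        up = normals.new_tensor([0.0, 0.0, 1.0])
    cos = normals @ up
    loss = normals.new_zeros(())
    if (labels == 1).any():                 # floor normals align with up
        loss = loss + (1.0 - cos[labels == 1]).mean()
    if (labels == 2).any():                 # wall normals orthogonal to up
        loss = loss + cos[labels == 2].abs().mean()
    return loss
```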

5.
IEEE Trans Pattern Anal Mach Intell ; 46(6): 4147-4159, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38231799

ABSTRACT

This paper addresses the challenge of reconstructing an animatable human model from a multi-view video. Some recent works have proposed to decompose a non-rigidly deforming scene into a canonical neural radiance field and a set of deformation fields that map observation-space points to the canonical space, thereby enabling them to learn the dynamic scene from images. However, they represent the deformation field as a translational vector field or an SE(3) field, which makes the optimization highly under-constrained. Moreover, these representations cannot be explicitly controlled by input motions. Instead, we introduce blend weight fields to produce the deformation fields. Based on skeleton-driven deformation, blend weight fields are used with 3D human skeletons to generate observation-to-canonical and canonical-to-observation correspondences. Since 3D human skeletons are more observable, they can regularize the learning of deformation fields. Moreover, the blend weight fields can be combined with input skeletal motions to generate new deformation fields to animate the human model. To improve the quality of human modeling, we further represent the human geometry as a signed distance field in the canonical space. Additionally, a neural point displacement field is introduced to enhance the capability of the blend weight field in modeling detailed human motions. Experiments show that our approach significantly outperforms recent human modeling methods.
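
The blending step behind this skeleton-driven deformation is standard linear blend skinning; the sketch below shows only that step, with the weights assumed to come from the learned blend-weight field.

```python
import torch

def blend_skinning(x_canonical, weights, bone_rots, bone_trans):
    # x_canonical: (N, 3) canonical points; weights: (N, B) blend weights
    # bone_rots: (B, 3, 3), bone_trans: (B, 3) per-bone rigid transforms
    per_bone = torch.einsum('bij,nj->nbi', bone_rots, x_canonical) + bone_trans
    return torch.einsum('nb,nbi->ni', weights, per_bone)  # weighted blend
```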

6.
Article in English | MEDLINE | ID: mdl-38215333

ABSTRACT

It is typically challenging for visual or visual-inertial odometry systems to handle dynamic scenes and pure rotation. In this work, we design a novel visual-inertial odometry (VIO) system called RD-VIO to handle both of these problems. First, we propose an IMU-PARSAC algorithm that robustly detects and matches keypoints in a two-stage process. In the first stage, landmarks are matched with new keypoints using visual and IMU measurements. We collect statistical information from this matching and use it to guide the intra-keypoint matching in the second stage. Second, to handle pure rotation, we detect the motion type and adopt a deferred-triangulation technique during the data-association process. Pure-rotational frames are turned into special subframes; when solving the visual-inertial bundle adjustment, they provide additional constraints on the pure-rotational motion. We evaluate the proposed VIO system on public datasets and in an online comparison. Experiments show that the proposed RD-VIO has obvious advantages over other methods in dynamic environments.
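
One plausible pure-rotation test (our heuristic, not necessarily RD-VIO's exact criterion): rotation-compensate matched bearing vectors with the gyro-integrated rotation and declare pure rotation when the remaining median parallax is tiny.

```python
import numpy as np

def is_pure_rotation(bearings_prev, bearings_cur, R_rel, thresh_deg=0.5):
    # bearings_*: (N, 3) unit rays to matched keypoints in each frame;
    # R_rel: (3, 3) gyro-integrated rotation from previous to current frame
    pred = bearings_prev @ R_rel.T                        # rotate rays forward
    cosang = np.clip((pred * bearings_cur).sum(axis=1), -1.0, 1.0)
    parallax = np.degrees(np.arccos(cosang))              # residual angle
    return np.median(parallax) < thresh_deg               # tiny => pure rotation
```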

7.
Article in English | MEDLINE | ID: mdl-37294654

ABSTRACT

Although part-based motion synthesis networks have been investigated to reduce the complexity of modeling heterogeneous human motions, their computational cost remains prohibitive in interactive applications. To this end, we propose a novel two-part transformer network that aims to achieve high-quality, controllable motion synthesis results in real time. Our network separates the skeleton into the upper and lower body parts, reducing the expensive cross-part fusion operations, and models the motions of each part separately through two streams of auto-regressive modules formed by multi-head attention layers. However, such a design might not sufficiently capture the correlations between the parts. We thus intentionally let the two parts share the features of the root joint and design a consistency loss to penalize differences between the root features and motions estimated by the two auto-regressive modules, significantly improving the quality of the synthesized motions. After training on our motion dataset, our network can synthesize a wide range of heterogeneous motions, like cartwheels and twists. Experimental and user-study results demonstrate that our network is superior to state-of-the-art human motion synthesis networks in the quality of generated motions.
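
The consistency loss can be written in one line; the L1 form below is our assumption about its shape, not the paper's exact formulation.

```python
import torch

def root_consistency_loss(root_upper, root_lower):
    # root_*: (batch, frames, dim) root-joint features/motions predicted by
    # the upper- and lower-body auto-regressive streams
    return (root_upper - root_lower).abs().mean()
```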

8.
Nat Commun ; 14(1): 2535, 2023 May 03.
Article in English | MEDLINE | ID: mdl-37137891

ABSTRACT

This research proposes a deep-learning paradigm, termed functional learning (FL), to physically train a loose neuron array: a group of non-handcrafted, non-differentiable, and loosely connected physical neurons whose connections and gradients are beyond explicit expression. The paradigm targets training non-differentiable hardware, and therefore solves several interdisciplinary challenges at once: the precise modeling and control of high-dimensional systems, the on-site calibration of multimodal hardware imperfections, and the end-to-end training of non-differentiable and modeless physical neurons through implicit gradient propagation. It offers a methodology for building hardware without handcrafted design, strict fabrication, and precise assembly, thus forging paths for hardware design, chip manufacturing, physical neuron training, and system control. In addition, the functional learning paradigm is numerically and physically verified with an original light field neural network (LFNN). It realizes a programmable incoherent optical neural network, a well-known challenge; the network delivers light-speed, high-bandwidth, and power-efficient inference by processing parallel visible-light signals in free space. As a promising supplement to existing power- and bandwidth-constrained digital neural networks, the light field neural network has various potential applications: brain-inspired optical computation, high-bandwidth power-efficient neural network inference, and light-speed programmable lenses/displays/detectors that operate in visible light.
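
For background only: a standard way to train hardware whose gradients are inaccessible is a zeroth-order (SPSA) estimate built from two perturbed physical forward passes. This is a well-known substitute technique, plainly not the paper's functional-learning rule.

```python
import numpy as np

def spsa_grad(loss_fn, params, eps=1e-2, rng=np.random.default_rng(0)):
    # loss_fn: black-box loss measured through the physical system
    delta = rng.choice([-1.0, 1.0], size=params.shape)  # random sign vector
    lp = loss_fn(params + eps * delta)                  # perturbed forward pass
    lm = loss_fn(params - eps * delta)
    return (lp - lm) / (2.0 * eps) * delta              # SPSA gradient estimate
```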

9.
Article in English | MEDLINE | ID: mdl-37027614

ABSTRACT

The combination of augmented reality (AR) and medicine is an important trend in current research. The powerful display and interaction capabilities of AR systems can assist doctors in performing more complex operations. Since the tooth itself is an exposed rigid-body structure, dental AR is a relatively hot research direction with application potential. However, none of the existing dental AR solutions are designed for wearable AR devices such as AR glasses. At the same time, these methods rely on high-precision scanning equipment or auxiliary positioning markers, which greatly increases the operational complexity and cost of clinical AR. In this work, we propose a simple and accurate neural-implicit model-driven dental AR system, named ImTooth, adapted for AR glasses. Based on the modeling capabilities and differentiable optimization properties of state-of-the-art neural implicit representations, our system fuses reconstruction and registration in a single network, greatly simplifying existing dental AR solutions and enabling reconstruction, registration, and interaction. Specifically, our method learns a scale-preserving voxel-based neural implicit model from multi-view images captured from a textureless plaster model of the tooth. Apart from color and surface, we also learn a consistent edge feature inside our representation. By leveraging the depth and edge information, our system can register the model to real images without additional training. In practice, our system uses a single Microsoft HoloLens 2 as the only sensor and display device. Experiments show that our method can reconstruct high-precision models and accomplish accurate registration. It is also robust to weak, repetitive, and inconsistent textures. We also show that our system can be easily integrated into dental diagnostic and therapeutic procedures, such as bracket placement guidance.

10.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9895-9907, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37027766

ABSTRACT

This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views. Some recent works have shown that learning implicit neural representations of 3D scenes achieves remarkable view synthesis quality given dense input views. However, the representation learning will be ill-posed if the views are highly sparse. To solve this ill-posed problem, our key idea is to integrate observations over video frames. To this end, we propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh, so that the observations across frames can be naturally integrated. The deformable mesh also provides geometric guidance for the network to learn 3D representations more efficiently. Furthermore, we combine Neural Body with implicit surface models to improve the learned geometry. To evaluate our approach, we perform experiments on both synthetic and real-world data, which show that our approach outperforms prior works by a large margin on novel view synthesis and 3D reconstruction. We also demonstrate the capability of our approach to reconstruct a moving person from a monocular video on the People-Snapshot dataset.
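
A toy version of the anchoring idea: latent codes live on the posed mesh vertices, and a query point gathers codes from its nearest vertices. Neural Body actually diffuses codes with sparse convolutions; this nearest-vertex pooling is a deliberate simplification.

```python
import torch

def query_anchored_codes(query_pts, verts, codes, k=4):
    # query_pts: (Q, 3) sample points; verts: (V, 3) posed mesh vertices;
    # codes: (V, C) latent codes anchored to the vertices
    dist = torch.cdist(query_pts, verts)          # (Q, V) pairwise distances
    idx = dist.topk(k, largest=False).indices     # k nearest anchors
    return codes[idx].mean(dim=1)                 # (Q, C) pooled code
```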


Subject(s)
Algorithms, Human Body, Humans, Learning
11.
IEEE Trans Vis Comput Graph ; 29(4): 1964-1976, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34919519

ABSTRACT

Controlling the size and shear of elements is crucial in pure hex or hex-dominant meshing. To this end, non-orthonormal frame fields that are almost everywhere integrable (except for the singularities) can play a key role. However, it is often challenging or impossible to generate such a frame field under the tight control of a general Riemannian metric field. Therefore, we propose to solve a relatively weaker problem, i.e., generating such a frame field for a Riemannian metric field that is flat away from singularities. Such a metric field admits a local isometry to 3D Euclidean space. Applying Cartan's first structural equation to the associated rotation field, i.e., the rotation part of the frame field, we show that the rotation field must have zero covariant derivatives under the 3D connection induced by the metric field. This observation leads to a metric-aware smoothness measure, equivalent to local integrability. The use of such a measure can be justified on meshes associated with locally flat metric fields. We also propose a method to generate smooth metric fields under a few intuitive constraints. On cuboid shapes, our method generates singularities aware of the metric fields, which makes the parameterization match the input metric fields better than the conventional methods. For generic shapes, while our method generates visually similar results to those using boundary frame fields to guide the metric field generation, the integrability and consistency of the metric fields are still improved, as reflected by the statistics.
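
In symbols (our notation, hedged, not the paper's exact formulas): writing \nabla for the connection induced by the metric field g and R for the rotation field, the observation and the induced smoothness measure can be stated as:

```latex
% Local integrability condition (the abstract's observation, our notation):
\nabla R = 0 .
% Induced metric-aware smoothness energy, zero exactly on integrable fields:
E(R) = \int_M \lVert \nabla R \rVert_g^2 \,\mathrm{d}V .
```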

12.
IEEE Trans Vis Comput Graph ; 29(6): 2950-2964, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35077364

ABSTRACT

Data workers use various scripting languages for data transformation, such as SAS, R, and Python. However, understanding intricate code pieces requires advanced programming skills, which hinders data workers from grasping the idea of a data transformation at ease. Program visualization is beneficial for debugging and education and has the potential to illustrate transformations intuitively and interactively. In this article, we explore visualization design for demonstrating the semantics of code pieces in the context of data transformation. First, to depict individual data transformations, we structure a design space along two primary dimensions, i.e., the key parameters to encode and the possible visual channels to map them to. Then, we derive a collection of 23 glyphs that visualize the semantics of transformations. Next, we design a pipeline, named Somnus, that provides an overview of the creation and evolution of data tables using a provenance graph while allowing detailed investigation of individual transformations. User feedback on Somnus is positive: our study participants achieved better accuracy in less time using Somnus, and preferred it over carefully crafted textual descriptions. Further, we provide two example applications to demonstrate the utility and versatility of Somnus.

13.
IEEE Trans Vis Comput Graph ; 29(10): 4284-4295, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35793302

ABSTRACT

The level-of-detail (LOD) technique has been widely exploited as a key rendering optimization in many graphics applications. Numerous approaches have been proposed to automatically generate different kinds of LODs, such as geometric LODs or shader LODs. However, none of them considers simplifying the geometry and the shader at the same time. In this paper, we explore the observation that simplifications of geometric and shading details can be combined to provide a greater variety of tradeoffs between performance and quality. We present a new discrete multiresolution representation of objects that consists of mesh and shader LODs; each level of the representation can contain simplified representations of both the mesh and the shader. To create such LODs, we propose two automatic algorithms that pursue the best simplifications of meshes and shaders at adaptively selected distances. The results show that our mesh and shader LODs achieve better performance-quality tradeoffs than prior LOD representations, such as those that consider only simplified meshes or shaders.

14.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7726-7738, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36409815

ABSTRACT

We present a novel method for local image feature matching, named LoFTR. Instead of performing image feature detection, description, and matching sequentially, we propose to first establish pixel-wise dense matches at a coarse level and later refine the good matches at a fine level. In contrast to dense methods that use a cost volume to search correspondences, we use self- and cross-attention layers in a Transformer to obtain feature descriptors that are conditioned on both images. The global receptive field provided by the Transformer enables our method to produce dense matches in low-texture areas, where feature detectors usually struggle to produce repeatable interest points. Experiments on indoor and outdoor datasets show that LoFTR outperforms state-of-the-art methods by a large margin. We further adapt LoFTR to modern SfM systems and illustrate its application in multiple-view geometry. The proposed method demonstrated superior performance in the Image Matching Challenge 2021 and ranks first on two public visual-localization benchmarks among published methods. The code is available at https://zju3dv.github.io/loftr.
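
For intuition, coarse-level dense matching of this kind is commonly realized with a dual-softmax over the similarity matrix of the two images' coarse descriptors, followed by a mutual-nearest filter; the sketch below follows that pattern with illustrative temperature and threshold values, not necessarily LoFTR's exact settings.

```python
import torch

def dual_softmax_matches(feat_a, feat_b, temp=0.1, thresh=0.2):
    # feat_a: (N, d), feat_b: (M, d) L2-normalized coarse descriptors
    scores = feat_a @ feat_b.t() / temp
    conf = scores.softmax(dim=0) * scores.softmax(dim=1)  # dual-softmax
    mutual = (conf == conf.max(dim=1, keepdim=True).values) & \
             (conf == conf.max(dim=0, keepdim=True).values)
    mask = mutual & (conf > thresh)
    return mask.nonzero(as_tuple=False), conf             # (i, j) match pairs
```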

15.
Article in English | MEDLINE | ID: mdl-36459607

ABSTRACT

Virtual content creation and interaction play an important role in modern 3D applications. Recovering detailed 3D models from real scenes can significantly expand the scope of such applications and has been studied for decades in the computer vision and computer graphics communities. In this work, we propose Vox-Surf, a voxel-based implicit surface representation. Vox-Surf divides space into finite sparse voxels, where each voxel is a basic geometry unit that stores geometry and appearance information on its corner vertices. Owing to the sparsity inherited from the voxel representation, Vox-Surf is suitable for almost any scene and can be easily trained end-to-end from multi-view images. We utilize a progressive training process to gradually cull out empty voxels and keep only valid voxels for further optimization, which greatly reduces the number of sample points and improves inference speed. Experiments show that the Vox-Surf representation can learn fine surface details and accurate colors with less memory and faster rendering than previous methods. The resulting fine voxels can also serve as bounding volumes for collision detection, which is useful in 3D interactions. We also show potential applications of Vox-Surf in scene editing and augmented reality. The source code is publicly available at https://github.com/zju3dv/Vox-Surf.
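
Storing features on corner vertices implies a standard trilinear lookup for any point inside a voxel; a minimal version is below (the corner ordering, x varying fastest, is our convention).

```python
import torch

def trilinear_lookup(corner_feats, x, y, z):
    # corner_feats: (8, d) features at the voxel corners, ordered with x
    # varying fastest, then y, then z; (x, y, z) in [0, 1]^3 inside the voxel
    w = torch.tensor([(1-x)*(1-y)*(1-z), x*(1-y)*(1-z),
                      (1-x)*y*(1-z),     x*y*(1-z),
                      (1-x)*(1-y)*z,     x*(1-y)*z,
                      (1-x)*y*z,         x*y*z])
    return w @ corner_feats                      # (d,) interpolated feature
```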

16.
IEEE Trans Vis Comput Graph ; 28(11): 3727-3736, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36048987

ABSTRACT

Bundle adjustment (BA) is widely used in SLAM and SfM, which are key technologies in Augmented Reality. For real-time SLAM and large-scale SfM, the efficiency of BA is of great importance. This paper proposes CoLi-BA, a novel and efficient BA solver that significantly improves optimization speed through compact linearization and reordering. Specifically, for each reprojection function, the redundant matrix representation of the Jacobian is replaced with a tiny 3D vector, which significantly reduces the computational complexity, memory footprint, and cache misses of Hessian construction and the Schur complement. Besides, we propose a novel reordering strategy to improve cache efficiency for the Schur complement. Experiments on diverse datasets show that the proposed CoLi-BA is five times as fast as Ceres and twice as fast as g2o without sacrificing accuracy. We further verify its effectiveness by porting CoLi-BA to open-source SLAM and SfM systems. Even when running the proposed solver in a single thread, the local BA of SLAM takes only about 20 ms on a desktop PC, and SfM reconstruction with seven thousand photos takes only half an hour. The source code is available on the webpage: https://github.com/zju3dv/CoLi-BA.
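
For context, the Schur complement step that CoLi-BA accelerates is the standard reduction below; this dense toy version only shows the algebra, while CoLi-BA's speedup comes from how the blocks are built (compact 3-vectors) and ordered (cache-aware).

```python
import numpy as np

def schur_camera_update(Hcc, Hcp, Hpp, gc, gp):
    # Normal equations H dx = g with H = [[Hcc, Hcp], [Hcp^T, Hpp]];
    # in real BA, Hpp is block-diagonal and inverted block by block.
    Hpp_inv = np.linalg.inv(Hpp)
    S = Hcc - Hcp @ Hpp_inv @ Hcp.T          # reduced camera system
    rhs = gc - Hcp @ Hpp_inv @ gp
    return np.linalg.solve(S, rhs)           # camera parameter update
```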

17.
IEEE Trans Vis Comput Graph ; 28(5): 2212-2222, 2022 May.
Article in English | MEDLINE | ID: mdl-35167466

ABSTRACT

In this paper, we present a novel monocular visual-inertial odometry (VIO) system with pre-built maps deployed on a remote server, which can robustly run in real time on a mobile device even in high-latency situations. By tightly coupling VIO with geometric priors from pre-built maps, our system can tolerate the high latency and low frequency of a global localization service, which is especially relevant in practice when the localization service is deployed on a remote server. First, sparse point clouds are obtained from the dense mesh by ray casting according to the localization results; the dense mesh itself is reconstructed from point clouds generated by Structure-from-Motion. We use the sparse point clouds directly in feature tracking and state update to suppress drift. In the feature-tracking process, the high local accuracy of VIO is fully utilized to effectively remove outliers and make our system robust. Experiments on the EuRoC MAV dataset and simulation datasets show that, compared with state-of-the-art methods, our method achieves better results in terms of both precision and robustness. The effectiveness of the proposed method is further demonstrated through a real-time AR demo on a mobile phone, aided by visual localization on the remote server.

18.
IEEE Trans Pattern Anal Mach Intell ; 44(6): 3212-3223, 2022 06.
Article in English | MEDLINE | ID: mdl-33360984

ABSTRACT

This paper addresses the problem of instance-level 6DoF object pose estimation from a single RGB image. Many recent works have shown that a two-stage approach, which first detects keypoints and then solves a Perspective-n-Point (PnP) problem for pose estimation, achieves remarkable performance. However, most of these methods only localize a set of sparse keypoints by regressing their image coordinates or heatmaps, which are sensitive to occlusion and truncation. Instead, we introduce a Pixel-wise Voting Network (PVNet) that regresses pixel-wise vectors pointing to the keypoints and uses these vectors to vote for keypoint locations. This creates a flexible representation for localizing occluded or truncated keypoints. Another important feature of this representation is that it provides uncertainties of keypoint locations that can be further leveraged by the PnP solver. Experiments show that the proposed approach outperforms the state of the art on the LINEMOD, Occluded LINEMOD, YCB-Video, and T-LESS datasets, while being efficient enough for real-time pose estimation. We further create a Truncated LINEMOD dataset to validate the robustness of our approach against truncation. The code is available at https://github.com/zju3dv/pvnet.
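
Pixel-wise voting can be pictured as RANSAC-style hypothesis generation: sample two pixels, intersect their predicted directions to hypothesize a keypoint, then score by how many pixels agree. The 2D sketch below, with assumed thresholds, is a simplified illustration rather than the released PVNet code.

```python
import numpy as np

def intersect_rays(p1, d1, p2, d2):
    # Least-squares intersection of 2D rays p + t * d
    A = np.stack([d1, -d2], axis=1)
    t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
    return p1 + t[0] * d1

def vote_keypoint(pix, dirs, iters=100, inlier_cos=0.99, seed=0):
    # pix: (N, 2) object-pixel coordinates; dirs: (N, 2) predicted unit vectors
    rng = np.random.default_rng(seed)
    best, best_score = None, -1
    for _ in range(iters):
        i, j = rng.choice(len(pix), size=2, replace=False)
        hyp = intersect_rays(pix[i], dirs[i], pix[j], dirs[j])
        v = hyp - pix
        v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-9
        score = int(((v * dirs).sum(axis=1) > inlier_cos).sum())  # inliers
        if score > best_score:
            best, best_score = hyp, score
    return best
```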


Subject(s)
Algorithms, Politics
19.
IEEE Trans Vis Comput Graph ; 28(10): 3486-3498, 2022 10.
Article in English | MEDLINE | ID: mdl-33684038

ABSTRACT

With the prevalence of embedded GPUs on mobile devices, power-efficient rendering has become a widespread concern for graphics applications. Reducing the power consumption of rendering applications is critical for extending battery life. In this paper, we present a new real-time power-budget rendering system to meet this need by selecting the optimal rendering settings that maximize visual quality for each frame under a given power budget. Our method utilizes two independent neural networks trained entirely by synthesized datasets to predict power consumption and image quality under various workloads. This approach spares time-consuming precomputation or runtime periodic refitting and additional error computation. We evaluate the performance of the proposed framework on different platforms, two desktop PCs and two smartphones. Results show that compared to the previous state of the art, our system has less overhead and better flexibility. Existing rendering engines can integrate our system with negligible costs.
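
The per-frame decision reduces to a constrained selection over candidate settings. A minimal sketch, with stub predictor callables standing in for the paper's two trained networks (the interface is our assumption):

```python
def choose_setting(settings, predict_power, predict_quality, budget):
    # settings: candidate rendering configurations; predict_power/quality:
    # callables emulating the two trained predictors
    feasible = [s for s in settings if predict_power(s) <= budget]
    if not feasible:                           # nothing fits: cheapest setting
        return min(settings, key=predict_power)
    return max(feasible, key=predict_quality)  # best quality within budget
```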


Subject(s)
Algorithms, Computer Graphics, Neural Networks, Computer, Smartphone
20.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5529-5540, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33914683

ABSTRACT

In this paper, we propose a novel system named Disp R-CNN for 3D object detection from stereo images. Many recent works solve this problem by first recovering point clouds with disparity estimation and then applying a 3D detector. In these pipelines, the disparity map is computed for the entire image, which is costly and fails to leverage category-specific priors. In contrast, we design an instance disparity estimation network (iDispNet) that predicts disparity only for pixels on objects of interest and learns a category-specific shape prior for more accurate disparity estimation. To address the scarcity of disparity annotations for training, we propose to use a statistical shape model to generate dense disparity pseudo-ground-truth without the need for LiDAR point clouds, which makes our system more widely applicable. Experiments on the KITTI dataset show that, when LiDAR ground truth is not used at training time, Disp R-CNN outperforms previous state-of-the-art stereo-based methods by 20 percent in terms of average precision across all categories. The code and pseudo-ground-truth data are available at the project page: https://github.com/zju3dv/disprcnn.
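
Downstream of an instance disparity map, 3D points follow from standard stereo geometry (depth = focal length x baseline / disparity); the conversion below is textbook background, not the paper's code.

```python
import numpy as np

def disparity_to_points(disp, mask, f, baseline, cx, cy):
    # disp: (H, W) instance disparity; mask: (H, W) bool object mask;
    # f, cx, cy: pinhole intrinsics; baseline: stereo baseline in meters
    ys, xs = np.nonzero(mask & (disp > 0))
    z = f * baseline / disp[ys, xs]           # depth from disparity
    x = (xs - cx) * z / f
    y = (ys - cy) * z / f
    return np.stack([x, y, z], axis=1)        # (K, 3) object point cloud
```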
