Results 1 - 16 of 16

1.
IEEE Trans Vis Comput Graph ; 30(5): 2767-2775, 2024 May.
Article in English | MEDLINE | ID: mdl-38564356

ABSTRACT

High-precision virtual environments are increasingly important for various education, simulation, training, performance, and entertainment applications. We present HoloCamera, an innovative volumetric capture instrument to rapidly acquire, process, and create cinematic-quality virtual avatars and scenarios. The HoloCamera consists of a custom-designed free-standing structure with 300 high-resolution RGB cameras mounted with uniform spacing spanning the four sides and the ceiling of a room-sized studio. The light field acquired from these cameras is streamed through a distributed array of GPUs that interleave the processing and transmission of 4K resolution images. The distributed compute infrastructure that powers these RGB cameras consists of 50 Jetson AGX Xavier boards, with each processing unit dedicated to driving and processing imagery from six cameras. A high-speed Gigabit Ethernet network fabric seamlessly interconnects all computing boards. In this systems paper, we provide an in-depth description of the steps involved and lessons learned in constructing such a cutting-edge volumetric capture facility that can be generalized to other such facilities. We delve into the techniques employed to achieve precise frame synchronization and spatial calibration of cameras, careful determination of angled camera mounts, image processing from the camera sensors, and the need for a resilient and robust network infrastructure. To advance the field of volumetric capture, we are releasing a high-fidelity static light-field dataset, which will serve as a benchmark for further research and applications of cinematic-quality volumetric light fields.
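As a concrete illustration of the capture fabric described above, here is a minimal Python sketch; the camera and node counts come from the abstract, while the grouping scheme and the timestamp tolerance are illustrative assumptions, not the authors' code.

```python
# Camera counts from the abstract; grouping and sync check are illustrative.
NUM_CAMERAS, CAMS_PER_NODE = 300, 6

# Assign six cameras to each of the 50 Jetson AGX Xavier nodes.
nodes = [list(range(i, i + CAMS_PER_NODE))
         for i in range(0, NUM_CAMERAS, CAMS_PER_NODE)]
assert len(nodes) == 50

def frames_in_sync(timestamps_us, tolerance_us=100):
    """Accept a capture only if all sensor timestamps agree within a
    tight tolerance (the 100 us threshold is a hypothetical value)."""
    return max(timestamps_us) - min(timestamps_us) <= tolerance_us
```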

2.
Article in English | MEDLINE | ID: mdl-37672376

ABSTRACT

Learning probabilistic models that can estimate the density of a given set of samples, and generate samples from that density, is one of the fundamental challenges in unsupervised machine learning. We introduce a new generative model based on denoising density estimators (DDEs): scalar functions, parametrized by neural networks, that are efficiently trained to represent kernel density estimators of the data. Leveraging DDEs, our main contribution is a novel technique to obtain generative models by minimizing the Kullback-Leibler (KL) divergence directly. We prove that our algorithm for obtaining generative models is guaranteed to converge consistently to the correct solution. Our approach requires neither a specific network architecture, as in normalizing flows (NFs), nor ordinary differential equation (ODE) solvers, as in continuous NFs. Experimental results demonstrate substantial improvement in density estimation and competitive performance in generative model training.
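To make the training idea concrete, below is a minimal PyTorch sketch of a denoising-score-matching-style objective for a scalar network, one plausible reading of how a DDE can be fit; the paper's exact loss, bandwidth handling, and architecture may differ.

```python
import torch
import torch.nn as nn

# Scalar-valued network f(x); its input gradient acts as a score estimate.
net = nn.Sequential(nn.Linear(2, 128), nn.Softplus(),
                    nn.Linear(128, 128), nn.Softplus(),
                    nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
sigma = 0.1  # kernel bandwidth (assumed hyperparameter)

def dde_loss(x):
    noise = sigma * torch.randn_like(x)
    y = (x + noise).requires_grad_(True)
    grad = torch.autograd.grad(net(y).sum(), y, create_graph=True)[0]
    # Denoising objective: sigma^2 * grad f(y) should point back toward x.
    return ((sigma ** 2) * grad + noise).pow(2).sum(dim=1).mean()

x = torch.randn(256, 2)          # placeholder batch; real samples go here
loss = dde_loss(x)
opt.zero_grad(); loss.backward(); opt.step()
```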

3.
Article in English | MEDLINE | ID: mdl-37478036

ABSTRACT

Recent neural rendering methods have made great progress in generating photorealistic human avatars. However, these methods are generally conditioned only on low-dimensional driving signals (e.g., body poses), which are insufficient to encode the complete appearance of a clothed human, so they fail to reproduce faithful details. To address this problem, we exploit driving view images (e.g., in telepresence systems) as additional inputs. We propose a novel neural rendering pipeline, Hybrid Volumetric-Textural Rendering (HVTR++), which efficiently synthesizes high-quality 3D human avatars from arbitrary driving poses and views while staying faithful to appearance details. First, we learn to encode the driving signals of pose and view image on a dense UV manifold of the human body surface and extract UV-aligned features, preserving the structure of a skeleton-based parametric model. To handle complicated motions (e.g., self-occlusions), we then leverage the UV-aligned features to construct a 3D volumetric representation based on a dynamic neural radiance field. While this allows us to represent 3D geometry with changing topology, volumetric rendering is computationally heavy. Hence we employ only a coarse volumetric representation, using a pose- and image-conditioned downsampled neural radiance field (PID-NeRF), which we can render efficiently at low resolutions. In addition, we learn 2D textural features that are fused with the rendered volumetric features in image space. The key advantage of our approach is that we can then convert the fused features into a high-resolution, high-quality avatar with a fast GAN-based textural renderer. We demonstrate that hybrid rendering enables HVTR++ to handle complicated motions, render high-quality avatars under user-controlled poses/shapes, and, most importantly, remain efficient at inference time. Our experimental results also demonstrate state-of-the-art quantitative results.
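A structural sketch of such a hybrid pipeline is given below; every module, dimension, and the elided ray-compositing step are hypothetical stand-ins, not the actual HVTR++ components.

```python
import torch
import torch.nn as nn

class HybridRenderer(nn.Module):
    """Hypothetical skeleton: UV features -> coarse volume -> GAN upsampler."""
    def __init__(self, feat=32):
        super().__init__()
        self.uv_encoder = nn.Conv2d(6, feat, 3, padding=1)  # pose+view -> UV feats
        self.volume_head = nn.Linear(feat + 3, 4)           # per-sample RGB + density
        self.tex_encoder = nn.Conv2d(feat, feat, 3, padding=1)
        self.gan_renderer = nn.Sequential(                  # fast 2D textural renderer
            nn.Upsample(scale_factor=4),
            nn.Conv2d(feat + 3, 3, 3, padding=1))

    def forward(self, uv_inputs):
        uv_feat = self.uv_encoder(uv_inputs)        # UV-aligned features
        # ...sample uv_feat at 3D points, evaluate volume_head, and composite
        # along rays at low resolution (elided); placeholder result below...
        vol_rgb = torch.rand(uv_inputs.shape[0], 3, 64, 64)
        fused = torch.cat([self.tex_encoder(uv_feat), vol_rgb], dim=1)
        return self.gan_renderer(fused)             # high-resolution avatar image

img = HybridRenderer()(torch.rand(1, 6, 64, 64))    # -> (1, 3, 256, 256)
```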

4.
IEEE Trans Image Process ; 30: 1744-1758, 2021.
Article in English | MEDLINE | ID: mdl-33417547

ABSTRACT

Fine-grained 3D shape classification is important for shape understanding and analysis, yet it poses a challenging research problem. However, fine-grained 3D shape classification has rarely been explored, due to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset (named FG3D dataset) with fine-grained class labels, which consists of three categories: airplane, car, and chair. Each category consists of several subcategories at a fine-grained level. Our experiments on this fine-grained dataset show that state-of-the-art methods are significantly limited by the small variance among subcategories in the same category. To resolve this problem, we further propose a novel fine-grained 3D shape classification method, named FG3D-Net, to capture the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect general semantic parts inside multiple views, using a benchmark for general semantic part detection. Then, we design a hierarchical part-view attention aggregation module to learn a global shape representation by aggregating general semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of our features. Part-level attention highlights the important parts in each view, while view-level attention highlights the discriminative views among all the views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods. The FG3D dataset is available at https://github.com/liuxinhai/FG3D-Net.
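The hierarchical weighting can be pictured with a short sketch like the one below; the dimensions and modules are assumed for illustration, and the RPN and RNN stages are omitted.

```python
import torch
import torch.nn as nn

V, P, D = 12, 6, 256             # views, part proposals per view, feature dim
parts = torch.randn(V, P, D)     # per-view part features (placeholder RPN output)
part_attn, view_attn = nn.Linear(D, 1), nn.Linear(D, 1)

pw = torch.softmax(part_attn(parts).squeeze(-1), dim=1)        # weight parts per view
view_feats = (pw.unsqueeze(-1) * parts).sum(dim=1)             # (V, D) view summaries
vw = torch.softmax(view_attn(view_feats).squeeze(-1), dim=0)   # weight views
global_feat = (vw.unsqueeze(-1) * view_feats).sum(dim=0)       # (D,) shape descriptor
```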

5.
IEEE Trans Vis Comput Graph ; 27(4): 2250-2264, 2021 Apr.
Article in English | MEDLINE | ID: mdl-31670674

ABSTRACT

Small object arrangement is very important for creating detailed and realistic 3D indoor scenes. In this article, we present an interactive framework based on active learning to help users create customized arrangements for small objects according to their preferences. To achieve this with minimal user effort, we first learn the prior knowledge about small object arrangement from a 3D indoor scene dataset through a probability mining method, which forms the initial guidance for arranging small objects. Then, users are able to express their preferences on a few small object categories, which are automatically propagated to all the other categories via a novel active learning approach. In the propagation process, we introduce a novel metric to obtain the propagation weights, which measures the degree of interchangeability between two small object categories, and is calculated based on a spatial embedding model learned from the small object neighborhood information extracted from the 3D indoor scene dataset. Experiments show that our framework is able to help users effectively create customized small object arrangements with little effort.
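One way to picture the propagation step: if interchangeability is the similarity of two categories' spatial embeddings, user preferences spread to unlabeled categories as similarity-weighted averages. The sketch below assumes exactly that; the paper's metric and propagation rule may be more elaborate.

```python
import numpy as np

# Placeholder embeddings; real ones would come from the learned spatial model.
emb = {'mug': np.random.rand(32), 'cup': np.random.rand(32),
       'vase': np.random.rand(32)}

def interchangeability(a, b):    # cosine similarity of embeddings (assumed)
    va, vb = emb[a], emb[b]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

seeds = {'mug': 1.0}             # user labels a few categories...
prefs = dict(seeds)
for cat in emb:                  # ...and preferences propagate to the rest
    if cat not in seeds:
        w = {s: interchangeability(cat, s) for s in seeds}
        prefs[cat] = sum(w[s] * seeds[s] for s in seeds) / sum(w.values())
```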

6.
Article in English | MEDLINE | ID: mdl-32870791

ABSTRACT

3D shape reconstruction from multiple hand-drawn sketches is an intriguing approach to 3D shape modeling. Currently, state-of-the-art methods employ neural networks to learn a mapping from multiple sketches from arbitrary view angles to a 3D voxel grid. Because of the cubic complexity of 3D voxel grids, however, such networks are hard to train and limited to low-resolution reconstructions, which leads to a lack of geometric detail and low accuracy. To resolve this issue, we propose to reconstruct 3D shapes from multiple sketches using direct shape optimization (DSO), which does not involve deep learning models for direct voxel-based 3D shape generation. Specifically, we first leverage a conditional generative adversarial network (CGAN) to translate each sketch into an attenuance image that captures the predicted geometry from a given viewpoint. Then, DSO minimizes a project-and-compare loss to reconstruct a 3D shape that matches the predicted attenuance images from the view angles of all input sketches. Building on this, we further propose a progressive update approach to handle inconsistencies among hand-drawn sketches of the same 3D shape. Our experimental results show that our method significantly outperforms the state-of-the-art methods on widely used benchmarks and produces intuitive results in an interactive application.
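The project-and-compare idea can be sketched directly: optimize a voxel grid so that its simulated attenuance projections match the CGAN outputs. The sketch below simplifies to axis-aligned orthographic views and random placeholder targets; the actual method handles arbitrary view angles.

```python
import torch
import torch.nn.functional as F

vox = torch.zeros(64, 64, 64, requires_grad=True)     # voxel densities to optimize
opt = torch.optim.Adam([vox], lr=0.05)

def project(v, axis):
    # Orthographic attenuance along one axis: 1 - exp(-accumulated density).
    return 1.0 - torch.exp(-F.softplus(v).sum(dim=axis))

targets = {0: torch.rand(64, 64), 1: torch.rand(64, 64)}  # placeholder CGAN images
for _ in range(200):
    loss = sum(((project(vox, ax) - img) ** 2).mean()     # project-and-compare
               for ax, img in targets.items())
    opt.zero_grad(); loss.backward(); opt.step()
```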

7.
IEEE Trans Image Process ; 28(8): 3986-3999, 2019 Aug.
Article in English | MEDLINE | ID: mdl-30872228

ABSTRACT

Learning 3D global features by aggregating multiple views is important for 3D shape analysis. Pooling is widely used to aggregate views in deep learning models. However, pooling disregards much of the content information within views and the spatial relationships among the views, which limits the discriminability of learned features. To resolve this issue, 3D to Sequential Views (3D2SeqViews) is proposed to more effectively aggregate sequential views using convolutional neural networks with a novel hierarchical attention aggregation. Specifically, the content information within each view is first encoded. Then, the encoded view content and the sequential spatiality among the views are simultaneously aggregated by the hierarchical attention aggregation, where view-level attention and class-level attention are proposed to hierarchically weight sequential views and shape classes. View-level attention is learned to indicate how much attention each shape class pays to each view, and it weights sequential views through a novel recursive view integration. Recursive view integration learns the semantic meaning of the view sequence and is robust to the choice of the first view position. Furthermore, class-level attention is introduced to describe how much attention is paid to each shape class, which innovatively exploits the discriminative ability of the fine-tuned network. 3D2SeqViews learns more discriminative features than the state-of-the-art, which leads to outperforming results in shape classification and retrieval on three large-scale benchmarks.
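As a toy sketch of attention-weighted recursive view integration (dimensions and modules assumed; the class-level attention and the paper's recursion details are omitted):

```python
import torch
import torch.nn as nn

D, V, C = 256, 12, 40            # feature dim, number of views, shape classes
view_feats = torch.randn(V, D)   # per-view features from a CNN backbone (assumed)
view_attn = nn.Linear(D, 1)      # view-level attention scorer

w = torch.softmax(view_attn(view_feats).squeeze(-1), dim=0)
agg = torch.zeros(D)
for v in range(V):               # recursive integration over the view sequence
    agg = agg + w[v] * view_feats[v]
logits = nn.Linear(D, C)(agg)    # class scores from the aggregated feature
```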

8.
IEEE Trans Image Process ; 28(2): 658-672, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30183634

ABSTRACT

Learning 3D global features by aggregating multiple views has been introduced as a successful strategy for 3D shape analysis. In recent deep learning models with end-to-end training, pooling is a widely adopted procedure for view aggregation. However, pooling merely retains the max or mean value over all views, which disregards the content information of almost all views and also the spatial information among the views. To resolve these issues, we propose Sequential Views To Sequential Labels (SeqViews2SeqLabels), a novel deep learning model with an encoder-decoder structure based on recurrent neural networks (RNNs) with attention. SeqViews2SeqLabels consists of two connected parts, an encoder-RNN followed by a decoder-RNN, which learn global features by aggregating sequential views and then perform shape classification from the learned global features, respectively. Specifically, the encoder-RNN learns the global features by simultaneously encoding the spatial and content information of sequential views, which captures the semantics of the view sequence. With the proposed prediction of sequential labels, the decoder-RNN performs more accurate classification using the learned global features by predicting sequential labels step by step. Learning to predict sequential labels provides more, and finer-grained, discriminative information among shape classes, which alleviates the overfitting inherent in training with a limited number of 3D shapes. Moreover, we introduce an attention mechanism to further improve the discriminative ability of SeqViews2SeqLabels. This mechanism increases the weight of views that are distinctive to each shape class, and it dramatically reduces the effect of the choice of the first view position. Shape classification and retrieval results on three large-scale benchmarks verify that SeqViews2SeqLabels learns more discriminative global features by aggregating sequential views more effectively than state-of-the-art methods.
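A minimal encoder-decoder sketch of this view-sequence-to-label-sequence idea follows; the GRUs, dimensions, and start token are assumptions, and the attention mechanism is omitted for brevity.

```python
import torch
import torch.nn as nn

D, V, C, L = 256, 12, 40, 3      # feature dim, views, classes, label steps
enc = nn.GRU(D, D, batch_first=True)
dec = nn.GRUCell(C, D)
head = nn.Linear(D, C)

views = torch.randn(1, V, D)     # sequential view features from a CNN (assumed)
_, h = enc(views)                # encoder aggregates the view sequence
h = h.squeeze(0)
prev = torch.zeros(1, C)         # start token for sequential label prediction
for _ in range(L):               # decoder predicts labels step by step
    h = dec(prev, h)
    prev = torch.softmax(head(h), dim=-1)
```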

9.
Science ; 360(6394): 1188, 2018 Jun 15.
Article in English | MEDLINE | ID: mdl-29903964
10.
IEEE Trans Pattern Anal Mach Intell ; 40(10): 2529-2537, 2018 Oct.
Article in English | MEDLINE | ID: mdl-28945589

ABSTRACT

We present a structure-aware technique to consolidate noisy data, which we use as a pre-process for standard clustering and dimensionality reduction. Our technique is related to mean shift, but instead of seeking density modes, it reveals and consolidates continuous high density structures such as curves and surface sheets in the underlying data while ignoring noise and outliers. We provide a theoretical analysis under a Gaussian noise model, and show that our approach significantly improves the performance of many non-linear dimensionality reduction and clustering algorithms in challenging scenarios.
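A toy version of one consolidation step appears below; it projects a mean-shift update onto the locally least-varying direction, so points move onto curve-like structures rather than along them. This is an illustrative assumption, not the paper's exact operator.

```python
import numpy as np

def consolidate_step(pts, bandwidth=0.3):
    out = np.empty_like(pts)
    for i, p in enumerate(pts):
        nbrs = pts - p
        w = np.exp(-(nbrs ** 2).sum(axis=1) / (2 * bandwidth ** 2))
        shift = (w[:, None] * nbrs).sum(axis=0) / w.sum()     # mean-shift vector
        cov = (w[:, None, None] * np.einsum('ni,nj->nij', nbrs, nbrs)).sum(axis=0)
        _, vecs = np.linalg.eigh(cov)
        normal = vecs[:, 0]                     # least-variance (off-structure) axis
        out[i] = p + (shift @ normal) * normal  # move only across the structure
    return out
```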

11.
IEEE Trans Vis Comput Graph ; 24(8): 2315-2326, 2018 Aug.
Article in English | MEDLINE | ID: mdl-28708561

ABSTRACT

Point set filtering, which aims at reconstructing noise-free point sets from their corresponding noisy inputs, is a fundamental problem in 3D geometry processing. The main challenge of point set filtering is to preserve geometric features of the underlying geometry while at the same time removing the noise. State-of-the-art point set filtering methods still struggle with this issue: some are not designed to recover sharp features, and others cannot well preserve geometric features, especially fine-scale features. In this paper, we propose a novel approach for robust feature-preserving point set filtering, inspired by the Gaussian Mixture Model (GMM). Taking a noisy point set and its filtered normals as input, our method can robustly reconstruct a high-quality point set which is both noise-free and feature-preserving. Various experiments show that our approach can soundly outperform the selected state-of-the-art methods, in terms of both filtering quality and reconstruction accuracy.
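As a rough sketch of normal-guided position filtering (a common scheme in this literature, not necessarily the paper's GMM-based formulation): each point moves toward the tangent planes defined by its neighbors' filtered normals, with Gaussian proximity weights.

```python
import numpy as np

def filter_positions(pts, normals, bandwidth=0.2, iters=5):
    for _ in range(iters):
        new = pts.copy()
        for i, p in enumerate(pts):
            w = np.exp(-((pts - p) ** 2).sum(axis=1) / (2 * bandwidth ** 2))
            # Signed distances of p to neighbor planes, along their normals.
            resid = ((pts - p) * normals).sum(axis=1)[:, None] * normals
            new[i] = p + (w[:, None] * resid).sum(axis=0) / w.sum()
        pts = new
    return pts
```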

12.
Bull Math Biol ; 79(4): 788-827, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28247120

ABSTRACT

In this paper, we present computational techniques to investigate the effect of surface geometry on biological pattern formation. In particular, we study two-component, nonlinear reaction-diffusion (RD) systems on arbitrary surfaces. We build on standard techniques for linear and nonlinear analysis of RD systems and extend them to operate on large-scale meshes of arbitrary surfaces. In particular, we use spectral techniques for a linear stability analysis to characterise and directly compose patterns emerging from homogeneous steady states. We develop an implementation using surface finite element methods and a numerical eigenanalysis of the Laplace-Beltrami operator on surface meshes. In addition, we describe a technique to explore solutions of the nonlinear RD equations using numerical continuation. Here, we present a multiresolution approach that allows us to trace solution branches of the nonlinear equations efficiently, even for large-scale meshes. Finally, we demonstrate our framework on two RD systems with applications in biological pattern formation: a Brusselator model that has been used to model pattern development on growing plant tips, and a chemotactic model for the formation of skin pigmentation patterns. While these models have been studied previously on simple geometries, our framework allows us to investigate the impact of arbitrary geometries on emerging patterns.
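For reference, a generic two-component RD system on a surface Γ reads as follows (notation assumed; the paper's specific Brusselator and chemotaxis models plug into f and g):

```latex
\partial_t u = d_u\,\Delta_\Gamma u + f(u,v), \qquad
\partial_t v = d_v\,\Delta_\Gamma v + g(u,v).
```

Linearizing about a homogeneous steady state (u*, v*) and expanding in Laplace-Beltrami eigenfunctions, with Δ_Γ φ_k = -λ_k φ_k, reduces stability to the eigenvalues of

```latex
M_k \;=\; J \;-\; \lambda_k \begin{pmatrix} d_u & 0 \\ 0 & d_v \end{pmatrix},
```

where J is the Jacobian of (f, g) at (u*, v*); a pattern mode φ_k grows when some M_k has an eigenvalue with positive real part.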


Subject(s)
Chemotaxis, Theoretical Models, Diffusion, Physical Phenomena, Plants
13.
IEEE Trans Image Process ; 23(7): 3114-3125, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24876125

ABSTRACT

Image denoising continues to be an active research topic. Although state-of-the-art denoising methods are numerically impressive and approach theoretical limits, they suffer from visible artifacts. While they produce acceptable results for natural images, human eyes are less forgiving when viewing synthetic images. At the same time, current methods are becoming more complex, making analysis and implementation difficult. We propose image denoising as a simple physical process, which progressively reduces noise by deterministic annealing. The results of our implementation are numerically and visually excellent. We further demonstrate that our method is particularly suited for synthetic images. Finally, we offer a new perspective on image denoising using robust estimators.
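The annealing idea can be sketched as a robust, Gaussian-weighted local average whose temperature shrinks over iterations; the schedule and estimator below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def anneal_denoise(img, t0=0.5, t_min=0.05, cool=0.8):
    out = img.astype(float).copy()
    t = t0
    while t > t_min:
        padded = np.pad(out, 1, mode='edge')
        # Stack the 3x3 neighborhood of every pixel: shape (9, H, W).
        nbrs = np.stack([padded[i:i + out.shape[0], j:j + out.shape[1]]
                         for i in range(3) for j in range(3)])
        w = np.exp(-((nbrs - out) ** 2) / (2 * t ** 2))  # robust range weights
        out = (w * nbrs).sum(axis=0) / w.sum(axis=0)     # weighted mean update
        t *= cool                                        # cooling schedule
    return out
```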

14.
Science ; 339(6115): 78-81, 2013 Jan 04.
Article in English | MEDLINE | ID: mdl-23196908

ABSTRACT

Various lineages of amniotes display keratinized skin appendages (feathers, hairs, and scales) that differentiate in the embryo from genetically controlled developmental units whose spatial organization is patterned by reaction-diffusion mechanisms (RDMs). We show that, contrary to skin appendages in other amniotes (as well as body scales in crocodiles), the face and jaw scales of crocodiles are random polygonal domains of highly keratinized skin, rather than genetically controlled elements, and emerge from a physical self-organizing stochastic process distinct from RDMs: cracking of the developing skin in a stress field. We suggest that the rapid growth of the crocodile embryonic facial and jaw skeleton, combined with the development of a very keratinized skin, generates the mechanical stress that causes cracking.


Subject(s)
Alligators and Crocodiles/anatomy & histology, Alligators and Crocodiles/growth & development, Mechanical Phenomena, Skin/anatomy & histology, Skin/growth & development, Alligators and Crocodiles/embryology, Animals, Head, Jaw/anatomy & histology, Jaw/embryology, Keratins, Skin/embryology
15.
IEEE Trans Vis Comput Graph ; 17(5): 642-654, 2011 May.
Article in English | MEDLINE | ID: mdl-20530817

ABSTRACT

In this paper, we analyze the reproduction of light fields on multiview 3D displays. A three-way interaction between the input light field signal (which is often aliased), the joint spatioangular sampling grids of multiview 3D displays, and the inter-view light leakage in modern multiview 3D displays is characterized in the joint spatioangular frequency domain. Reconstruction of light fields on all physical 3D displays is prone to light leakage, which means that the reconstruction low-pass filter implemented by the display is too broad in the angular domain. As a result, 3D displays excessively attenuate angular frequencies, and our analysis shows that this reduces the sharpness of the displayed images. In this paper, stereoscopic image recovery is recast as a problem of joint spatioangular signal reconstruction. The combination of the 3D display point spread function and the human visual system provides the narrow-band low-pass filter that removes spectral replicas in the light field reconstructed on the multiview display. The nonideality of this filter is corrected with the proposed prefiltering. The proposed light field reconstruction method performs light field antialiasing as well as angular sharpening to compensate for the nonideal response of the 3D display. The union-of-cosets approach, used earlier by others, is employed here to model the nonrectangular spatioangular sampling grids of a multiview display in a generic fashion. We confirm the effectiveness of our approach in simulation and in physical hardware, and demonstrate improvement over existing techniques.
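A highly simplified sketch of angular prefiltering: model the display's angular low-pass as a Gaussian and apply a regularized inverse along the view axis before display. The Gaussian model and its parameters are assumptions standing in for the paper's measured display response.

```python
import numpy as np

def prefilter(light_field, ang_sigma=1.0, eps=0.1):
    # light_field: (n_views, H, W); filter along the angular (view) axis.
    F = np.fft.fft(light_field, axis=0)
    freqs = np.fft.fftfreq(light_field.shape[0])
    H_disp = np.exp(-2 * (np.pi * ang_sigma * freqs) ** 2)   # display's angular blur
    boost = H_disp / (H_disp ** 2 + eps)                     # Wiener-like inverse
    return np.real(np.fft.ifft(F * boost[:, None, None], axis=0))
```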
