Results 1 - 15 of 15
1.
IEEE Trans Vis Comput Graph ; 30(5): 2693-2702, 2024 May.
Article in English | MEDLINE | ID: mdl-38437103

ABSTRACT

Redirected walking (RDW) facilitates user navigation within expansive virtual spaces despite the constraints of limited physical spaces. It employs discrepancies between human visual-proprioceptive sensations, known as gains, to enable the remapping of virtual and physical environments. In this paper, we explore how to apply rotation gain while the user is walking. We propose applying a rotation gain so that the user rotates by a different angle when reciprocating a previous head rotation, thereby steering the user toward a desired direction. To apply the gains imperceptibly based on such a Bidirectional Rotation gain Difference (BiRD), we conduct both measurement and verification experiments on the detection thresholds of the rotation gain for reciprocating head rotations during walking. Unlike previous rotation gains, which are measured while users turn around in place (standing or sitting), BiRD is measured while users are walking. Our study offers a critical assessment of the acceptable range of rotational mapping differences for different rotational orientations across the user's walking experience, providing an effective tool for redirecting users in virtual environments.


Subjects
Computer Graphics, Walking, Humans, Animals, Orientation, Environment, Birds
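
A minimal sketch of the core idea, assuming per-frame yaw deltas from head tracking; the gain handling and the way the outbound direction is detected are illustrative, not the paper's calibrated thresholds or procedure.

    import numpy as np

    def redirect_reciprocating_yaw(physical_yaw_deltas, gain):
        """Apply a rotation gain only on the reciprocating (return) phase of a head
        rotation, so the virtual and physical headings diverge by a controlled angle.
        physical_yaw_deltas: per-frame yaw changes (radians) from the HMD tracker.
        gain: ratio of virtual to physical rotation on the return rotation.
        """
        outbound_sign = 0.0               # sign of the initial (outbound) head rotation
        virtual_yaw, physical_yaw = 0.0, 0.0
        for d in physical_yaw_deltas:
            if outbound_sign == 0.0 and abs(d) > 1e-6:
                outbound_sign = np.sign(d)
            # amplify or attenuate only rotations opposing the outbound direction
            scale = gain if np.sign(d) == -outbound_sign else 1.0
            virtual_yaw += scale * d
            physical_yaw += d
        return virtual_yaw - physical_yaw  # accumulated redirection angle
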
2.
Article in English | MEDLINE | ID: mdl-38386584

ABSTRACT

There has been a high demand for facial makeup transfer tools in fashion e-commerce and virtual avatar generation. Most existing makeup transfer methods are based on generative adversarial networks. Despite their success in makeup transfer for a single image, they struggle to maintain the consistency of makeup across different poses and expressions of the same person. In this paper, we propose a robust makeup transfer method which consistently transfers the makeup style of a reference image to facial images in arbitrary poses and expressions. Our method introduces an implicit 3D representation, neural radiance fields (NeRFs), to ensure geometric and appearance consistency. It has two separate stages: a basic NeRF module that reconstructs geometry from the input facial image sequence, and a makeup module that learns to transfer the reference makeup style consistently. We propose a novel hybrid makeup loss, specially designed around makeup characteristics, to supervise the training of the makeup module. The proposed loss significantly improves the visual quality and faithfulness of the makeup transfer effects. To better align the distribution of the transferred makeup with the reference makeup, a patch-based discriminator that works in the pose-independent UV texture space is proposed to provide more accurate control over the synthesized makeup. Extensive experiments and a user study demonstrate the superiority of our network for a variety of different makeup styles.
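
As a rough illustration of a hybrid makeup loss, the sketch below combines a region-wise color-statistics term with a perceptual feature term; the specific terms, the region masks, the feature extractor, and the weights are assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def hybrid_makeup_loss(pred, ref, region_masks, feat_extractor, w_color=1.0, w_perc=0.1):
        """Illustrative hybrid loss: match per-region color statistics (e.g. lips, eyes,
        skin) against the reference, plus a perceptual feature distance for texture.
        pred, ref: (B, 3, H, W) images; region_masks: list of (mask_pred, mask_ref)
        boolean (H, W) masks, assumed to come from face parsing."""
        color_term = pred.new_zeros(())
        for m_pred, m_ref in region_masks:
            p = pred[:, :, m_pred].mean(dim=-1)   # mean color inside the predicted region
            r = ref[:, :, m_ref].mean(dim=-1)     # mean color inside the reference region
            color_term = color_term + F.l1_loss(p, r)
        perc_term = F.l1_loss(feat_extractor(pred), feat_extractor(ref))
        return w_color * color_term + w_perc * perc_term
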

3.
Article in English | MEDLINE | ID: mdl-38358860

ABSTRACT

Capturing an omnidirectional image with a 360-degree field of view entails capturing intricate spatial and lighting details of the scene. Consequently, existing intrinsic image decomposition methods face significant challenges when attempting to separate reflectance and shading components from low dynamic range (LDR) omnidirectional images. To address this, our paper introduces a novel method specifically designed for the intrinsic decomposition of omnidirectional images. Leveraging the unique characteristics of the 360-degree scene representation, we employ a pre-extraction technique to isolate specific illumination information. Subsequently, we establish new constraints based on these extracted details and the inherent characteristics of omnidirectional images. These constraints limit the illumination intensity range and incorporate spherical-based illumination variation. By formulating and solving an objective function that accounts for these constraints, our method achieves a more accurate separation of reflectance and shading components. Comprehensive qualitative and quantitative evaluations demonstrate the superiority of our proposed method over state-of-the-art intrinsic decomposition methods.
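
A toy version of the kind of objective involved, written in the log domain where log I = log R + log S; the smoothness term and the illumination-range penalty below are placeholders standing in for the paper's spherical illumination constraints.

    import numpy as np

    def intrinsic_energy(log_R, log_S, log_I, s_min, s_max,
                         w_data=1.0, w_smooth=0.1, w_range=1.0):
        """Toy intrinsic-decomposition energy: data fidelity, piecewise-smooth
        reflectance, and a penalty keeping shading within an estimated
        illumination-intensity range [s_min, s_max]."""
        data = np.sum((log_R + log_S - log_I) ** 2)
        # piecewise-smooth reflectance: penalize horizontal/vertical gradients
        smooth = np.sum(np.abs(np.diff(log_R, axis=0))) + np.sum(np.abs(np.diff(log_R, axis=1)))
        # shading should stay inside the range estimated from the pre-extracted illumination
        range_pen = np.sum(np.maximum(log_S - np.log(s_max), 0.0) ** 2) \
                  + np.sum(np.maximum(np.log(s_min) - log_S, 0.0) ** 2)
        return w_data * data + w_smooth * smooth + w_range * range_pen
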

4.
IEEE Trans Vis Comput Graph ; 30(4): 1916-1926, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37028008

ABSTRACT

With the recent rise of the Metaverse, online multiplayer VR applications are becoming increasingly prevalent worldwide. However, as multiple users are located in different physical environments (PEs), different reset frequencies and timings can lead to serious fairness issues for online collaborative/competitive VR applications. For the fairness of online VR apps/games, an ideal online redirected walking (RDW) strategy must make the locomotion opportunities of different users equal, regardless of the layouts of their PEs. Existing RDW methods lack a scheme to coordinate multiple users in different PEs, and thus trigger too many resets for all users under the locomotion-fairness constraint. We propose a novel multi-user RDW method that significantly reduces the overall number of resets and gives users a better immersive experience by providing fair exploration. Our key idea is to first find the "bottleneck" user who may cause all users to be reset and estimate the time until that reset given the users' next targets, and then redirect all users to favorable poses during that maximized bottleneck time so that subsequent resets can be postponed as much as possible. More specifically, we develop methods to estimate the time of possibly encountering obstacles and the reachable area for a specific pose, enabling prediction of the next reset caused by any user. Our experiments and user study show that our method outperforms existing RDW methods in online VR applications.
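
A schematic of the bottleneck idea under simplifying assumptions (straight-line walking toward a known target at constant speed, with the free walking distance precomputed from the PE layout): whichever user would trigger a reset first bounds the time available for redirecting everyone. The field names and time estimate are illustrative, not the paper's estimator.

    import numpy as np

    def time_to_reset(position, target, speed, free_dist):
        """Crude time-to-reset estimate: how long a user can walk toward the target
        before reaching the nearest physical obstacle or boundary (free_dist)."""
        walk_dist = min(np.linalg.norm(np.asarray(target) - np.asarray(position)), free_dist)
        return walk_dist / speed

    def bottleneck_user(users):
        """The user with the smallest estimated time-to-reset constrains the group;
        redirection is then planned within that time window."""
        times = [time_to_reset(u["pos"], u["target"], u["speed"], u["free_dist"]) for u in users]
        k = int(np.argmin(times))
        return k, times[k]
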

5.
Article in English | MEDLINE | ID: mdl-37889815

ABSTRACT

360° images and videos have become an economical and popular way to provide VR experiences using real-world content. However, the manipulation of stereo panoramic content remains less explored. In this paper, we focus on the 360° image composition problem and develop a solution that can take an object from a stereo image pair and insert it at a given 3D position in a target stereo panorama, with well-preserved geometry information. Our method uses recovered 3D point clouds to guide the composited image generation. More specifically, we observe that using only a one-off operation to insert objects into equirectangular images never produces satisfactory depth perception and generates ghost artifacts when users view the result from different directions. Therefore, we propose a novel per-view projection method that segments the object in 3D spherical space with the stereo camera pair facing in that direction. A deep depth densification network is proposed to generate depth guidance for the stereo image generation of each view segment according to the desired position and pose of the inserted object. We finally combine the synthesized view segments and blend the objects into the target stereo 360° scene. A user study demonstrates that our method provides good depth perception and removes ghost artifacts. The per-view solution is a potential paradigm for other content manipulation methods for 360° images and videos.
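
For context, placing content at a given 3D position in a panorama ultimately relies on mapping 3D directions to equirectangular pixels; one common convention is sketched below (axis and wrap conventions vary between toolchains, so treat this as an assumption rather than the paper's exact mapping).

    import numpy as np

    def direction_to_equirect(d, width, height):
        """Map a 3D direction to (u, v) pixel coordinates in an equirectangular
        panorama. Convention assumed: +y up, +z forward, longitude = atan2(x, z)."""
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        lon = np.arctan2(d[0], d[2])                  # [-pi, pi]
        lat = np.arcsin(np.clip(d[1], -1.0, 1.0))     # [-pi/2, pi/2]
        u = (lon / (2.0 * np.pi) + 0.5) * width
        v = (0.5 - lat / np.pi) * height
        return u, v
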

6.
IEEE Trans Vis Comput Graph ; 29(6): 2965-2979, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35077365

ABSTRACT

Coloring line art images based on the colors of reference images is a crucial stage in animation production, which is time-consuming and tedious. This paper proposes a deep architecture to automatically color line art videos in the same color style as given reference images. Our framework consists of a color transform network and a temporal refinement network based on a 3D U-Net. The color transform network takes the target line art images, as well as the line art and color images of the references, as input and generates the corresponding target color images. To cope with the large differences between each target line art image and the reference color images, we propose a distance attention layer that utilizes non-local similarity matching to determine the region correspondences between the target image and the reference images and transforms the local color information from the references to the target. To ensure global color-style consistency, we further incorporate Adaptive Instance Normalization (AdaIN), with transformation parameters obtained from a multi-layer AdaIN module that describes the global color style of the references, as extracted by an embedder network. The temporal refinement network learns spatiotemporal features through 3D convolutions to ensure the temporal color consistency of the results. Our model can achieve even better coloring results by fine-tuning its parameters with only a small number of samples when dealing with an animation of a new style. To evaluate our method, we build a line art coloring dataset. Experiments show that our method achieves the best performance on line art video coloring compared to current state-of-the-art methods.
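
AdaIN itself is a standard operation; a minimal sketch is below, where the per-channel scale and shift would come from the style embedding of the references (the surrounding network wiring is specific to the paper and omitted here).

    import torch

    def adain(content_feat, style_mean, style_std, eps=1e-5):
        """Adaptive Instance Normalization: re-normalize content features so their
        per-channel statistics match the style statistics from the embedder.
        content_feat: (B, C, H, W); style_mean, style_std: (B, C, 1, 1)."""
        mean = content_feat.mean(dim=(2, 3), keepdim=True)
        std = content_feat.std(dim=(2, 3), keepdim=True) + eps
        normalized = (content_feat - mean) / std
        return normalized * style_std + style_mean
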

7.
IEEE Trans Vis Comput Graph ; 29(9): 3976-3988, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35605000

ABSTRACT

Six degrees-of-freedom (6-DoF) video provides telepresence by enabling users to move around in the captured scene with a wide field of regard. Compared to methods requiring sophisticated camera setups, image-based rendering based on photogrammetry can work with images captured at arbitrary poses, which makes it more suitable for casual users. However, existing image-based rendering methods are based on perspective images. When used to reconstruct 6-DoF views, they often require capturing hundreds of images, making data capture a tedious and time-consuming process. In contrast to traditional perspective images, 360° images capture the entire surrounding view in a single shot, thus providing a faster capture process for 6-DoF view reconstruction. This article presents a novel method to provide 6-DoF experiences over a wide area using an unstructured collection of 360° panoramas captured by a conventional 360° camera. Our method consists of 360° data capture, novel depth estimation to produce a high-quality spherical depth panorama, and high-fidelity free-viewpoint generation. We compared our method against state-of-the-art methods using data captured in various environments. Our method shows better visual quality and robustness in the tested scenes.
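
As a small companion sketch, a spherical depth panorama can be back-projected into a point cloud, which is the kind of geometry that free-viewpoint rendering starts from; the axis convention below is assumed, not taken from the paper.

    import numpy as np

    def depth_panorama_to_points(depth):
        """Back-project an equirectangular depth panorama (H x W, metric depth per
        pixel) into 3D points around the capture position."""
        h, w = depth.shape
        v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        lon = (u / w - 0.5) * 2.0 * np.pi
        lat = (0.5 - v / h) * np.pi
        x = np.cos(lat) * np.sin(lon)
        y = np.sin(lat)
        z = np.cos(lat) * np.cos(lon)
        dirs = np.stack([x, y, z], axis=-1)    # unit viewing direction per pixel
        return dirs * depth[..., None]         # point = direction * depth
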

8.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2009-2023, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35471870

ABSTRACT

Recent works have achieved remarkable performance for action recognition with human skeletal data by utilizing graph convolutional models. Existing models mainly focus on developing graph convolutional operations to encode the structural properties of a skeletal graph, whose topology is manually predefined and fixed over all action samples. Some recent works further take sample-dependent relationships among joints into consideration. However, the complex relationships between arbitrary pairwise joints are difficult to learn, and the temporal features between frames are not fully exploited by simply using traditional convolutions with small local kernels. In this paper, we propose a motif-based graph convolution method, which makes use of sample-dependent latent relations among non-physically connected joints to impose high-order locality and assigns different semantic roles to the physical neighbors of a joint to encode hierarchical structures. Furthermore, we propose a sparsity-promoting loss function to learn a sparse motif adjacency matrix for latent dependencies in non-physical connections. For extracting effective temporal information, we propose an efficient local temporal block. It adopts partial dense connections to reuse temporal features in local time windows and enriches the information flow through gradient combination. In addition, we introduce a non-local temporal block to capture global dependencies among frames. Our model captures local and non-local relationships both spatially and temporally by integrating the local and non-local temporal blocks into sparse motif-based graph convolutional networks (SMotif-GCNs). Comprehensive experiments on four large-scale datasets show that our model outperforms the state-of-the-art methods. Our code is publicly available at https://github.com/wenyh1616/SAMotif-GCN.
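
The sparsity-promoting idea can be illustrated with an L1 penalty on the learned motif adjacency added to the recognition loss; the weight and parameter naming here are assumptions, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def smotif_loss(logits, labels, motif_adjacency, sparsity_weight=1e-4):
        """Classification loss plus an L1 penalty that drives the learned adjacency
        for non-physical joint connections toward a sparse set of latent dependencies."""
        ce = F.cross_entropy(logits, labels)
        sparsity = motif_adjacency.abs().sum()
        return ce + sparsity_weight * sparsity
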

9.
IEEE Trans Image Process ; 31: 4336-4351, 2022.
Article in English | MEDLINE | ID: mdl-35727783

ABSTRACT

Distinguishing between dynamic foreground objects and a mostly static background is a fundamental problem in many computer vision and computer graphics tasks. This paper presents a novel online video background identification method assisted by an inertial measurement unit (IMU). Based on the fact that the background motion of a video essentially reflects the 3D camera motion, we leverage IMU data to achieve robust camera motion estimation for identifying background feature points by investigating only a few historical frames. We observe that the displacement of the 2D projection of a scene point caused by camera rotation is depth-invariant, and that rotation estimation from IMU data can be quite accurate. We thus propose to analyze 2D feature points by decomposing their 2D motion into two components: a rotation projection and a translation projection. In our method, after establishing the 3D camera rotations, we generate the depth-dependent 2D feature point movement induced by the camera's 3D translation. Then, by examining the disparity between the inter-frame offset and the projection of the estimated 3D camera motion, we can identify the background feature points. In the experiments, our online method runs at 30 FPS with only one frame of latency and outperforms state-of-the-art background identification and other relevant methods. Our method directly leads to better camera motion estimation, which benefits many applications such as online video stabilization, SLAM, and image stitching.
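
A compact version of the geometric core, assuming calibrated intrinsics K and an IMU-derived inter-frame rotation R: the rotation-only prediction p' ~ K R K^-1 p is depth-invariant, so tracks whose observed motion deviates strongly from it are treated as foreground candidates. The pixel threshold and the omission of the translation-residual check are simplifications.

    import numpy as np

    def classify_background(pts_prev, pts_curr, K, R, thresh_px=2.0):
        """Label feature tracks as background if their motion is explained by the
        IMU-estimated camera rotation alone (translation residual analysis omitted).
        pts_prev, pts_curr: N x 2 pixel coordinates in consecutive frames."""
        K_inv = np.linalg.inv(K)
        ones = np.ones((pts_prev.shape[0], 1))
        p_h = np.hstack([pts_prev, ones])                 # homogeneous pixel coords, N x 3
        pred = (K @ R @ K_inv @ p_h.T).T                  # rotation-only reprojection
        pred = pred[:, :2] / pred[:, 2:3]
        residual = np.linalg.norm(pts_curr - pred, axis=1)
        return residual < thresh_px                       # True = background candidate
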

10.
IEEE Trans Vis Comput Graph ; 28(2): 1198-1208, 2022 Feb.
Article in English | MEDLINE | ID: mdl-32746275

ABSTRACT

In the animation industry, the colorization of raw sketch images is a vitally important but very time-consuming task. This article focuses on providing a novel solution that semiautomatically colorizes a set of images using a single colorized reference image. Our method is able to provide coherent colors for regions that have similar semantics to those in the reference image. An active-learning-based framework is used to match local regions, followed by mixed-integer quadratic programming (MIQP), which considers spatial contexts to further refine the matching results. We efficiently utilize user interactions to achieve high accuracy in the final colorized images. Experiments show that our method outperforms the current state-of-the-art deep-learning-based colorization method in terms of color coherency with the reference image. The region matching framework could potentially be applied to other applications, such as color transfer.
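
To give a flavor of the matching step: if each region pair has an appearance cost, the unary part reduces to an assignment problem, sketched below with SciPy. The paper's MIQP additionally couples pairs of assignments through spatial-context terms, which this sketch omits.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def match_regions(target_feats, ref_feats):
        """Unary region matching by appearance distance. Rows index target regions,
        columns index reference regions; features are fixed-length descriptors."""
        cost = np.linalg.norm(target_feats[:, None, :] - ref_feats[None, :, :], axis=-1)
        rows, cols = linear_sum_assignment(cost)
        return list(zip(rows.tolist(), cols.tolist()))
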

11.
IEEE Trans Vis Comput Graph ; 27(7): 3198-3212, 2021 Jul.
Article in English | MEDLINE | ID: mdl-31944958

ABSTRACT

This article proposes an approach to content-preserving image stitching with regular boundary constraints, which aims to stitch multiple images into a panoramic image with piecewise rectangular boundaries. Existing methods treat image stitching and rectangling as two separate steps, which may produce suboptimal results because the stitching process is not aware of the subsequent warping needed for rectangling. We address these limitations by formulating image stitching with regular boundaries in a unified optimization framework. Starting from the initial stitching result produced by traditional warping-based optimization, we obtain the irregular boundary from the warped meshes by polygon Boolean operations, which robustly handle arbitrary mesh compositions. By analyzing the irregular boundary, we construct a piecewise rectangular boundary. Based on this, we further incorporate line and regular-boundary preservation constraints into the image stitching framework and conduct iterative optimizations to obtain an optimal piecewise rectangular boundary. Thus we can make the boundary of the stitching result as close as possible to a rectangle while reducing unwanted distortions. We further extend our method to video stitching by integrating temporal coherence into the optimization. Experiments show that our method efficiently produces visually pleasing panoramas with regular boundaries and unnoticeable distortions.
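
The boundary-extraction step can be sketched with off-the-shelf polygon Boolean operations (here via Shapely, which the paper does not necessarily use): union the warped mesh quads and take the outer boundary of the result.

    from shapely.geometry import Polygon
    from shapely.ops import unary_union

    def stitched_outline(warped_quads):
        """Union of all warped mesh quads gives the (generally irregular) boundary of
        the stitched panorama. Assumes the union is a single connected polygon;
        warped_quads is a list of 4-point (x, y) tuples from the warped meshes."""
        union = unary_union([Polygon(q) for q in warped_quads])
        return list(union.exterior.coords)
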

12.
Article in English | MEDLINE | ID: mdl-31613764

ABSTRACT

General image completion and extrapolation methods often fail on portrait images where parts of the human body need to be recovered, a task that requires accurate human body structure and appearance synthesis. We present a two-stage deep learning framework for tackling this problem. In the first stage, given a portrait image with an incomplete human body, we extract a complete, coherent human body structure through a human parsing network, which focuses on structure recovery inside the unknown region with the help of full-body pose estimation. In the second stage, we use an image completion network to fill the unknown region, guided by the structure map recovered in the first stage. For realistic synthesis, the completion network is trained with both a perceptual loss and a conditional adversarial loss. We further propose a face refinement network to improve the fidelity of the synthesized face region. We evaluate our method on publicly available portrait image datasets and show that it outperforms other state-of-the-art general image completion methods. Our method enables new portrait image editing applications such as occlusion removal and portrait extrapolation. We further show that the proposed general learning framework can be applied to other types of images, e.g., animal images.
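
The two-stage flow can be summarized as the sketch below; the module interfaces (pose_estimator, parser, completion_net, face_refiner) are placeholders standing in for the paper's trained networks, not a real API.

    def complete_portrait(image, mask, pose_estimator, parser, completion_net, face_refiner):
        """Stage 1: recover a complete body-structure map; stage 2: fill the unknown
        region guided by that structure, then refine the synthesized face."""
        pose = pose_estimator(image, mask)                # full-body pose estimate
        structure = parser(image, mask, pose)             # complete human parsing map
        coarse = completion_net(image, mask, structure)   # structure-guided completion
        return face_refiner(coarse)                       # improve face-region fidelity
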

13.
IEEE Trans Vis Comput Graph ; 23(5): 1561-1573, 2017 May.
Article in English | MEDLINE | ID: mdl-26915124

ABSTRACT

Patch-based image synthesis methods have been successfully applied to various editing tasks on still images, videos, and stereo pairs. In this work we extend patch-based synthesis to plenoptic images captured by consumer-level lenselet-based devices for interactive, efficient light field editing. In our method the light field is represented as a set of images captured from different viewpoints. We decompose the central view into different depth layers and present it to the user for specifying the editing goals. Given an editing task, our method performs patch-based image synthesis on all affected layers of the central view and then propagates the edits to all other views. Interaction is done through a conventional 2D image editing user interface that is familiar to novice users. Our method correctly handles object boundary occlusion with semi-transparency, and thus can generate more realistic results than previous methods. We demonstrate compelling results on a wide range of applications such as hole-filling, object reshuffling and resizing, changing object depth, light field upscaling, and parallax magnification.
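
Propagating an edit from the central view to another sub-aperture view can be approximated, for a single depth layer with known disparity, by a disparity-scaled shift; this is a simplification of the paper's patch-based propagation, and the parameterization below is an assumption.

    from scipy.ndimage import shift as nd_shift

    def propagate_layer_edit(edited_layer, disparity, view_offset):
        """Shift an edited central-view depth layer to another view of the light field.
        edited_layer: H x W x C array; disparity: pixel shift per unit angular offset;
        view_offset: (du, dv) offset of the target view from the central view."""
        du, dv = view_offset
        return nd_shift(edited_layer, shift=(disparity * dv, disparity * du, 0), order=1)
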

14.
IEEE Trans Image Process ; 24(12): 5982-5994, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26513791

ABSTRACT

A major difference between amateur and professional video lies in the quality of camera paths. Previous work on video stabilization has considered how to improve amateur video by smoothing the camera path. In this paper, we show that additional changes to the camera path can further improve video aesthetics. Our new optimization method achieves multiple simultaneous goals: 1) stabilizing video content over short time scales; 2) ensuring simple and consistent camera paths over longer time scales; and 3) improving scene composition by automatically removing distractions, a common occurrence in amateur video. Our approach uses an L1 camera path optimization framework, extended to handle multiple constraints. Two passes of optimization are used to address both low-level and high-level constraints on the camera path. Experimental and user study results show that our approach outputs video that is perceptually better than the input or than the results of stabilization alone.
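
The L1 path objective has a compact convex form; a toy 1D version is sketched with CVXPY below. The weights and the proximity constraint are illustrative, and the paper's full method adds composition and distraction-removal constraints on top of this kind of core.

    import cvxpy as cp
    import numpy as np

    def smooth_camera_path(original_path, w1=10.0, w2=1.0, w3=100.0, proximity=0.1):
        """Minimize L1 norms of the 1st/2nd/3rd differences of the new path while
        keeping it close to the original path (1D toy L1 path optimization)."""
        c = np.asarray(original_path, dtype=float)
        p = cp.Variable(c.shape[0])
        objective = cp.Minimize(w1 * cp.norm1(cp.diff(p, 1))
                                + w2 * cp.norm1(cp.diff(p, 2))
                                + w3 * cp.norm1(cp.diff(p, 3)))
        constraints = [cp.abs(p - c) <= proximity * (np.abs(c).max() + 1.0)]
        cp.Problem(objective, constraints).solve()
        return p.value
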

15.
IEEE Trans Vis Comput Graph ; 18(11): 1849-1857, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22350199

ABSTRACT

We present a semiautomatic image editing framework dedicated to replacing individual structured objects within groups. The major technical difficulty is element separation with irregular spatial distribution, which hampers previous texture and image synthesis methods from easily producing visually compelling results. Our method uses object-level operations and finds grouped elements based on appearance similarity and curvilinear features. This framework enables a number of image editing applications, including natural image mixing, structure-preserving appearance transfer, and texture mixing.
