1.
Sci Rep; 14(1): 18432, 2024 Aug 8.
Article in English | MEDLINE | ID: mdl-39117709

ABSTRACT

Timely and effective diagnosis of fungal keratitis (FK) is necessary to provide suitable treatment and avoid irreversible vision loss. In vivo confocal microscopy (IVCM) has been widely adopted to guide FK diagnosis. We present a deep learning framework for diagnosing fungal keratitis from IVCM images to assist ophthalmologists. Inspired by the real diagnostic process, our method employs a two-stage deep architecture that makes diagnostic predictions based on both image-level and sequence-level information. To the best of our knowledge, we collected the largest expert-labeled dataset of its kind, 96,632 IVCM images in total, to train and evaluate our method. On the unseen test set, our method achieved a specificity of 96.65% and a sensitivity of 97.57% in diagnosing FK, comparable to or better than experienced ophthalmologists. The network can provide image-level, sequence-level and patient-level diagnostic suggestions to physicians. The results show great promise for assisting ophthalmologists in FK diagnosis.
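To make the two-stage idea concrete, the following is a minimal sketch of an image-level classifier whose per-frame features are then aggregated into a sequence-level prediction. The ResNet-18 backbone and GRU aggregator are assumptions chosen for illustration, not the architecture actually used in the paper.

# Illustrative sketch only: a two-stage classifier that scores single IVCM
# images and then aggregates per-image features over a sequence. The ResNet
# backbone and GRU aggregator are assumptions, not the paper's actual design.
import torch
import torch.nn as nn
import torchvision.models as models

class ImageStage(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # keep 512-d features
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, 2)   # FK vs. non-FK per image

    def forward(self, x):                    # x: (B, 3, H, W)
        feat = self.backbone(x)
        return feat, self.head(feat)

class SequenceStage(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)     # FK vs. non-FK per sequence

    def forward(self, feats):                # feats: (B, T, feat_dim)
        _, h = self.gru(feats)
        return self.head(h[-1])

image_stage, seq_stage = ImageStage(), SequenceStage()
frames = torch.randn(4, 8, 3, 224, 224)     # 4 sequences of 8 IVCM frames
feats, img_logits = image_stage(frames.flatten(0, 1))
seq_logits = seq_stage(feats.view(4, 8, -1))
# A patient-level suggestion could then pool the sequence-level scores.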


Subjects
Keratitis; Microscopy, Confocal; Microscopy, Confocal/methods; Keratitis/microbiology; Keratitis/diagnosis; Keratitis/diagnostic imaging; Humans; Deep Learning; Eye Infections, Fungal/diagnosis; Eye Infections, Fungal/microbiology; Eye Infections, Fungal/diagnostic imaging; Eye Infections, Fungal/pathology; Neural Networks, Computer; Sensitivity and Specificity
2.
Article in English | MEDLINE | ID: mdl-38963737

ABSTRACT

Motion retargeting is an active research area in computer graphics and animation, allowing for the transfer of motion from one character to another and thereby creating diverse animated character data. While this technology has numerous applications in animation, games, and movies, current methods often produce unnatural or semantically inconsistent motion when applied to characters with different shapes or joint counts. This is primarily due to a lack of consideration for the geometric and spatial relationships between the body parts of the source and target characters. To tackle this challenge, we introduce a novel spatially-preserving Skinned Motion Retargeting Network (SMRNet) capable of handling motion retargeting for characters with varying shapes and skeletal structures while maintaining semantic consistency. By learning a hybrid representation of the character's skeleton and shape in a rest pose, SMRNet transfers the rotation and root joint position of the source character's motion to the target character through embedded rest pose feature alignment. Additionally, it incorporates a differentiable loss function to further preserve the spatial consistency of body parts between the source and target. Comprehensive quantitative and qualitative evaluations demonstrate the superiority of our approach over existing alternatives, particularly in preserving spatial relationships.
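As a rough illustration of what a differentiable spatial-consistency term can look like, the sketch below penalizes differences between the height-normalized pairwise distances of corresponding body parts on the source and target characters. This is only one plausible reading of such a loss, not SMRNet's actual formulation.

# Illustrative sketch: a differentiable loss that encourages the retargeted
# character to preserve the source's pairwise spatial relations between body
# parts (distances normalized by character height). An interpretation of a
# "spatially-preserving" term, not SMRNet's actual loss.
import torch

def spatial_consistency_loss(src_parts, tgt_parts, src_height, tgt_height):
    """src_parts, tgt_parts: (B, J, 3) world positions of corresponding parts."""
    def pairwise(p, h):
        d = torch.cdist(p, p)                # (B, J, J) pairwise distances
        return d / h.view(-1, 1, 1)          # normalize by character height
    return torch.mean((pairwise(src_parts, src_height)
                       - pairwise(tgt_parts, tgt_height)) ** 2)

src = torch.randn(2, 22, 3)
tgt = torch.randn(2, 22, 3, requires_grad=True)
loss = spatial_consistency_loss(src, tgt, torch.tensor([1.7, 1.7]),
                                torch.tensor([1.2, 1.2]))
loss.backward()                              # gradients flow to the target pose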

3.
IEEE Trans Vis Comput Graph; 30(5): 2693-2702, 2024 May.
Article in English | MEDLINE | ID: mdl-38437103

ABSTRACT

Redirected walking (RDW) facilitates user navigation within expansive virtual spaces despite the constraints of limited physical spaces. It exploits discrepancies between human visual and proprioceptive sensations, known as gains, to remap virtual and physical environments. In this paper, we explore how to apply rotation gain while the user is walking. We propose applying a rotation gain so that, when reciprocating a previous head rotation, the user rotates by a different angle, thereby steering the user toward a desired direction. To apply the gains imperceptibly based on such a Bidirectional Rotation gain Difference (BiRD), we conduct both measurement and verification experiments on the detection thresholds of the rotation gain for reciprocating head rotations during walking. Unlike previous rotation gains, which are measured while users turn in place (standing or sitting), BiRD is measured during walking. Our study offers a critical assessment of the acceptable range of rotational mapping differences for different rotational orientations during walking, contributing an effective tool for redirecting users in virtual environments.
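The core mechanism can be sketched in a few lines: scale the returning head rotation so the virtual heading drifts toward a steering target, clamped to an imperceptibility range. The gain bounds below are placeholders, not the detection thresholds measured in the paper.

# Illustrative sketch: scale the user's returning head rotation so that the
# virtual heading drifts toward a steering target. The threshold range below
# is a placeholder, not the detection thresholds measured in the paper.
GAIN_MIN, GAIN_MAX = 0.9, 1.1   # placeholder imperceptibility bounds

def bird_gain(physical_return_deg, desired_offset_deg):
    """Gain applied to the rotation that reciprocates a previous head turn.

    physical_return_deg: magnitude of the physical return rotation.
    desired_offset_deg: signed extra virtual rotation we would like to inject.
    """
    if physical_return_deg <= 0.0:
        return 1.0
    gain = 1.0 + desired_offset_deg / physical_return_deg
    return max(GAIN_MIN, min(GAIN_MAX, gain))

# Example: the user turned 30 degrees left and now turns back 30 degrees right;
# we want to steer 2 degrees further, so the virtual return rotation becomes ~32.
g = bird_gain(30.0, 2.0)
virtual_return_deg = 30.0 * g   # clamped if the required gain would be noticeable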


Subjects
Computer Graphics; Walking; Humans; Animals; Orientation; Environment; Birds
4.
Article in English | MEDLINE | ID: mdl-38386584

ABSTRACT

There has been a high demand for facial makeup transfer tools in fashion e-commerce and virtual avatar generation. Most existing makeup transfer methods are based on generative adversarial networks. Despite their success in makeup transfer for a single image, they struggle to maintain makeup consistency across different poses and expressions of the same person. In this paper, we propose a robust makeup transfer method that consistently transfers the makeup style of a reference image to facial images in any pose and expression. Our method introduces an implicit 3D representation, neural radiance fields (NeRFs), to ensure geometric and appearance consistency. It has two separate stages: a basic NeRF module that reconstructs geometry from the input facial image sequence, and a makeup module that learns to transfer the reference makeup style consistently. We propose a novel hybrid makeup loss, designed around the characteristics of makeup, to supervise the training of the makeup module. The proposed loss significantly improves the visual quality and faithfulness of the makeup transfer effects. To better align the distribution of the transferred makeup with that of the reference makeup, a patch-based discriminator operating in the pose-independent UV texture space is proposed to provide more accurate control of the synthesized makeup. Extensive experiments and a user study demonstrate the superiority of our network across a variety of makeup styles.
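As one plausible reading of a "hybrid" makeup loss, the sketch below combines a coarse global color term with per-region color-matching terms weighted per facial region. The regions, statistics, and weights are assumptions for illustration only, not the paper's actual loss.

# Illustrative sketch: a weighted combination of per-region color losses and a
# global term, as one plausible reading of a "hybrid makeup loss". The regions,
# statistics and weights are assumptions, not the paper's formulation.
import torch

def region_color_loss(pred, ref, mask):
    """Match the mean color inside a facial region (lips, eyes, skin, ...)."""
    w = mask.sum().clamp(min=1.0)
    pred_mean = (pred * mask).sum(dim=(-1, -2)) / w
    ref_mean = (ref * mask).sum(dim=(-1, -2)) / w
    return torch.mean((pred_mean - ref_mean) ** 2)

def hybrid_makeup_loss(pred, ref, masks, weights=None):
    """pred, ref: (3, H, W) images; masks: dict of (1, H, W) region masks."""
    weights = weights or {name: 1.0 for name in masks}
    loss = torch.mean((pred - ref) ** 2)          # coarse global term
    for name, mask in masks.items():
        loss = loss + weights[name] * region_color_loss(pred, ref, mask)
    return loss

pred = torch.rand(3, 128, 128, requires_grad=True)
ref = torch.rand(3, 128, 128)
masks = {"lips": torch.zeros(1, 128, 128), "skin": torch.ones(1, 128, 128)}
hybrid_makeup_loss(pred, ref, masks).backward()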

5.
IEEE Trans Vis Comput Graph; 30(7): 4416-4428, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38358860

ABSTRACT

Capturing an omnidirectional image with a 360-degree field of view entails capturing intricate spatial and lighting details of the scene. Consequently, existing intrinsic image decomposition methods face significant challenges when attempting to separate reflectance and shading components from low dynamic range (LDR) omnidirectional images. To address this, our article introduces a novel method specifically designed for the intrinsic decomposition of omnidirectional images. Leveraging the unique characteristics of the 360-degree scene representation, we employ a pre-extraction technique to isolate specific illumination information. Subsequently, we establish new constraints based on these extracted details and the inherent characteristics of omnidirectional images. These constraints limit the illumination intensity range and incorporate spherical illumination variation. By formulating and solving an objective function that accounts for these constraints, our method achieves a more accurate separation of reflectance and shading components. Comprehensive qualitative and quantitative evaluations demonstrate the superiority of our proposed method over state-of-the-art intrinsic decomposition methods.
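To show the general shape of such an objective, the sketch below optimizes a generic log-space intrinsic decomposition, log I = r + s, with a data term, a reflectance smoothness term, and a soft bound on shading intensity. These generic terms stand in for the paper's omnidirectional-specific constraints, which are not reproduced here.

# Illustrative sketch: a generic intrinsic-decomposition objective in log
# space, with a data term, reflectance smoothness, and a soft bound on the
# shading range. Not the constraints proposed in the paper.
import torch

def intrinsic_objective(r, s, log_I, s_min=-3.0, s_max=0.0,
                        w_smooth=0.1, w_range=1.0):
    data = torch.mean((r + s - log_I) ** 2)               # I = R * S in log space
    smooth = torch.mean(torch.abs(r[:, 1:] - r[:, :-1])) \
           + torch.mean(torch.abs(r[1:, :] - r[:-1, :]))  # piecewise-constant R
    range_pen = torch.mean(torch.relu(s - s_max) ** 2
                           + torch.relu(s_min - s) ** 2)  # limit shading intensity
    return data + w_smooth * smooth + w_range * range_pen

log_I = torch.log(torch.rand(64, 128) + 1e-3)             # toy equirectangular luminance
r = torch.zeros_like(log_I, requires_grad=True)
s = torch.zeros_like(log_I, requires_grad=True)
opt = torch.optim.Adam([r, s], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    loss = intrinsic_objective(r, s, log_I)
    loss.backward()
    opt.step()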

6.
IEEE Trans Vis Comput Graph; 30(4): 1916-1926, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37028008

ABSTRACT

With the recent rise of the Metaverse, online multiplayer VR applications are becoming increasingly prevalent worldwide. However, as multiple users are located in different physical environments (PEs), different reset frequencies and timings can lead to serious fairness issues for online collaborative/competitive VR applications. For fairness, an ideal online redirected walking (RDW) strategy must give different users equal locomotion opportunities, regardless of the layouts of their physical environments. Existing RDW methods lack a scheme to coordinate multiple users in different PEs, and thus trigger too many resets for all users under the locomotion fairness constraint. We propose a novel multi-user RDW method that significantly reduces the overall number of resets and gives users a better immersive experience by providing fair exploration. Our key idea is to first identify the "bottleneck" user who may cause all users to be reset and estimate the time until that reset given the users' next targets, and then redirect all users to favorable poses during that maximized bottleneck time so that subsequent resets can be postponed as long as possible. More specifically, we develop methods to estimate the time of possibly encountering obstacles and the reachable area of a specific pose, enabling the prediction of the next reset caused by any user. Our experiments and user study show that our method outperforms existing RDW methods in online VR applications.
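A toy sketch of the bottleneck idea follows: estimate each user's time before an unavoidable reset, take the minimum as the bottleneck window, and use that window to redirect everyone. The straight-line time-to-wall estimate below is a simplification for illustration, not the reachability analysis used in the paper.

# Illustrative sketch: identify the "bottleneck" user as the one with the
# smallest estimated time before hitting a physical boundary, assuming each
# user keeps walking straight at constant speed. A simplification, not the
# paper's estimator.
import math

def time_to_boundary(pos, heading_rad, speed, room_w, room_h):
    """Time until a straight walk from pos exits an axis-aligned room."""
    dx, dy = math.cos(heading_rad), math.sin(heading_rad)
    times = []
    if dx > 1e-6:  times.append((room_w - pos[0]) / (dx * speed))
    if dx < -1e-6: times.append((0.0 - pos[0]) / (dx * speed))
    if dy > 1e-6:  times.append((room_h - pos[1]) / (dy * speed))
    if dy < -1e-6: times.append((0.0 - pos[1]) / (dy * speed))
    return min(t for t in times if t >= 0.0) if times else float("inf")

users = [  # (position, heading, speed, physical room size) per user
    ((1.0, 1.0), 0.3, 1.0, (4.0, 4.0)),
    ((0.5, 3.5), -1.2, 1.2, (6.0, 3.0)),
]
ttl = [time_to_boundary(p, h, v, *room) for p, h, v, room in users]
bottleneck = min(range(len(users)), key=lambda i: ttl[i])
window = ttl[bottleneck]
# During this window, all users would be redirected toward poses that
# postpone the next reset as far as possible.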

7.
Article in English | MEDLINE | ID: mdl-37889815

ABSTRACT

360° images and videos have become an economical and popular way to provide VR experiences using real-world content. However, the manipulation of stereo panoramic content remains less explored. In this paper, we focus on the 360° image composition problem and develop a solution that can take an object from a stereo image pair and insert it at a given 3D position in a target stereo panorama with well-preserved geometry. Our method uses recovered 3D point clouds to guide the composited image generation. More specifically, we observe that a single one-off operation for inserting objects into equirectangular images never produces satisfactory depth perception and generates ghost artifacts when users view the result from different directions. Therefore, we propose a novel per-view projection method that segments the object in 3D spherical space with the stereo camera pair facing in each direction. A deep depth densification network is proposed to generate depth guidance for the stereo image generation of each view segment according to the desired position and pose of the inserted object. We finally combine the synthesized view segments and blend the objects into the target stereo 360° scene. A user study demonstrates that our method provides good depth perception and removes ghost artifacts. The per-view solution is a potential paradigm for other content manipulation methods for 360° images and videos.
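The per-view idea builds on the standard mapping between equirectangular pixels and 3D viewing directions on the sphere. The helper below shows that mapping in both directions; conventions (longitude origin, axis order) vary between implementations, so this is one common choice rather than the paper's code, and it is only the geometric building block, not the full pipeline.

# Illustrative sketch: the standard equirectangular pixel <-> unit-direction
# mapping that per-view segmentation and reprojection build on.
import numpy as np

def pixel_to_direction(u, v, width, height):
    """(u, v) pixel in an equirectangular image -> unit direction (x, y, z)."""
    lon = (u / width) * 2.0 * np.pi - np.pi        # [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi       # [pi/2, -pi/2]
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])

def direction_to_pixel(d, width, height):
    """Unit direction -> (u, v) pixel in the equirectangular image."""
    x, y, z = d / np.linalg.norm(d)
    lon = np.arctan2(x, z)
    lat = np.arcsin(y)
    u = (lon + np.pi) / (2.0 * np.pi) * width
    v = (np.pi / 2.0 - lat) / np.pi * height
    return u, v

d = pixel_to_direction(1024, 512, 2048, 1024)      # center pixel -> forward axis
u, v = direction_to_pixel(d, 2048, 1024)           # maps back to (1024, 512)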

8.
IEEE Trans Vis Comput Graph; 29(9): 3976-3988, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35605000

ABSTRACT

Six degrees-of-freedom (6-DoF) video provides telepresence by enabling users to move around in the captured scene with a wide field of regard. Compared to methods requiring sophisticated camera setups, image-based rendering based on photogrammetry can work with images captured in arbitrary poses, which is more suitable for casual users. However, existing image-based rendering methods rely on perspective images; when used to reconstruct 6-DoF views, they often require capturing hundreds of images, making data capture a tedious and time-consuming process. In contrast to traditional perspective images, 360° images capture the entire surrounding view in a single shot, thus providing a faster capture process for 6-DoF view reconstruction. This article presents a novel method to provide 6-DoF experiences over a wide area using an unstructured collection of 360° panoramas captured by a conventional 360° camera. Our method consists of 360° data capture, a novel depth estimation step that produces a high-quality spherical depth panorama, and high-fidelity free-viewpoint generation. We compared our method against state-of-the-art methods using data captured in various environments. Our method shows better visual quality and robustness in the tested scenes.
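A minimal sketch of free-viewpoint reprojection from a spherical RGB-D panorama: back-project each pixel using its depth, translate to the new viewpoint, and re-project onto the sphere. This is a toy nearest-neighbor forward-warping step, not the paper's high-fidelity renderer.

# Illustrative sketch: point-based reprojection of a spherical RGB-D panorama
# to a translated viewpoint (back-project, translate, re-project, splat).
import numpy as np

def reproject_panorama(rgb, depth, t):
    """rgb: (H, W, 3), depth: (H, W) metric depth, t: (3,) viewpoint offset."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    lon = (u + 0.5) / W * 2 * np.pi - np.pi
    lat = np.pi / 2 - (v + 0.5) / H * np.pi
    dirs = np.stack([np.cos(lat) * np.sin(lon), np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)      # (H, W, 3)
    pts = dirs * depth[..., None] - t                          # points in the new frame
    r = np.linalg.norm(pts, axis=-1)
    lon2 = np.arctan2(pts[..., 0], pts[..., 2])
    lat2 = np.arcsin(np.clip(pts[..., 1] / np.maximum(r, 1e-6), -1, 1))
    u2 = ((lon2 + np.pi) / (2 * np.pi) * W).astype(int) % W
    v2 = np.clip(((np.pi / 2 - lat2) / np.pi * H).astype(int), 0, H - 1)
    out = np.zeros_like(rgb)
    out[v2, u2] = rgb[v, u]        # nearest-neighbor splat, no z-buffering
    return out

pano = np.random.rand(256, 512, 3)
depth = np.full((256, 512), 2.0)
novel = reproject_panorama(pano, depth, np.array([0.2, 0.0, 0.1]))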

9.
IEEE Trans Vis Comput Graph; 29(6): 2965-2979, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35077365

ABSTRACT

Coloring line art images based on the colors of reference images is a crucial but time-consuming and tedious stage in animation production. This paper proposes a deep architecture to automatically color line art videos in the same color style as given reference images. Our framework consists of a color transform network and a temporal refinement network based on 3U-net. The color transform network takes the target line art images, along with the line art and color images of the references, as input and generates the corresponding target color images. To cope with the large differences between each target line art image and the reference color images, we propose a distance attention layer that uses non-local similarity matching to determine region correspondences between the target image and the reference images and transfers local color information from the references to the target. To ensure global color style consistency, we further incorporate Adaptive Instance Normalization (AdaIN), with transformation parameters derived from a descriptor of the references' global color style extracted by an embedder network. The temporal refinement network learns spatiotemporal features through 3D convolutions to ensure the temporal color consistency of the results. When dealing with an animation in a new style, our model can achieve even better coloring results by fine-tuning its parameters on only a small number of samples. To evaluate our method, we build a line art coloring dataset. Experiments show that our method achieves the best performance on line art video coloring compared to current state-of-the-art methods.
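The two mechanisms named above, non-local similarity matching and AdaIN, can be sketched with generic building blocks: an attention step that pulls reference color features toward matching positions in the target, and an AdaIN step that re-normalizes features to supplied style statistics. These are generic layers in the spirit of the paper, not its exact implementation.

# Illustrative sketch: (a) non-local matching that warps reference color
# features to the target, and (b) AdaIN with externally supplied statistics.
import torch
import torch.nn.functional as F

def nonlocal_color_transfer(tgt_feat, ref_feat, ref_color_feat):
    """tgt_feat, ref_feat: (B, C, H, W); ref_color_feat: (B, C2, H, W)."""
    B, C, H, W = tgt_feat.shape
    q = tgt_feat.flatten(2).transpose(1, 2)            # (B, HW, C)
    k = ref_feat.flatten(2)                            # (B, C, HW)
    attn = F.softmax(q @ k / C ** 0.5, dim=-1)         # (B, HW, HW) similarities
    v = ref_color_feat.flatten(2).transpose(1, 2)      # (B, HW, C2)
    out = (attn @ v).transpose(1, 2)                   # (B, C2, HW)
    return out.view(B, -1, H, W)

def adain(content, style_mean, style_std, eps=1e-5):
    """Re-normalize content features to the given per-channel style statistics."""
    mean = content.mean(dim=(2, 3), keepdim=True)
    std = content.std(dim=(2, 3), keepdim=True) + eps
    return (content - mean) / std * style_std + style_mean

tgt = torch.randn(1, 64, 32, 32)
ref = torch.randn(1, 64, 32, 32)
ref_color = torch.randn(1, 64, 32, 32)
warped = nonlocal_color_transfer(tgt, ref, ref_color)
styled = adain(warped, torch.zeros(1, 64, 1, 1), torch.ones(1, 64, 1, 1))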

10.
IEEE Trans Pattern Anal Mach Intell; 45(2): 2009-2023, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35471870

ABSTRACT

Recent works have achieved remarkable performance in action recognition with human skeletal data by utilizing graph convolutional models. Existing models mainly focus on developing graph convolutional operations to encode the structural properties of a skeletal graph whose topology is manually predefined and fixed across all action samples. Some recent works further take sample-dependent relationships among joints into consideration. However, the complex relationships between arbitrary pairwise joints are difficult to learn, and temporal features between frames are not fully exploited by simply using traditional convolutions with small local kernels. In this paper, we propose a motif-based graph convolution method that makes use of sample-dependent latent relations among non-physically connected joints to impose high-order locality, and assigns different semantic roles to the physical neighbors of a joint to encode hierarchical structures. Furthermore, we propose a sparsity-promoting loss function to learn a sparse motif adjacency matrix for latent dependencies in non-physical connections. For extracting effective temporal information, we propose an efficient local temporal block: it adopts partial dense connections to reuse temporal features in local time windows and enriches the information flow through gradient combination. In addition, we introduce a non-local temporal block to capture global dependencies among frames. By integrating the local and non-local temporal blocks into the sparse motif-based graph convolutional networks (SMotif-GCNs), our model captures both local and non-local relationships spatially and temporally. Comprehensive experiments on four large-scale datasets show that our model outperforms state-of-the-art methods. Our code is publicly available at https://github.com/wenyh1616/SAMotif-GCN.
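A generic reading of the sparse latent-adjacency idea is sketched below: a graph-convolution layer that mixes a fixed skeletal adjacency with a learnable latent adjacency, plus an L1 penalty that keeps the latent adjacency sparse. The layer structure, placeholder adjacency, and loss weight are assumptions, not the SMotif-GCN implementation; for that, see the repository linked above.

# Illustrative sketch: graph convolution mixing a fixed physical adjacency
# with a learnable latent adjacency, regularized by an L1 sparsity term.
import torch
import torch.nn as nn

class SparseLatentGraphConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_joints, phys_adj):
        super().__init__()
        self.register_buffer("phys_adj", phys_adj)               # (J, J), fixed
        self.latent_adj = nn.Parameter(torch.zeros(num_joints, num_joints))
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):                                         # x: (B, J, C)
        adj = self.phys_adj + self.latent_adj                     # physical + latent
        return torch.relu(self.proj(adj @ x))

    def sparsity_loss(self):
        return self.latent_adj.abs().mean()                       # promote sparsity

J = 25                                                             # e.g. 25 skeleton joints
phys = torch.eye(J)                                                # placeholder adjacency
layer = SparseLatentGraphConv(64, 128, J, phys)
x = torch.randn(8, J, 64)
out = layer(x)
loss = out.pow(2).mean() + 0.01 * layer.sparsity_loss()            # placeholder task loss + L1
loss.backward()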
