Results 1 - 20 of 53
1.
J Acoust Soc Am ; 156(1): 548-559, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39024384

ABSTRACT

Conventional near-field acoustic holography based on compressive sensing either does not fully exploit the underlying block-sparse structure of the signal or suffers from a mismatch between the actual and predefined block structure due to the lack of prior information about block partitions, resulting in poor accuracy in sound field reconstruction. In this paper, a pattern-coupled Bayesian compressive sensing method is proposed for sparse reconstruction of sound fields. The proposed method establishes a hierarchical Gaussian-Gamma probability model with a pattern-coupled prior based on the equivalent source method, transforming the sound field reconstruction problem into recovering the sparse coefficient vector of the equivalent source strengths within the compressive sensing framework. A set of hyperparameters is introduced to control the sparsity of each element of this coefficient vector, where the sparsity of each element is determined by both its own hyperparameters and those of its immediate neighbors. This approach promotes block-sparse solutions and achieves better performance in solving for the coefficient vector without prior information about block partitions. The effectiveness and superiority of the proposed method in reconstructing sound fields are verified by simulations and experiments.
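The pattern-coupled idea can be sketched in a few lines: each coefficient's effective precision combines its own hyperparameter with those of its immediate neighbors, so contiguous active coefficients reinforce one another without any predefined block partition. The coupling weight `beta` and the toy hyperparameter values below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def coupled_precisions(gamma, beta=0.5):
    """Pattern-coupled precision for each coefficient: its own
    hyperparameter plus a weighted sum of its immediate neighbours'
    (boundary elements use only the neighbours that exist)."""
    left = np.r_[0.0, gamma[:-1]]   # gamma_{i-1}, zero-padded
    right = np.r_[gamma[1:], 0.0]   # gamma_{i+1}, zero-padded
    return gamma + beta * (left + right)

# Toy example: a block of "active" coefficients has small hyperparameters
# (low precision -> large variance -> nonzero), the rest are pruned.
gamma = np.array([100.0, 100.0, 0.1, 0.1, 0.1, 100.0])
alpha = coupled_precisions(gamma, beta=0.5)
# Coefficients inside the active block keep a low coupled precision,
# so the block-sparse pattern emerges without predefined block structure.
print(alpha)
```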

2.
Conscious Cogn ; 43: 152-66, 2016 07.
Article in English | MEDLINE | ID: mdl-27310108

ABSTRACT

A fundamental question in vision research is whether visual recognition is determined by edge-based information (e.g., edge, line, and conjunction) or surface-based information (e.g., color, brightness, and texture). To investigate this question, we manipulated the stimulus onset asynchrony (SOA) between the scene and the mask in a backward masking task of natural scene categorization. The behavioral results showed that correct classification was higher for line-drawings than for color photographs when the SOA was 13 ms, but lower at longer SOAs. The ERP results revealed that most latencies of early components were shorter for line-drawings than for color photographs, and that latencies gradually increased with SOA for color photographs but not for line-drawings. The results provide new evidence that edge-based information is the primary determinant of natural scene categorization, receiving priority processing; by contrast, surface information takes longer to facilitate natural scene categorization.


Subjects
Evoked Potentials, Visual/physiology, Photic Stimulation/methods, Visual Perception/physiology, Adult, Female, Humans, Male, Reaction Time/physiology, Young Adult
3.
Article in English | MEDLINE | ID: mdl-39042535

ABSTRACT

Generative Adversarial Networks have achieved significant advancements in generating and editing high-resolution images. However, most methods suffer from either requiring extensive labeled datasets or strong prior knowledge. It is also challenging for them to disentangle correlated attributes with few-shot data. In this paper, we propose FEditNet++, a GAN-based approach to explore latent semantics. It aims to enable attribute editing with limited labeled data and disentangle the correlated attributes. We propose a layer-wise feature contrastive objective, which takes into consideration content consistency and facilitates the invariance of the unrelated attributes before and after editing. Furthermore, we harness the knowledge from the pretrained discriminative model to prevent overfitting. In particular, to solve the entanglement problem between the correlated attributes from data and semantic latent correlation, we extend our model to jointly optimize multiple attributes and propose a novel decoupling loss and cross-assessment loss to disentangle them from both latent and image space. We further propose a novel-attribute disentanglement strategy to enable editing of novel attributes with unknown entanglements. Finally, we extend our model to accurately edit the fine-grained attributes. Qualitative and quantitative assessments demonstrate that our method outperforms state-of-the-art approaches across various datasets, including CelebA-HQ, RaFD, Danbooru2018 and LSUN Church.
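The content-consistency intuition behind a layer-wise feature contrastive objective can be illustrated with plain cosine similarity between a layer's features before and after an edit: features of unrelated attributes should stay nearly parallel. This is a sketch of the principle only, not FEditNet++'s actual loss.

```python
import numpy as np

def feature_consistency(f_before, f_after):
    """Cosine similarity between pre- and post-edit features of one
    layer; a contrastive objective would push this toward 1 for
    attribute-unrelated feature channels (illustrative stand-in)."""
    num = (f_before * f_after).sum()
    den = np.linalg.norm(f_before) * np.linalg.norm(f_after)
    return num / den

f = np.array([1.0, 2.0, 3.0])
# A pure rescaling leaves the feature direction unchanged, so the
# consistency score is maximal.
print(feature_consistency(f, 2.0 * f))
```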

4.
Article in English | MEDLINE | ID: mdl-39078758

ABSTRACT

Applying diffusion models to image-to-image translation (I2I) has recently received increasing attention due to its practical applications. Previous attempts inject information from the source image into each denoising step for iterative refinement, resulting in a time-consuming implementation. We propose an efficient method that equips a diffusion model with a lightweight translator, dubbed a Diffusion Model Translator (DMT), to accomplish I2I. Specifically, we first offer a theoretical justification that, when employing the pioneering DDPM formulation for the I2I task, it is both feasible and sufficient to transfer the distribution from one domain to another only at some intermediate step. We further observe that the translation performance depends strongly on the chosen timestep for domain transfer, and therefore propose a practical strategy to automatically select an appropriate timestep for a given task. We evaluate our approach on a range of I2I applications, including image stylization, image colorization, segmentation to image, and sketch to image, to validate its efficacy and general utility. The comparisons show that our DMT surpasses existing methods in both quality and efficiency. Code will be made publicly available.
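The single-intermediate-step idea can be sketched as: forward-diffuse the source image to a chosen timestep with the standard DDPM noising equation, apply a lightweight translator once in that noisy space, and hand the result to a target-domain denoiser. The schedule, the affine "translator," and the timestep below are toy assumptions standing in for the paper's learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_sample(x0, t, alpha_bar):
    """DDPM forward process: draw a noisy x_t from a clean x_0."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * noise

# Toy noise schedule; in DMT the translator is a small trained network,
# here a fixed affine map stands in for it (hypothetical).
alpha_bar = np.linspace(0.99, 0.01, 100)
translate = lambda xt: 0.8 * xt + 0.1

x0_source = rng.standard_normal(16)       # stand-in "source image"
t = 40                                     # chosen intermediate timestep
xt = q_sample(x0_source, t, alpha_bar)     # diffuse source to step t
xt_target = translate(xt)                  # shift distribution once, at t
# A pretrained target-domain DDPM would then denoise xt_target to x0.
print(xt_target.shape)
```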

5.
IEEE Trans Vis Comput Graph ; 30(9): 6433-6446, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38145513

ABSTRACT

As a significant geometric feature of 3D point clouds, sharp features play an important role in shape analysis, 3D reconstruction, registration, localization, etc. Current sharp feature detection methods are still sensitive to the quality of the input point cloud, and their detection performance is affected by random noisy points and non-uniform densities. In this paper, using prior knowledge of geometric features, we propose the Multi-scale Laplace Network (MSL-Net), a new deep-learning-based method built on an intrinsic neighbor shape descriptor, to detect sharp features in 3D point clouds. First, we establish a discrete intrinsic neighborhood of the point cloud based on the Laplacian graph, which reduces the error of local implicit surface estimation. Then, we design a new intrinsic shape descriptor based on the intrinsic neighborhood, combined with enhanced normal extraction and a cosine-based field estimation function. Finally, we present the backbone of MSL-Net based on the intrinsic shape descriptor. Benefiting from the intrinsic neighborhood and shape descriptor, our MSL-Net has a simple architecture and is capable of making accurate feature predictions that satisfy the manifold distribution while avoiding complex intrinsic metric calculations. Extensive experimental results demonstrate that, with its multi-scale structure, MSL-Net has a strong analytical ability for local perturbations of point clouds. Compared with state-of-the-art methods, our MSL-Net is more robust and accurate.
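A graph Laplacian over a point cloud's neighborhood graph is the basic object behind such intrinsic neighborhoods. A minimal sketch, assuming a symmetric kNN graph (the paper's discrete intrinsic neighborhood construction is more involved):

```python
import numpy as np

def knn_graph_laplacian(points, k=3):
    """Unnormalized graph Laplacian L = D - A of a symmetric kNN graph,
    a simple stand-in for an intrinsic neighbourhood structure."""
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]  # skip the point itself
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)                 # symmetrize the adjacency
    return np.diag(A.sum(1)) - A

pts = np.random.default_rng(1).standard_normal((8, 3))
L = knn_graph_laplacian(pts, k=3)
# Every row of a graph Laplacian sums to zero, and L is positive
# semidefinite -- properties the downstream descriptor relies on.
print(np.allclose(L.sum(1), 0.0))
```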

6.
Article in English | MEDLINE | ID: mdl-39150803

ABSTRACT

The reconstruction of indoor scenes from multi-view RGB images is challenging due to the coexistence of flat and texture-less regions alongside delicate and fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel in producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures due to the inadequate neural representation and the inaccurately predicted normal priors. This work aims to reconstruct high-fidelity surfaces with fine-grained details by addressing the above limitations. To improve the capacity of the implicit representation, we propose a hybrid architecture to represent low-frequency and high-frequency regions separately. To enhance the normal priors, we introduce a simple yet effective image sharpening and denoising technique, coupled with a network that estimates the pixel-wise uncertainty of the predicted surface normal vectors. Identifying such uncertainty can prevent our model from being misled by unreliable surface normal supervisions that hinder the accurate reconstruction of intricate geometries. Experiments on the benchmark datasets show that our method outperforms existing methods in terms of reconstruction quality. Furthermore, the proposed method also generalizes well to real-world indoor scenarios captured by our hand-held mobile phones. Our code is publicly available at: https://github.com/yec22/Fine-Grained-Indoor-Recon.
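The "simple yet effective image sharpening" step can be illustrated with classic unsharp masking: add back the difference between the image and a blurred copy. The 3x3 box blur and the single-channel toy image are assumptions; the paper's exact filter may differ.

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    """Unsharp masking: amplify high-frequency detail by adding back
    the residual between the image and a 3x3 box-blurred copy."""
    pad = np.pad(img, 1, mode="edge")
    blur = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    return img + amount * (img - blur)

img = np.zeros((5, 5))
img[2, 2] = 1.0                 # a single bright pixel (toy "detail")
sharp = unsharp_mask(img)
# The peak is amplified relative to its flat surroundings, which is
# what makes subsequent normal prediction more detail-sensitive.
print(sharp[2, 2] > img[2, 2])
```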

7.
IEEE Trans Vis Comput Graph ; 30(1): 606-616, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37871082

ABSTRACT

As communications are increasingly taking place virtually, the ability to present well online is becoming an indispensable skill. Online speakers are facing unique challenges in engaging with remote audiences. However, there has been a lack of evidence-based analytical systems for people to comprehensively evaluate online speeches and further discover possibilities for improvement. This paper introduces SpeechMirror, a visual analytics system facilitating reflection on a speech based on insights from a collection of online speeches. The system estimates the impact of different speech techniques on effectiveness and applies them to a speech to give users awareness of the performance of speech techniques. A similarity recommendation approach based on speech factors or script content supports guided exploration to expand knowledge of presentation evidence and accelerate the discovery of speech delivery possibilities. SpeechMirror provides intuitive visualizations and interactions for users to understand speech factors. Among them, SpeechTwin, a novel multimodal visual summary of speech, supports rapid understanding of critical speech factors and comparison of different speech samples, and SpeechPlayer augments the speech video by integrating visualization of the speaker's body language with interaction, for focused analysis. The system utilizes visualizations suited to the distinct nature of different speech factors for user comprehension. The proposed system and visualization techniques were evaluated with domain experts and amateurs, demonstrating usability for users with low visualization literacy and its efficacy in assisting users to develop insights for potential improvement.


Subjects
Computer Graphics, Speech, Humans, Communication
8.
Sci Data ; 11(1): 847, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39103399

ABSTRACT

Mixed emotions have attracted increasing interest recently, but existing datasets rarely focus on mixed emotion recognition from multimodal signals, hindering the affective computing of mixed emotions. To fill this gap, we present a multimodal dataset with four kinds of signals recorded while watching mixed and non-mixed emotion videos. To ensure effective emotion induction, we first implemented a rule-based video filtering step to select the videos that could elicit stronger positive, negative, and mixed emotions. Then, an experiment with 80 participants was conducted, in which the data of EEG, GSR, PPG, and frontal face videos were recorded while they watched the selected video clips. We also recorded the subjective emotional rating on PANAS, VAD, and amusement-disgust dimensions. In total, the dataset consists of multimodal signal data and self-assessment data from 73 participants. We also present technical validations for emotion induction and mixed emotion classification from physiological signals and face videos. The average accuracy of the 3-class classification (i.e., positive, negative, and mixed) can reach 80.96% when using SVM and features from all modalities, which indicates the possibility of identifying mixed emotional states.
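The 3-class setup (positive / negative / mixed) can be sketched end to end on synthetic features. To stay dependency-free, a nearest-centroid classifier stands in for the paper's SVM, and the Gaussian "feature vectors" are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_class(center, n=30):
    """Synthetic stand-in for per-trial multimodal feature vectors."""
    return center + 0.3 * rng.standard_normal((n, 4))

# Three toy class centers: positive, negative, and mixed (in between).
centers = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0.5, 0.5, 0, 0]], float)
X = np.vstack([make_class(c) for c in centers])
y = np.repeat([0, 1, 2], 30)

# Nearest-centroid classification (the paper uses an SVM on real
# EEG/GSR/PPG/face features; this only illustrates the 3-class task).
centroids = np.array([X[y == c].mean(0) for c in range(3)])
pred = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
accuracy = (pred == y).mean()
print(round(accuracy, 2))
```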


Subjects
Emotions, Humans, Electroencephalography, Facial Expression, Video Recording
9.
Article in English | MEDLINE | ID: mdl-38630565

ABSTRACT

Some robust point cloud registration approaches with controllable pose refinement magnitude, such as ICP and its variants, are commonly used to improve 6D pose estimation accuracy. However, the effectiveness of these methods gradually diminishes with the advancement of deep learning techniques and the enhancement of initial pose accuracy, primarily due to their lack of specific design for pose refinement. In this paper, we propose Point Cloud Completion and Keypoint Refinement with Fusion Data (PCKRF), a new pose refinement pipeline for 6D pose estimation. The pipeline consists of two steps. First, it completes the input point clouds via a novel pose-sensitive point completion network. The network uses both local and global features with pose information during point completion. Then, it registers the completed object point cloud with the corresponding target point cloud by our proposed Color supported Iterative KeyPoint (CIKP) method. The CIKP method introduces color information into registration and registers a point cloud around each keypoint to increase stability. The PCKRF pipeline can be integrated with existing popular 6D pose estimation methods, such as the full flow bidirectional fusion network, to further improve their pose estimation accuracy. Experiments demonstrate that our method exhibits superior stability compared to existing approaches when optimizing initial poses with relatively high precision. Notably, the results indicate that our method effectively complements most existing pose estimation techniques, leading to improved performance in most cases. Furthermore, our method achieves promising results even in challenging scenarios involving textureless and symmetrical objects. Our source code is available at https://github.com/zhanhz/KRF.
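The core of color-supported registration is matching points on a joint geometry-plus-color distance rather than geometry alone. A minimal sketch of that matching step, with an assumed color weight (this is the idea only, not the paper's CIKP implementation):

```python
import numpy as np

def colored_nn(src, tgt, w_color=0.5):
    """Nearest-neighbour matching on a combined geometry + color
    distance. Points are rows of [x, y, z, r, g, b]; w_color trades
    off the two terms (an assumed, tunable weight)."""
    dg = ((src[:, None, :3] - tgt[None, :, :3]) ** 2).sum(-1)
    dc = ((src[:, None, 3:] - tgt[None, :, 3:]) ** 2).sum(-1)
    return (dg + w_color * dc).argmin(1)

rng = np.random.default_rng(0)
tgt = np.hstack([rng.standard_normal((20, 3)), rng.random((20, 3))])
src = tgt + 0.001 * rng.standard_normal(tgt.shape)  # perturbed copy
matches = colored_nn(src, tgt)
# With a tiny perturbation, each point should match its counterpart;
# color disambiguates points that are geometrically close.
print(matches[:5])
```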

10.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 905-918, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35104210

ABSTRACT

Face portrait line drawing is a unique style of art which is highly abstract and expressive. However, due to its high semantic constraints, many existing methods learn to generate portrait drawings using paired training data, which is costly and time-consuming to obtain. In this paper, we propose a novel method to automatically transform face photos to portrait drawings using unpaired training data with two new features: our method can (1) learn to generate high-quality portrait drawings in multiple styles using a single network and (2) generate portrait drawings in a "new style" unseen in the training data. To achieve these benefits, we (1) propose a novel quality metric for portrait drawings which is learned from human perception, and (2) introduce a quality loss to guide the network toward generating better-looking portrait drawings. We observe that existing unpaired translation methods such as CycleGAN tend to embed invisible reconstruction information indiscriminately in the whole drawing due to the significant information imbalance between the photo and portrait drawing domains, which causes important facial features to be lost. To address this problem, we propose a novel asymmetric cycle mapping that enforces the reconstruction information to be visible and embedded only in the selected facial regions. Along with localized discriminators for important facial regions, our method well preserves all important facial features in the generated drawings. Generator dissection further explains that our model learns to incorporate face semantic information during drawing generation. Extensive experiments including a user study show that our model outperforms state-of-the-art methods.

11.
Article in English | MEDLINE | ID: mdl-37220037

ABSTRACT

3D dense captioning aims to semantically describe each object detected in a 3D scene, which plays a significant role in 3D scene understanding. Previous works lack a complete definition of 3D spatial relationships and directly integrate visual and language modalities, thus ignoring the discrepancies between the two modalities. To address these issues, we propose a novel complete 3D relationship extraction modality alignment network, which consists of three steps: 3D object detection, complete 3D relationship extraction, and modality alignment captioning. To comprehensively capture 3D spatial relationship features, we define a complete set of 3D spatial relationships, including the local spatial relationship between objects and the global spatial relationship between each object and the entire scene. To this end, we propose a complete 3D relationship extraction module based on message passing and self-attention to mine multi-scale spatial relationship features and inspect the transformation to obtain features in different views. In addition, we propose a modality alignment caption module to fuse multi-scale relationship features and generate descriptions, bridging the semantic gap from the visual space to the language space with prior information in the word embedding and helping generate improved descriptions for the 3D scene. Extensive experiments demonstrate that the proposed model outperforms the state-of-the-art methods on the ScanRefer and Nr3D datasets.

12.
IEEE Trans Vis Comput Graph ; 29(12): 5250-5264, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36103450

ABSTRACT

Simulating liquid-textile interaction has received great attention in computer graphics recently. Most existing methods take textiles as particles or parameterized meshes. Although these methods can generate visually pleasing results, they cannot simulate water content at a microscopic level due to the lack of geometric modeling of the textile's anisotropic structure. In this paper, we develop a method for yarn-level simulation of the hygroscopicity of textiles and evaluate it using various quantitative metrics. We model textiles in a fiber-yarn-fabric multi-scale manner and consider the dynamically coupled physical mechanisms of liquid spreading, including wetting, wicking, moisture sorption/desorption, and transient moisture-heat transfer in textiles. Our method can accurately simulate liquid spreading on textiles with different fiber materials and geometrical structures, taking air temperature and humidity conditions into account. It visualizes the hygroscopicity of textiles to demonstrate their moisture management ability. We conduct qualitative and quantitative experiments to validate our method and explore various factors to analyze their influence on liquid spreading and the hygroscopicity of textiles.

13.
IEEE Trans Pattern Anal Mach Intell ; 45(2): 2009-2023, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35471870

ABSTRACT

Recent works have achieved remarkable performance for action recognition with human skeletal data by utilizing graph convolutional models. Existing models mainly focus on developing graph convolutional operations to encode structural properties of a skeletal graph, whose topology is manually predefined and fixed over all action samples. Some recent works further take sample-dependent relationships among joints into consideration. However, the complex relationships between arbitrary pairwise joints are difficult to learn, and the temporal features between frames are not fully exploited by simply using traditional convolutions with small local kernels. In this paper, we propose a motif-based graph convolution method, which makes use of sample-dependent latent relations among non-physically connected joints to impose a high-order locality and assigns different semantic roles to the physical neighbors of a joint to encode hierarchical structures. Furthermore, we propose a sparsity-promoting loss function to learn a sparse motif adjacency matrix for latent dependencies in non-physical connections. To extract effective temporal information, we propose an efficient local temporal block. It adopts partial dense connections to reuse temporal features in local time windows and enriches information flow through gradient combination. In addition, we introduce a non-local temporal block to capture global dependencies among frames. Our model can capture local and non-local relationships both spatially and temporally by integrating the local and non-local temporal blocks into the sparse motif-based graph convolutional networks (SMotif-GCNs). Comprehensive experiments on four large-scale datasets show that our model outperforms the state-of-the-art methods. Our code is publicly available at https://github.com/wenyh1616/SAMotif-GCN.
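A sparsity-promoting loss on a learnable adjacency matrix is commonly an L1 penalty, which drives weak non-physical connections toward zero during training. The L1 form and the weight `lam` below are common-practice assumptions; the paper's exact regularizer may differ.

```python
import numpy as np

def sparsity_loss(A, lam=0.1):
    """L1 sparsity penalty on a learnable motif adjacency matrix;
    added to the task loss, it prunes weak latent connections."""
    return lam * np.abs(A).sum()

dense = np.full((4, 4), 0.5)          # every joint pair connected
sparse = np.zeros((4, 4))
sparse[0, 1] = 0.5                    # a single latent connection
# The penalty is smaller for the sparser adjacency, so gradient
# descent on the total loss favours sparse motif structures.
print(sparsity_loss(dense) > sparsity_loss(sparse))
```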

14.
IEEE Trans Vis Comput Graph ; 29(12): 4964-4977, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35925853

ABSTRACT

Point cloud upsampling aims to generate dense point clouds from given sparse ones, which is a challenging task due to the irregular and unordered nature of point sets. To address this issue, we present a novel deep learning-based model, called PU-Flow, which incorporates normalizing flows and weight prediction techniques to produce dense points uniformly distributed on the underlying surface. Specifically, we exploit the invertible characteristics of normalizing flows to transform points between Euclidean and latent spaces and formulate the upsampling process as an ensemble of neighbouring points in a latent space, where the ensemble weights are adaptively learned from local geometric context. Extensive experiments show that our method is competitive and, in most test cases, outperforms state-of-the-art methods in terms of reconstruction quality, proximity-to-surface accuracy, and computational efficiency. The source code will be publicly available at https://github.com/unknownue/puflow.

15.
IEEE Trans Vis Comput Graph ; 29(3): 1785-1798, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34851826

ABSTRACT

3D reconstruction from single-view images is a long-standing research problem. There have been various methods based on point clouds and volumetric representations. In spite of success in 3D models generation, it is quite challenging for these approaches to deal with models with complex topology and fine geometric details. Thanks to the recent advance of deep shape representations, learning the structure and detail representation using deep neural networks is a promising direction. In this article, we propose a novel approach named STD-Net to reconstruct 3D models utilizing mesh representation that is well suited for characterizing complex structures and geometry details. Our method consists of (1) an auto-encoder network for recovering the structure of an object with bounding box representation from a single-view image; (2) a topology-adaptive GCN for updating vertex position for meshes of complex topology; and (3) a unified mesh deformation block that deforms the structural boxes into structure-aware meshes. Evaluation on ShapeNet and PartNet shows that STD-Net has better performance than state-of-the-art methods in reconstructing complex structures and fine geometric details.

16.
Sci Rep ; 13(1): 2995, 2023 02 21.
Article in English | MEDLINE | ID: mdl-36810767

ABSTRACT

Positive human-agent relationships can effectively improve human experience and performance in human-machine systems or environments. The characteristics of agents that enhance this relationship have garnered attention in human-agent or human-robot interactions. In this study, based on the rule of the persona effect, we study the effect of an agent's social cues on human-agent relationships and human performance. We constructed a tedious task in an immersive virtual environment, designing virtual partners with varying levels of human likeness and responsiveness. Human likeness encompassed appearance, sound, and behavior, while responsiveness referred to the way agents responded to humans. Based on the constructed environment, we present two studies exploring the effects of an agent's human likeness and responsiveness on participants' performance and perception of human-agent relationships during the task. The results indicate that when participants work with an agent, its responsiveness attracts attention and induces positive feelings. Agents with responsiveness and appropriate social response strategies have a significant positive effect on human-agent relationships. These results shed some light on how to design virtual agents to improve user experience and performance in human-agent interactions.


Subjects
Attention, Emotions, Humans, Man-Machine Systems
17.
IEEE Trans Vis Comput Graph ; 29(4): 2203-2210, 2023 Apr.
Article in English | MEDLINE | ID: mdl-34752397

ABSTRACT

Caricature is a type of artistic style of human faces that attracts considerable attention in the entertainment industry. So far, only a few 3D caricature generation methods exist, and all of them require some caricature information (e.g., a caricature sketch or 2D caricature) as input. This kind of input, however, is difficult for non-professional users to provide. In this paper, we propose an end-to-end deep neural network model that generates high-quality 3D caricatures directly from a normal 2D face photo. The most challenging issue for our system is that the source domain of face photos (characterized by normal 2D faces) is significantly different from the target domain of 3D caricatures (characterized by 3D exaggerated face shapes and textures). To address this challenge, we: (1) build a large dataset of 5,343 3D caricature meshes and use it to establish a PCA model of the 3D caricature shape space; (2) reconstruct a normal full 3D head from the input face photo and use its PCA representation in the 3D caricature shape space to establish correspondences between the input photo and the 3D caricature shape; and (3) propose a novel character loss and a novel caricature loss based on previous psychological studies of caricatures. Experiments including a novel two-level user study show that our system can generate high-quality 3D caricatures directly from normal face photos.
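A PCA shape space of the kind described in step (1) can be sketched with an SVD of mean-centered, flattened vertex coordinates: any mesh is then represented by a small coefficient vector over the principal components. The random 50x90 "meshes" below are toy stand-ins for the 5,343 caricature meshes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows are flattened mesh vertex coordinates (30 vertices x 3 coords);
# real data would be registered caricature meshes, not random numbers.
shapes = rng.standard_normal((50, 90))
mean = shapes.mean(0)
U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)

k = 10                                  # keep the top-k components
basis = Vt[:k]                          # (k, 90) PCA basis
coeffs = (shapes - mean) @ basis.T      # compact shape-space coordinates
recon = mean + coeffs @ basis           # approximate reconstruction

# Projecting onto all components reconstructs the data exactly; the
# truncated basis gives the compact shape space used for correspondence.
full = mean + ((shapes - mean) @ Vt.T) @ Vt
print(np.allclose(full, shapes))
```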

18.
IEEE Trans Image Process ; 32: 3136-3149, 2023.
Article in English | MEDLINE | ID: mdl-37227918

ABSTRACT

Benefiting from the intuitiveness and naturalness of sketch interaction, sketch-based video retrieval (SBVR) has received considerable attention in the video retrieval research area. However, most existing SBVR research still lacks the capability of accurate video retrieval with fine-grained scene content. To address this problem, in this paper we investigate a new task, which focuses on retrieving the target video by utilizing a fine-grained storyboard sketch depicting the scene layout and major foreground instances' visual characteristics (e.g., appearance, size, pose, etc.) of video; we call such a task "fine-grained scene-level SBVR". The most challenging issue in this task is how to perform scene-level cross-modal alignment between sketch and video. Our solution consists of two parts. First, we construct a scene-level sketch-video dataset called SketchVideo, in which sketch-video pairs are provided and each pair contains a clip-level storyboard sketch and several keyframe sketches (corresponding to video frames). Second, we propose a novel deep learning architecture called Sketch Query Graph Convolutional Network (SQ-GCN). In SQ-GCN, we first adaptively sample the video frames to improve video encoding efficiency, and then construct appearance and category graphs to jointly model visual and semantic alignment between sketch and video. Experiments show that our fine-grained scene-level SBVR framework with SQ-GCN architecture outperforms the state-of-the-art fine-grained retrieval methods. The SketchVideo dataset and SQ-GCN code are available in the project webpage https://iscas-mmsketch.github.io/FG-SL-SBVR/.

19.
Article in English | MEDLINE | ID: mdl-37021894

ABSTRACT

For 3D animators, choreography with artificial intelligence has attracted more attention recently. However, most existing deep learning methods mainly rely on music for dance generation and lack sufficient control over generated dance motions. To address this issue, we introduce the idea of keyframe interpolation for music-driven dance generation and present a novel transition generation technique for choreography. Specifically, this technique synthesizes visually diverse and plausible dance motions by using normalizing flows to learn the probability distribution of dance motions conditioned on a piece of music and a sparse set of key poses. Thus, the generated dance motions respect both the input musical beats and the key poses. To achieve a robust transition of varying lengths between the key poses, we introduce a time embedding at each timestep as an additional condition. Extensive experiments show that our model generates more realistic, diverse, and beat-matching dance motions than the compared state-of-the-art methods, both qualitatively and quantitatively. Our experimental results demonstrate the superiority of the keyframe-based control for improving the diversity of the generated dance motions.
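Conditioning on time via an embedding at each timestep is commonly done with sinusoidal features, as in transformer positional encodings. A minimal sketch (the paper's embedding may differ in dimensionality and form):

```python
import numpy as np

def time_embedding(t, dim=8):
    """Sinusoidal embedding of a scalar timestep: paired sin/cos
    features at geometrically spaced frequencies, letting the model
    condition smoothly on transitions of varying length."""
    freqs = 10000.0 ** (-np.arange(dim // 2) / (dim // 2))
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

emb = time_embedding(5, dim=8)
print(emb.shape)
```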

20.
Zootaxa ; 5319(1): 76-90, 2023 Jul 24.
Article in English | MEDLINE | ID: mdl-37518249

ABSTRACT

A new species of the genus Hebius Thompson, 1913 is described from Youjiang District, Baise City, Guangxi Zhuang Autonomous Region, China, based on a single adult female specimen. It can be distinguished from its congeners by the following combination of characters: (1) dorsal scale rows 19-17-17, feebly keeled except the outermost row; (2) tail comparatively long, TAL/TL ratio 0.30 in females; (3) ventrals 160 (+ 3 preventrals); (4) subcaudals 112; (5) supralabials 9, the fourth to sixth in contact with the eye; (6) infralabials 10, the first 5 touching the first pair of chin shields; (7) preocular 1; (8) postoculars 2; (9) temporals 4, arranged in three rows (1+1+2); (10) maxillary teeth 30, the last 3 enlarged, without diastema; (11) postocular streak present; (12) dorsal background color brownish black, with a conspicuous, uniform, continuous beige stripe extending from behind the eye to the end of the tail; (13) anterior venter creamy yellow, gradually fading toward the rear, with irregular black blotches in the middle and outer quarter of the ventrals, the posterior part almost completely black. The discovery of the new species increases the number of species in the genus Hebius to 51.


Subjects
Colubridae, Lizards, Female, Animals, China, Animal Distribution, Tail, Animal Structures, Phylogeny