1.
Article in English | MEDLINE | ID: mdl-38913509

ABSTRACT

Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes. UDA typically relies on paired day-night images to guide adaptation, but this approach hampers dataset construction and restricts generalization across night scenes in different datasets. Moreover, UDA, focusing on network architecture and training strategies, struggles with classes that share few similarities across domains. In this paper, we leverage Prompt Images Guidance (PIG) to enhance UDA with supplementary night knowledge. We propose a Night-Focused Network (NFNet) to learn night-specific features from both target domain images and prompt images. To generate high-quality pseudo-labels, we propose Pseudo-label Fusion via Domain Similarity Guidance (FDSG). Classes with fewer domain similarities are predicted by NFNet, which excels in parsing night features, while classes with more domain similarities are predicted by UDA, which has rich labeled semantics. Additionally, we propose two data augmentation strategies, the Prompt Mixture Strategy (PMS) and the Alternate Mask Strategy (AMS), to mitigate overfitting of the NFNet to the few prompt images. We conduct extensive experiments on four night-time datasets: NightCity, NightCity+, Dark Zurich, and ACDC. The results indicate that utilizing PIG can enhance the parsing accuracy of UDA. The code is available at https://github.com/qiurui4shu/PIG.
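
A minimal sketch of the class-wise fusion idea behind FDSG; the function name, array shapes, similarity scores, and threshold below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fuse_pseudo_labels(uda_logits, nfnet_logits, domain_similarity, thresh=0.5):
    """Fuse two segmentation predictions class by class.

    uda_logits, nfnet_logits: (C, H, W) arrays of class scores.
    domain_similarity: (C,) array in [0, 1]; high values mean the class
    looks alike across the day and night domains.
    High-similarity classes take the UDA prediction (rich labeled
    semantics); low-similarity classes take the night-focused NFNet one.
    """
    use_uda = domain_similarity >= thresh            # (C,) boolean mask
    fused = np.where(use_uda[:, None, None], uda_logits, nfnet_logits)
    return fused.argmax(axis=0)                      # (H, W) pseudo-label map
```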

2.
Article in English | MEDLINE | ID: mdl-38557621

ABSTRACT

Due to the unsatisfactory performance of supervised methods on unpaired real-world scans, point cloud completion via cross-domain adaptation has recently drawn growing attention. Nevertheless, previous approaches focus only on alleviating the distribution shift through domain alignment, resulting in massive information loss from real-world domain data. To tackle this issue, we propose a dual mixup-induced consistency regularization that integrates both the source and target domains to improve robustness and generalization capability. Specifically, we mix up virtual and real-world shapes in the input and latent feature spaces respectively, and then regularize the completion network by forcing the two kinds of mixed completion predictions to be consistent. To further adapt to each instance within the real-world domain, we design a novel density-aware refiner that uses local context information to preserve fine-grained details and remove noise and outliers from the coarse completion. Extensive experiments on real-world scans and our synthetic unpaired datasets demonstrate the superiority of our method over existing state-of-the-art approaches.
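
The dual mixup-induced consistency idea can be sketched as follows; the encoder/decoder interfaces and the coordinate-level convex mixing of point clouds are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def dual_mixup_consistency(encoder, decoder, x_src, x_tgt, alpha=0.4):
    """Mix shapes in input space and in latent space, then force the two
    mixed completions to agree (a sketch; network interfaces are assumed).

    x_src, x_tgt: (B, N, 3) virtual and real-world point clouds.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # input-space mixup, then encode
    z_input_mix = encoder(lam * x_src + (1 - lam) * x_tgt)
    # latent-space mixup of the separately encoded shapes
    z_latent_mix = lam * encoder(x_src) + (1 - lam) * encoder(x_tgt)
    pred_a, pred_b = decoder(z_input_mix), decoder(z_latent_mix)
    return F.mse_loss(pred_a, pred_b)   # consistency regularizer
```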

3.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4551-4566, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38133979

ABSTRACT

Information Bottleneck (IB) provides an information-theoretic principle for multi-view learning by revealing the various components contained in each viewpoint. This highlights the necessity to capture their distinct roles to achieve view-invariant and predictive representations, but the problem remains under-explored due to the technical intractability of modeling and organizing innumerable mutual information (MI) terms. Recent studies show that sufficiency and consistency play such key roles in multi-view representation learning and can be preserved via a variational distillation framework. But when generalized to arbitrary viewpoints, such a strategy fails as the mutual information terms of consistency become complicated. This paper presents Multi-View Variational Distillation (MV2D), tackling the above limitations for generalized multi-view learning. Uniquely, MV2D can recognize useful consistent information and prioritize diverse components by their generalization ability. This guides an analytical and scalable solution to achieving both sufficiency and consistency. Additionally, by rigorously reformulating the IB objective, MV2D tackles the difficulties in MI optimization and fully realizes the theoretical advantages of the information bottleneck principle. We extensively evaluate our model on diverse tasks to verify its effectiveness, where the considerable gains provide key insights into achieving generalized multi-view representations under a rigorous information-theoretic principle.
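
For reference, the classical single-view IB trade-off that such objectives build on can be written as follows (the standard textbook form with trade-off parameter beta; the paper's multi-view reformulation is more involved and is not given in the abstract):

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

Here Z is the learned representation: the first term penalizes information retained about the input X, while the second rewards information predictive of the target Y.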

4.
Article in English | MEDLINE | ID: mdl-37847635

ABSTRACT

In contrast to the traditional avatar creation pipeline, which is a costly process, contemporary generative approaches directly learn the data distribution from photographs. While plenty of works extend unconditional generative models and achieve some level of controllability, it is still challenging to ensure multi-view consistency, especially under large poses. In this work, we propose a network that generates 3D-aware portraits while being controllable through semantic parameters for pose, identity, expression, and illumination. Our network uses a neural scene representation to model 3D-aware portraits, whose generation is guided by a parametric face model that supports explicit control. While the latent disentanglement can be further enhanced by contrasting images with partially different attributes, there still exists noticeable inconsistency in non-face areas when animating expressions. We solve this by proposing a volume blending strategy in which we form a composite output by blending dynamic and static areas, with the two parts segmented from the jointly learned semantic field. Our method outperforms prior art in extensive experiments, producing realistic portraits with vivid expressions under natural lighting when viewed from free viewpoints. It also generalizes well to real images and out-of-domain data, showing great promise for real applications.
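
A toy sketch of the volume blending step, assuming per-sample colors and densities from the dynamic and static fields and a soft dynamic-region mask from the jointly learned semantic field (all names are hypothetical):

```python
import numpy as np

def blend_volumes(rgb_dyn, sigma_dyn, rgb_sta, sigma_sta, dyn_mask):
    """Composite the dynamic (animatable) region with the static region.

    rgb_*: (..., 3) per-sample colors; sigma_*: (...,) densities;
    dyn_mask: (...,) soft mask in [0, 1] from the semantic field.
    """
    sigma = dyn_mask * sigma_dyn + (1 - dyn_mask) * sigma_sta
    rgb = dyn_mask[..., None] * rgb_dyn + (1 - dyn_mask[..., None]) * rgb_sta
    return rgb, sigma   # fed to standard volume rendering downstream
```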

5.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15328-15344, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37751346

ABSTRACT

Hidden features in neural networks usually fail to learn informative representations for 3D segmentation, as supervision is given only on the output prediction; this can be addressed by omni-scale supervision on intermediate layers. In this paper, we bring the first omni-scale supervision method to 3D segmentation via the proposed gradual Receptive Field Component Reasoning (RFCR), where target Receptive Field Component Codes (RFCCs) are designed to record the categories within the receptive fields of hidden units in the encoder. The target RFCCs then supervise the decoder to gradually infer the RFCCs in a coarse-to-fine category reasoning manner and finally obtain the semantic labels. To obtain more supervision, we also propose an RFCR-NL model with complementary negative codes (i.e., Negative RFCCs, NRFCCs) trained with negative learning. Because many hidden features are inactive, with tiny magnitudes that contribute little to RFCC prediction, we propose Feature Densification with a centrifugal potential to obtain more unambiguous features; this is in effect equivalent to entropy regularization over the features. More active features can unleash the potential of the omni-supervision method. We embed our method into three prevailing backbones, all of which are significantly improved on three datasets for both fully and weakly supervised segmentation tasks and achieve competitive performance.
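
The target RFCC construction can be illustrated in a 2D toy setting: a unit's code marks every category appearing inside its receptive field, which reduces to max-pooling one-hot labels (pooling sizes are assumptions; the paper operates on 3D point clouds):

```python
import torch
import torch.nn.functional as F

def rfcc_targets(label_map, num_classes, scales=(2, 4, 8)):
    """Build per-scale Receptive Field Component Codes for a 2D toy case.

    label_map: (H, W) long tensor of class ids.
    Returns one (C, H/s, W/s) binary code per scale: bit c is set iff
    class c appears anywhere inside the unit's receptive field.
    """
    onehot = F.one_hot(label_map, num_classes).permute(2, 0, 1).float()
    return [F.max_pool2d(onehot.unsqueeze(0), s).squeeze(0) for s in scales]
```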

6.
IEEE Trans Med Imaging ; 42(9): 2740-2750, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37018113

ABSTRACT

U-Nets have achieved tremendous success in medical image segmentation. Nevertheless, they may have limitations in global (long-range) contextual interactions and edge-detail preservation. In contrast, the Transformer module has an excellent ability to capture long-range dependencies by leveraging the self-attention mechanism in the encoder. Although the Transformer module was born to model long-range dependencies on extracted feature maps, it still suffers from high computational and spatial complexity when processing high-resolution 3D feature maps. This motivates us to design an efficient Transformer-based UNet model and to study the feasibility of Transformer-based network architectures for medical image segmentation tasks. To this end, we propose MISSU, a self-distilled Transformer-based UNet for medical image segmentation that simultaneously learns global semantic information and local spatially detailed features. In addition, a local multi-scale fusion block refines fine-grained details from the skip connections in the encoder through self-distillation from the main CNN stem; it is computed only during training and removed at inference with minimal overhead. Extensive experiments on the BraTS 2019 and CHAOS datasets show that MISSU achieves the best performance over previous state-of-the-art methods. Code and models are available at: https://github.com/wangn123/MISSU.git.
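
A generic self-distillation loss of the kind described, with the training-only fusion block as student and the detached main-branch output as teacher (a sketch, not the exact MISSU loss):

```python
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-softened KL between student and a frozen teacher view.

    student_logits, teacher_logits: (N, C, ...) raw scores over classes.
    """
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)  # stop teacher grads
    return F.kl_div(p_s, p_t, reduction="batchmean") * T * T
```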


Subjects
Image Processing, Computer-Assisted; Semantics
7.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8954-8968, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37022055

ABSTRACT

Domain adaptation aims to bridge the domain shifts between the source and the target domain. These shifts may span different dimensions such as fog, rainfall, etc. However, recent methods typically do not consider explicit prior knowledge about the domain shift along a specific dimension, leading to less desirable adaptation performance. In this article, we study a practical setting called Specific Domain Adaptation (SDA) that aligns the source and target domains along a demanded specific dimension. Within this setting, we observe that the intra-domain gap induced by different domainness (i.e., the numerical magnitude of the domain shift along this dimension) is crucial when adapting to a specific domain. To address the problem, we propose a novel Self-Adversarial Disentangling (SAD) framework. In particular, given a specific dimension, we first enrich the source domain by introducing a domainness creator that provides additional supervisory signals. Guided by the created domainness, we design a self-adversarial regularizer and two loss functions to jointly disentangle the latent representations into domainness-specific and domainness-invariant features, thus mitigating the intra-domain gap. Our method can easily serve as a plug-and-play framework and introduces no extra cost at inference time. We achieve consistent improvements over state-of-the-art methods in both object detection and semantic segmentation.
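
Adversarial feature disentanglement of this kind is commonly built on a gradient reversal layer; a standard sketch follows (the abstract does not specify SAD's exact regularizer, so this is the generic building block):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the
    backward pass, so the feature extractor is trained adversarially
    against whatever head sits on top."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```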

8.
IEEE Trans Image Process ; 32: 2386-2398, 2023.
Article in English | MEDLINE | ID: mdl-37071518

ABSTRACT

Night-Time Scene Parsing (NTSP) is essential to many vision applications, especially autonomous driving. Most existing methods were proposed for day-time scene parsing and rely on modeling pixel intensity-based spatial contextual cues under even illumination. Hence, these methods do not perform well in night-time scenes, as such spatial contextual cues are buried in the over-/under-exposed regions of night-time images. In this paper, we first conduct an image frequency-based statistical experiment to interpret the discrepancies between day-time and night-time scenes. We find that image frequency distributions differ significantly between day-time and night-time scenes, and that understanding such frequency distributions is critical to the NTSP problem. Based on this, we propose to exploit image frequency distributions for night-time scene parsing. First, we propose a Learnable Frequency Encoder (LFE) to model the relationship between different frequency coefficients and measure all frequency components dynamically. Second, we propose a Spatial Frequency Fusion (SFF) module that fuses both spatial and frequency information to guide the extraction of spatial context features. Extensive experiments show that our method performs favorably against state-of-the-art methods on the NightCity, NightCity+ and BDD100K-night datasets. In addition, we demonstrate that our method can be applied to existing day-time scene parsing methods and boost their performance on night-time scenes. The code is available at https://github.com/wangsen99/FDLNet.
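
A simplified stand-in for learnable frequency reweighting: transform features to the frequency domain, scale each component with a learnable weight map, and transform back (shapes and normalization are assumptions, not the LFE's actual design):

```python
import torch

def reweight_frequencies(x, weight):
    """Scale each frequency component of an image batch.

    x: (N, C, H, W) float tensor; weight: (H, W) learnable parameter
    (e.g. torch.nn.Parameter(torch.ones(H, W))), broadcast over N and C.
    """
    spec = torch.fft.fft2(x, norm="ortho")    # complex spectrum
    spec = spec * weight                      # per-frequency reweighting
    return torch.fft.ifft2(spec, norm="ortho").real
```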

9.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 3492-3504, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35687623

ABSTRACT

Mirror detection is challenging because the visual appearances of mirrors change depending on those of their surroundings. As existing mirror detection methods are mainly based on extracting contextual contrast and relational similarity between mirror and non-mirror regions, they may fail to identify a mirror region if these assumptions are violated. Inspired by a recent study that applied a CNN to distinguish whether an image is flipped based on the visual chirality property, in this paper, we rethink this image-level visual chirality property and reformulate it as a learnable pixel-level cue for mirror detection. Specifically, we first propose a novel flipping-convolution-flipping (FCF) transformation to model visual chirality as a learnable commutative residual. We then propose a novel visual chirality embedding (VCE) module to exploit this commutative residual in multi-scale feature maps and embed the visual chirality features into our mirror detection model. Besides, we also propose a visual chirality-guided edge detection (CED) module to integrate the visual chirality features with contextual features for detection refinement. Extensive experiments show that the proposed method outperforms state-of-the-art methods on three benchmark datasets.
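
The commutative-residual idea behind the FCF transformation fits in a few lines: if a convolution commuted with horizontal flipping, the residual below would vanish everywhere, so non-zero responses localize chiral cues (a sketch of the idea, not the paper's exact module):

```python
import torch

def chirality_residual(x, conv):
    """Flipping-convolution-flipping residual.

    x: (N, C, H, W) feature tensor; conv: any 2D conv module.
    Returns conv(x) minus its mirrored counterpart.
    """
    flip = lambda t: torch.flip(t, dims=[-1])   # horizontal flip
    return conv(x) - flip(conv(flip(x)))
```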

10.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 7853-7869, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36417746

ABSTRACT

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while matching the performance of previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, the first end-to-end video object detection system based on simple yet effective spatial-temporal Transformer architectures. The first goal of this paper is to streamline the pipeline of current VOD, effectively removing the need for many hand-crafted components for feature aggregation, e.g., optical flow models and relation networks. Besides, benefiting from the object query design in DETR, our method does not need post-processing such as Seq-NMS. In particular, we present a temporal Transformer to aggregate both the spatial object queries and the feature memories of each frame. Our temporal Transformer consists of two components: a Temporal Query Encoder (TQE) to fuse object queries, and a Temporal Deformable Transformer Decoder (TDTD) to obtain current-frame detection results. These designs boost the strong deformable DETR baseline by a significant margin (3%-4% mAP) on the ImageNet VID dataset, so TransVOD achieves competitive performance on this benchmark. We then present two improved versions of TransVOD, TransVOD++ and TransVOD Lite. The former fuses object-level information into the object queries via dynamic convolution, while the latter models entire video clips at once to speed up inference. We give a detailed analysis of all three models in the experiments section. In particular, our proposed TransVOD++ sets a new state-of-the-art record in accuracy on ImageNet VID with 90.0% mAP. Our proposed TransVOD Lite also achieves the best speed-accuracy trade-off, with 83.7% mAP while running at around 30 FPS on a single V100 GPU. Code and models are available at https://github.com/SJTU-LuHe/TransVOD.
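
A toy stand-in for fusing object queries across frames with attention, in the spirit of the Temporal Query Encoder (the dimensions and single-attention design are assumptions; the real TQE is more elaborate):

```python
import torch
import torch.nn as nn

class TemporalQueryFusion(nn.Module):
    """Let every frame's object queries attend to queries from all frames."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, queries):                    # (T, Q, dim) per-frame queries
        seq = queries.flatten(0, 1).unsqueeze(0)   # (1, T*Q, dim) joint sequence
        fused, _ = self.attn(seq, seq, seq)        # cross-frame self-attention
        return fused.view_as(queries)              # enhanced queries per frame
```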

11.
IET Image Process ; 2022 Sep 24.
Article in English | MEDLINE | ID: mdl-36246853

ABSTRACT

Coronavirus Disease 2019 (Covid-19) swept the world in early 2020, placing global health under threat. Automated lung infection detection from chest X-ray images has great potential for enhancing the traditional Covid-19 treatment strategy. However, detecting infected regions in chest X-ray images poses several challenges, including significant variance among infected features with similar spatial characteristics and multi-scale variations in the texture, shape, and size of infected regions. Moreover, the large parameter counts of transfer-learning models constrain the deployment of deep convolutional neural network (CNN) models in real-time environments. To address these challenges, a novel lightweight Covid-19 CNN (LW-CovidNet) is proposed to automatically detect infected regions in chest X-ray images. In our hybrid method, standard and depth-wise separable convolutions are integrated to aggregate high-level features and to compensate for information loss by enlarging the receptive field of the model. The detected boundaries of disease regions are then enhanced via an edge-attention method that applies heatmaps for accurate localization. Extensive experiments indicate that LW-CovidNet surpasses most cutting-edge detection methods and advances the state of the art. It is envisaged that, with reliable accuracy, this method could be introduced into clinical practice in the future.
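
A depth-wise separable convolution block of the kind such lightweight models pair with standard convolutions; this is the generic construction, not LW-CovidNet's exact block:

```python
import torch.nn as nn

def ds_conv_block(in_ch, out_ch, k=3):
    """Depth-wise separable convolution: per-channel spatial filtering
    followed by a 1x1 pointwise channel mix - far fewer parameters than
    one standard conv of the same receptive field."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),  # depth-wise
        nn.Conv2d(in_ch, out_ch, 1),                               # pointwise
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```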

12.
IEEE Trans Vis Comput Graph ; 28(4): 1835-1847, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33001803

ABSTRACT

We propose a robust normal estimation method for both point clouds and meshes using a low rank matrix approximation algorithm. First, we compute a local isotropic structure for each point and find its similar, non-local structures that we organize into a matrix. We then show that a low rank matrix approximation algorithm can robustly estimate normals for both point clouds and meshes. Furthermore, we provide a new filtering method for point cloud data to smooth the position data to fit the estimated normals. We show the applications of our method to point cloud filtering, point set upsampling, surface reconstruction, mesh denoising, and geometric texture removal. Our experiments show that our method generally achieves better results than existing methods.
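
A simplified sketch of the pipeline's core step: stack similar local structures into a matrix, denoise it with a truncated-SVD low-rank approximation, and read the normal off the least-variance direction of the denoised reference patch (the shapes and rank are assumptions):

```python
import numpy as np

def lowrank_normal(patches, rank=2):
    """Estimate a normal from a stack of similar local structures.

    patches: (m, k, 3) - m similar neighborhoods of k points each,
    with patches[0] the reference structure.
    """
    M = patches.reshape(len(patches), -1)          # one row per structure
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    M_lr = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # low-rank (denoised) matrix
    pts = M_lr[0].reshape(-1, 3)                   # denoised reference patch
    pts = pts - pts.mean(axis=0)
    # normal = direction of least variance of the denoised patch
    return np.linalg.svd(pts, full_matrices=False)[2][-1]
```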

13.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5780-5795, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33848241

ABSTRACT

Pixel-level 2D object semantic understanding is an important topic in computer vision and could help machines deeply understand objects (e.g., their functionality and affordance) in our daily life. However, most previous methods train directly on correspondences in 2D images, which is end-to-end but loses plenty of information in 3D space. In this paper, we propose a new method that predicts image-corresponding semantics in the 3D domain and then projects them back onto 2D images to achieve pixel-level understanding. In order to obtain reliable 3D semantic labels that are absent from current image datasets, we build a large-scale keypoint knowledge engine called KeypointNet, which contains 103,450 keypoints and 8,234 3D models from 16 object categories. Our method leverages the advantages of 3D vision and can explicitly reason about object self-occlusion and visibility. We show that our method gives comparable and even superior results on standard semantic benchmarks.

14.
Med Image Anal ; 75: 102279, 2022 01.
Article in English | MEDLINE | ID: mdl-34731776

ABSTRACT

Brain functional connectivity (FC) derived from resting-state functional magnetic resonance imaging (rs-fMRI) has been widely employed to study neuropsychiatric disorders such as autism spectrum disorder (ASD). Existing studies usually suffer from (1) significant data heterogeneity caused by different scanners or studied populations across multiple sites, (2) the curse of dimensionality caused by millions of voxels in each fMRI scan and a very limited number (tens or hundreds) of training samples, and (3) poor interpretability, which hinders the identification of reproducible disease biomarkers. To this end, we propose a Multi-site Clustering and Nested Feature Extraction (MC-NFE) method for fMRI-based ASD detection. Specifically, we first divide multi-site training data into ASD and healthy control (HC) groups. To model inter-site heterogeneity within each category, we use a similarity-driven multiview linear reconstruction model to learn latent representations and perform subject clustering within each group. We then design a nested singular value decomposition (SVD) method to mitigate inter-site heterogeneity and extract FC features, learning both local cluster-shared features across sites within each category and global category-shared features across the ASD and HC groups, followed by a linear support vector machine (SVM) for ASD detection. Experimental results on 609 subjects with rs-fMRI from the ABIDE database, covering 21 imaging sites, suggest that the proposed MC-NFE outperforms several state-of-the-art methods in ASD detection. The most discriminative FCs identified by MC-NFE are mainly located in the default mode network, salience network, and cerebellum, and could serve as potential biomarkers for fMRI-based ASD analysis.
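
One level of the SVD-based feature extraction can be sketched as a generic truncated-SVD projection (MC-NFE nests this within clusters and then across groups; the function and dimensions below are illustrative):

```python
import numpy as np

def svd_shared_features(fc_matrix, n_components=10):
    """Extract a shared low-dimensional FC basis and project subjects on it.

    fc_matrix: (n_subjects, n_connections) vectorized FC of one
    cluster or group.
    """
    centered = fc_matrix - fc_matrix.mean(axis=0)
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    basis = Vt[:n_components]          # shared FC components (right vectors)
    return centered @ basis.T          # per-subject features for the SVM
```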


Subjects
Autism Spectrum Disorder; Magnetic Resonance Imaging; Autism Spectrum Disorder/diagnostic imaging; Brain/diagnostic imaging; Brain Mapping; Cluster Analysis; Humans
15.
IEEE Trans Pattern Anal Mach Intell ; 44(12): 9489-9502, 2022 Dec.
Article in English | MEDLINE | ID: mdl-34822324

ABSTRACT

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a new point-set learning framework, the Point-wise Rotation Invariant Network (PRIN), focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density-Aware Adaptive Sampling to deal with distorted point distributions in spherical space. Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation-invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which directly operates on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. Results show that, on datasets with randomly rotated point clouds, SPRIN outperforms state-of-the-art methods without any data augmentation. We also provide thorough theoretical proof and analysis of the point-wise rotation invariance achieved by our methods. The code to reproduce our results will be made publicly available.

16.
IEEE Trans Image Process ; 30: 9085-9098, 2021.
Article in English | MEDLINE | ID: mdl-34705644

ABSTRACT

Although huge progress has been made on scene analysis in recent years, most existing works assume the input images to be day-time images with good lighting conditions. In this work, we aim to address the night-time scene parsing (NTSP) problem, which has two main challenges: 1) labeled night-time data are scarce, and 2) over- and under-exposures may co-occur in the input night-time images and are not explicitly modeled in existing pipelines. To tackle the scarcity of night-time data, we collect a novel labeled dataset, named NightCity, of 4,297 real night-time images with ground-truth pixel-level semantic annotations. To our knowledge, NightCity is the largest dataset for NTSP. In addition, we also propose an exposure-aware framework to address the NTSP problem by augmenting the segmentation process with explicitly learned exposure features. Extensive experiments show that training on NightCity can significantly improve NTSP performance and that our exposure-aware model outperforms state-of-the-art methods, yielding top performance on our dataset as well as existing ones.

17.
IEEE Trans Image Process ; 30: 4610-4621, 2021.
Article in English | MEDLINE | ID: mdl-33886470

ABSTRACT

Facial expression transfer between two unpaired images is a challenging problem, as fine-grained expression is typically tangled with other facial attributes. Most existing methods treat expression transfer as an application of expression manipulation and use predicted global expression, landmarks, or action units (AUs) as guidance. However, the prediction may be inaccurate, which limits the performance of transferring fine-grained expression. Instead of using an intermediate estimated guidance, we propose to explicitly transfer facial expression by directly mapping two unpaired input images to two synthesized images with swapped expressions. Specifically, since AUs semantically describe fine-grained expression details, we propose a novel multi-class adversarial training method to disentangle input images into two types of fine-grained representations: AU-related features and AU-free features. We can then synthesize new images with preserved identities and swapped expressions by combining AU-free features with swapped AU-related features. Moreover, to obtain reliable expression transfer results for unpaired input, we introduce a swap consistency loss to make the synthesized images and self-reconstructed images indistinguishable. Extensive experiments show that our approach outperforms state-of-the-art expression manipulation methods at transferring fine-grained expressions while preserving other attributes, including identity and pose.
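
The expression-swapping step can be sketched as follows, with hypothetical encoder/decoder interfaces for the AU-related and AU-free branches (a sketch of the mapping, not the paper's full adversarial training):

```python
import torch

def swap_expressions(enc_au, enc_free, dec, img_a, img_b):
    """Swap AU-related codes between two images and decode, yielding two
    faces with exchanged expressions but preserved identities."""
    au_a, au_b = enc_au(img_a), enc_au(img_b)       # expression (AU) codes
    id_a, id_b = enc_free(img_a), enc_free(img_b)   # AU-free (identity) codes
    out_a = dec(torch.cat([id_a, au_b], dim=1))     # a's identity, b's expression
    out_b = dec(torch.cat([id_b, au_a], dim=1))     # b's identity, a's expression
    return out_a, out_b
```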

18.
IEEE Trans Neural Netw Learn Syst ; 32(2): 868-881, 2021 Feb.
Article in English | MEDLINE | ID: mdl-32287010

ABSTRACT

In this article, we propose a multiview self-representation model for nonlinear subspace clustering. By assuming that the heterogeneous features lie within the union of multiple linear subspaces, recent multiview subspace learning methods aim to capture the complementary and consensus information from multiple views to boost performance. However, in real-world applications, data features usually reside in multiple nonlinear subspaces, leading to undesirable results. To this end, we propose a kernelized version of tensor-based multiview subspace clustering, referred to as Kt-SVD-MSC, to jointly learn self-representation coefficients in mapped high-dimensional spaces and the correlations among multiple views in a unified tensor space. In the view-specific feature spaces, a kernel-induced mapping is introduced for each view to ensure the separability of the self-representation coefficients. In the unified tensor space, a new kind of tensor low-rank regularizer is imposed on the rotated self-representation coefficient tensor to preserve global consistency across different views. We also derive an algorithm to efficiently solve the optimization problem, with all subproblems having closed-form solutions. Furthermore, by incorporating nonnegativity and sparsity constraints, the proposed method can easily be extended to several useful variants. The proposed method is evaluated on eight challenging datasets, where it achieves a significant advance over state-of-the-art multiview clustering methods.

19.
Clin Genet ; 96(4): 290-299, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31219622

ABSTRACT

Noonan syndrome (NS) is a common autosomal dominant/recessive disorder. No large-scale study has been conducted on NS in China, the most populous country in the world. Next-generation sequencing (NGS) was used to identify pathogenic variants in patients who exhibited NS-related phenotypes. We assessed the facial features and clinical manifestations of patients with pathogenic or likely pathogenic variants in the RAS-MAPK signaling pathway. Gene-related Chinese NS facial features were described using artificial intelligence (AI). NGS identified pathogenic variants in 103 Chinese patients across nine NS-related genes: PTPN11 (48.5%), SOS1 (12.6%), SHOC2 (11.7%), KRAS (9.71%), RAF1 (7.77%), RIT1 (6.8%), CBL (0.97%), NRAS (0.97%), and LZTR1 (0.97%). Gene-related facial representations showed that each gene was associated with different facial details. Eight novel pathogenic variants were detected, and clinical features attributable to specific genetic variants were reported, including hearing loss and cancer risk due to a PTPN11 pathogenic variant, and ubiquitous abnormal intracranial structure due to SHOC2 pathogenic variants. NGS facilitates the diagnosis of NS, especially for patients with mild/moderate and atypical symptoms. Our study describes the genotypic and phenotypic spectra of NS in China, providing new insights into distinctive clinical features due to specific pathogenic variants.


Subjects
Genetic Association Studies; Genetic Predisposition to Disease; Noonan Syndrome/diagnosis; Noonan Syndrome/genetics; Adolescent; Alleles; Child; Child, Preschool; China; Facies; Female; Genetic Association Studies/methods; Genotype; High-Throughput Nucleotide Sequencing; Humans; Infant; Infant, Newborn; Male; Phenotype; Ultrasonography
20.
Int J Med Robot ; 14(5): e1931, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29956447

ABSTRACT

BACKGROUND: Human-computer interaction (HCI) is an important feature of augmented reality (AR) technology, and naturalness is the clear trend in HCI. Apart from language, gesture is the most natural and frequently used auxiliary interaction mode in daily life. However, meaningless, subconscious gestures often occur in the intervals between two adjacent dynamic gestures. Spotting continuous dynamic gestures is therefore the premise and basis of dynamic gesture recognition, yet no mature, unified algorithm solves this problem. AIMS: To fully realize natural HCI based on gesture recognition, a general AR application development platform is presented in this paper. METHODS: Based on the position and pose tracking data of the user's hand, a dynamic gesture spotting algorithm based on evidence theory is proposed. First, by analyzing how hand motion speed changes during dynamic gestures, three knowledge rules are summarized. Then, accurate dynamic gesture spotting is realized by applying evidential reasoning. Moreover, the algorithm detects the starting point of a gesture during the rising phase of hand motion speed, eliminating the delay between spotting and recognition and thus ensuring real-time performance. Finally, the algorithm is verified in several AR applications developed on the platform. RESULTS: There are two main experimental results. First, with six users participating in the dynamic gesture spotting experiment, the spotting accuracy met the demand. Second, the accuracy of recognition after spotting was higher than that of simultaneous recognition and spotting. CONCLUSION: The proposed continuous dynamic gesture spotting algorithm based on Dempster-Shafer theory can extract almost all effective dynamic gestures in the HCI of our AR platform and, on this basis, effectively improves the accuracy of subsequent dynamic gesture recognition.
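
The evidential-reasoning core rests on Dempster's rule of combination; a minimal sketch for two mass functions follows (the three knowledge rules that would produce these masses are not specified in the abstract):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets over the same frame of discernment."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb              # mass assigned to contradiction
    k = 1.0 - conflict                       # normalization constant
    return {s: w / k for s, w in combined.items()}

# Usage with hypothetical masses over a {"gesture", "pause"} frame:
# m = dempster_combine(
#     {frozenset({"gesture"}): 0.7, frozenset({"gesture", "pause"}): 0.3},
#     {frozenset({"gesture"}): 0.6, frozenset({"pause"}): 0.4},
# )
```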


Subjects
Algorithms; Gestures; Models, Theoretical; User-Computer Interface; Electronic Data Processing; Humans