Results 1 - 20 of 45
1.
Brief Bioinform; 24(6), 2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37974507

ABSTRACT

In recent years, research on applying deep learning to the prediction of various peptide properties has exploded, driven by the rapid development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of a systematic analysis of peptide encoding, which is essential for artificial-intelligence-assisted peptide-related tasks, remains an urgent obstacle to improving prediction accuracy. To address this issue, we first collect a high-quality, colossal simulation dataset of peptide self-assembly containing over 62,000 samples generated by coarse-grained molecular dynamics. We then systematically investigate how encoding the amino acids of a peptide as sequences or molecular graphs, using state-of-the-art sequential deep learning models (recurrent neural network, long short-term memory and Transformer) and structural deep learning models (graph convolutional network, graph attention network and GraphSAGE), affects the accuracy of predicting peptide self-assembly, an essential physicochemical process that precedes any peptide-related application. Extensive benchmarking shows the Transformer to be the most powerful sequence-encoding-based deep learning model, pushing the limit of peptide self-assembly prediction to decapeptides. In summary, this work provides a comprehensive benchmark analysis of peptide encoding with advanced deep learning models, serving as a guide for a wide range of peptide-related predictions such as isoelectric point and hydration free energy.
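As a rough illustration of the sequence-encoding route described above, the sketch below tokenizes a peptide by amino acid and scores it with a small Transformer encoder; the vocabulary handling, model size, and binary self-assembly head are illustrative assumptions, not the benchmarked models themselves.

```python
# Minimal sketch: encode a peptide as amino-acid tokens and classify
# self-assembly propensity with a small Transformer encoder (PyTorch).
# Hyperparameters and the binary label are illustrative assumptions.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_IDX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # 0 = padding

def encode(seq: str, max_len: int = 10) -> torch.Tensor:
    """Map a peptide (up to decapeptide length) to a fixed-length index tensor."""
    idx = [AA_TO_IDX[aa] for aa in seq.upper()]
    idx += [0] * (max_len - len(idx))
    return torch.tensor(idx)

class PeptideTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, max_len=10):
        super().__init__()
        self.embed = nn.Embedding(21, d_model, padding_idx=0)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)            # self-assembly logit

    def forward(self, tokens):                        # tokens: (B, L)
        h = self.embed(tokens) + self.pos             # (B, L, d_model)
        h = self.encoder(h, src_key_padding_mask=tokens.eq(0))
        return self.head(h.mean(dim=1)).squeeze(-1)   # one logit per peptide

model = PeptideTransformer()
logits = model(torch.stack([encode("FFKLVFF"), encode("GAVLIM")]))
```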


Subjects
Artificial Intelligence, Neural Networks (Computer), Peptides/metabolism, Amino Acids, Computer Simulation
2.
Article in English | MEDLINE | ID: mdl-38315588

ABSTRACT

Unsupervised representation learning (URL), which learns compact embeddings of high-dimensional data without supervision, has achieved remarkable progress recently. However, URL methods for different requirements have been developed independently, which limits the generalization of the algorithms and becomes especially prohibitive as the number of tasks grows. For example, dimension reduction (DR) methods such as t-SNE and UMAP optimize pairwise data relationships by preserving the global geometric structure, while self-supervised learning methods such as SimCLR and BYOL focus on mining the local statistics of instances under specific augmentations. To address this dilemma, we summarize and propose a unified similarity-based URL framework, GenURL, which can adapt smoothly to various URL tasks. In this article, we regard URL tasks as different implicit constraints on the geometric structure of the data that help to seek optimal low-dimensional representations; the tasks then boil down to data structural modeling (DSM) and low-dimensional transformation (LDT). Specifically, DSM provides a structure-based submodule to describe the global structures, and LDT learns compact low-dimensional embeddings with given pretext tasks. Moreover, an objective function, the general Kullback-Leibler (GKL) divergence, is proposed to connect DSM and LDT naturally. Comprehensive experiments demonstrate that GenURL achieves consistent state-of-the-art performance in self-supervised visual learning, unsupervised knowledge distillation (KD), graph embeddings (GEs), and DR.
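A minimal sketch of the similarity-matching idea behind such a framework is shown below: row-normalized pairwise similarities of the input data (the DSM side) are matched against those of a learnable low-dimensional embedding (the LDT side) with a KL-style divergence. The Gaussian kernel and this exact divergence form are assumptions, not GenURL's published definition.

```python
# Hedged sketch of a similarity-based URL objective: match pairwise
# similarities of the input space and the embedding space with a KL-style
# divergence. Kernel and divergence form are illustrative assumptions.
import torch

def pairwise_similarity(x: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    d2 = torch.cdist(x, x).pow(2)                  # squared Euclidean distances
    s = torch.exp(-d2 / (2 * sigma ** 2))
    return s / s.sum(dim=1, keepdim=True)          # row-normalise to probabilities

def gkl_loss(p: torch.Tensor, q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """KL(P || Q) summed over rows of the two similarity matrices."""
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum()

x = torch.randn(128, 50)                           # high-dimensional data
z = torch.randn(128, 2, requires_grad=True)        # learnable 2-D embedding
loss = gkl_loss(pairwise_similarity(x), pairwise_similarity(z))
loss.backward()                                    # optimise z (or an encoder)
```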

3.
Patterns (N Y); 4(4): 100714, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37123438

ABSTRACT

Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond their training domains. In this article, we develop a novel optimal-transport-based algorithm, termed MROT, to enhance their generalization capability for molecular regression problems. MROT learns a continuous labeling of the data by measuring a new metric of domain distance and applying a posterior variance regularization over the transport plan to bridge the chemical domain gap. For downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties.
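For readers unfamiliar with the transport step such methods build on, the sketch below runs a plain entropic-regularized (Sinkhorn) optimal transport between source- and target-domain feature sets; uniform marginals, the squared-Euclidean cost, and the hyperparameters are assumptions, and MROT's domain metric and posterior variance regularization are not reproduced here.

```python
# Minimal entropic optimal-transport (Sinkhorn) sketch between source and
# target feature sets, as a stand-in for the transport step such methods use.
import torch

def sinkhorn(xs, xt, reg=0.1, n_iter=200):
    cost = torch.cdist(xs, xt).pow(2)              # (ns, nt) squared-distance cost
    a = torch.full((xs.size(0),), 1.0 / xs.size(0))  # uniform source marginal
    b = torch.full((xt.size(0),), 1.0 / xt.size(0))  # uniform target marginal
    K = torch.exp(-cost / reg)
    u = torch.ones_like(a)
    for _ in range(n_iter):                        # alternating scaling updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return torch.diag(u) @ K @ torch.diag(v)       # transport plan (ns, nt)

plan = sinkhorn(torch.randn(64, 32), torch.randn(80, 32))
print(plan.sum())                                  # ~1, total mass is conserved
```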

4.
IEEE Trans Neural Netw Learn Syst; 34(11): 8543-8554, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35263258

ABSTRACT

High-dimensional data analysis for exploration and discovery includes two fundamental tasks: deep clustering and data visualization. When these two associated tasks are done separately, as has often been the case thus far, disagreements can arise between them in terms of geometry preservation: the clustering process is often accompanied by corruption of the geometric structure, whereas visualization aims to preserve the data geometry for better interpretation. Therefore, how to achieve deep clustering and data visualization in an end-to-end unified framework is an important but challenging problem. In this article, we propose a novel neural-network-based method, called deep clustering and visualization (DCV), that accomplishes the two associated tasks end-to-end to resolve their disagreements. The DCV framework consists of two nonlinear dimensionality reduction (NLDR) transformations: 1) one from the input data space to a latent feature space for clustering, and 2) the other from the latent feature space to the final 2-D space for visualization. Importantly, the first NLDR transformation is mainly optimized by a Clustering Loss, allowing arbitrary corruption of the geometric structure for better clustering, while the second NLDR transformation is optimized by a Geometry-Preserving Loss to recover the corrupted geometry for better visualization. Extensive comparative results show that the DCV framework outperforms other leading clustering-visualization algorithms in terms of both quantitative evaluation metrics and qualitative visualization.
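The two-transformation layout can be sketched as below: one encoder maps the data to a latent clustering space, a second maps that latent space to 2-D, and the two are trained with a clustering loss and a geometry-preserving loss respectively. Both losses here (a soft k-means proxy and pairwise-distance matching) are simplified stand-ins for DCV's actual objectives.

```python
# Hedged sketch of a DCV-style layout: encoder 1 -> latent clustering space,
# encoder 2 -> 2-D visualisation space, each with its own loss.
import torch
import torch.nn as nn

class DCVSketch(nn.Module):
    def __init__(self, d_in, d_latent=10, n_clusters=5):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                  nn.Linear(64, d_latent))        # clustering space
        self.enc2 = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(),
                                  nn.Linear(64, 2))               # 2-D visualisation
        self.centroids = nn.Parameter(torch.randn(n_clusters, d_latent))

    def forward(self, x):
        z = self.enc1(x)
        v = self.enc2(z)
        return z, v

def clustering_loss(z, centroids):
    return torch.cdist(z, centroids).min(dim=1).values.mean()     # soft k-means proxy

def geometry_loss(z, v):
    return (torch.cdist(z, z) - torch.cdist(v, v)).pow(2).mean()  # preserve distances

model = DCVSketch(d_in=50)
z, v = model(torch.randn(256, 50))
loss = clustering_loss(z, model.centroids) + geometry_loss(z.detach(), v)
```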

5.
Commun Biol; 6(1): 876, 2023 Aug 25.
Article in English | MEDLINE | ID: mdl-37626165

ABSTRACT

Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained by the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale across a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate them on a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that incorporating protein language models' knowledge enhances geometric networks' capacity by a significant margin and generalizes to complex tasks.
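The integration pattern amounts to enriching the geometric network's node features with per-residue language-model embeddings, roughly as sketched below; the message-passing layer and the placeholder embedding module are illustrative, and in practice a frozen pretrained model such as ESM would supply the per-residue features.

```python
# Hedged sketch: concatenate per-residue protein-language-model embeddings
# onto geometric node features before message passing. The PLM output here
# is a random stand-in for a frozen pretrained model.
import torch
import torch.nn as nn

class GeometricLayer(nn.Module):
    """Very small distance-weighted message-passing layer (illustrative)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, h, coords):
        w = torch.exp(-torch.cdist(coords, coords))     # (N, N) distance weights
        w = w / w.sum(dim=1, keepdim=True)
        return torch.relu(self.lin(w @ h))              # aggregate neighbours

n_residues, d_geo, d_plm = 120, 16, 320
geo_feats = torch.randn(n_residues, d_geo)              # e.g. angles, surface features
coords = torch.randn(n_residues, 3)                     # C-alpha coordinates
with torch.no_grad():
    plm_feats = torch.randn(n_residues, d_plm)          # stand-in for frozen PLM output

h = torch.cat([geo_feats, plm_feats], dim=-1)           # fused node features
layer = GeometricLayer(d_geo + d_plm, 64)
out = layer(h, coords)                                  # (N, 64) residue embeddings
```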


Subjects
Deep Learning, Benchmarking, Language, Neural Networks (Computer)
6.
IEEE Trans Pattern Anal Mach Intell; 45(2): 1442-1457, 2023 Feb.
Article in English | MEDLINE | ID: mdl-35363609

ABSTRACT

3D Morphable Model (3DMM) fitting has widely benefited face analysis due to its strong 3D prior. However, previously reconstructed 3D faces suffer from degraded visual verisimilitude due to the loss of fine-grained geometry, which is attributed to insufficient ground-truth 3D shapes, unreliable training strategies, and the limited representation power of 3DMM. To alleviate this issue, this paper proposes a complete solution to capture the personalized shape so that the reconstructed shape looks identical to the corresponding person. Specifically, given a 2D image as input, we virtually render the image in several calibrated views to normalize pose variations while preserving the original image geometry. A many-to-one hourglass network serves as the encoder-decoder to fuse multiview features and generate vertex displacements as the fine-grained geometry. Besides, the neural network is trained by directly optimizing the visual effect, where two 3D shapes are compared by measuring the similarity between the multiview images rendered from them. Finally, we propose to generate the ground-truth 3D shapes by registering RGB-D images followed by pose and shape augmentation, providing sufficient data for network training. Experiments on several challenging protocols demonstrate the superior accuracy of our proposal in reconstructing face shape.

7.
Neural Netw; 161: 626-637, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36827960

ABSTRACT

Dimension reduction (DR) maps high-dimensional data into a lower-dimensional latent space by minimizing defined optimization objectives. The two independent branches of DR are feature selection (FS) and feature projection (FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). FP, on the other hand, combines all the input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity. Moreover, FS and FP are traditionally incompatible categories and have not been unified into a common framework. We therefore consider that the ideal DR approach combines both FS and FP into a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. This paper proposes such a unified framework, named Unified Dimensional Reduction Network (UDRN), which integrates FS and FP in an end-to-end way. Furthermore, a novel network architecture is designed to implement the FS and FP tasks separately, using a stacked feature selection network and a feature projection network. In addition, a stronger manifold assumption and a novel loss function are proposed, and the loss function can leverage the priors of data augmentation to enhance the generalization ability of UDRN. Finally, comprehensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods (FS, FP, and FS&FP pipelines), especially in downstream tasks such as classification and visualization.
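A compact sketch of the stacked select-then-project idea follows: a learnable per-feature gate (kept sparse with an L1 penalty) performs feature selection, and a projection network maps the gated features to the latent space under a distance-preserving term. The gate, penalty, and losses are simplified assumptions rather than UDRN's published components.

```python
# Hedged sketch of a select-then-project pipeline: a sparse feature gate
# feeds a projection network; both are trained end-to-end.
import torch
import torch.nn as nn

class SelectThenProject(nn.Module):
    def __init__(self, d_in, d_latent=2):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(d_in))           # per-feature importance
        self.project = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                     nn.Linear(64, d_latent))

    def forward(self, x):
        x_sel = x * torch.sigmoid(self.gate)                 # soft feature selection
        return self.project(x_sel)

model = SelectThenProject(d_in=1000)
x = torch.randn(128, 1000)
z = model(x)                                                 # (128, 2) embedding
sparsity = torch.sigmoid(model.gate).sum()                   # L1 keeps the gate sparse
geometry = (torch.cdist(x, x) - torch.cdist(z, z)).pow(2).mean()
loss = geometry + 1e-3 * sparsity
```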


Subjects
Learning, Neural Networks (Computer), Generalization (Psychology)
8.
Commun Biol; 6(1): 243, 2023 Mar 4.
Article in English | MEDLINE | ID: mdl-36871126

ABSTRACT

Recognition of remote homologous structures is a necessary module in AlphaFold2 and is also essential for the exploration of protein folding pathways. Here, we propose a method, PAthreader, to recognize remote templates and explore folding pathways. First, we design a three-track alignment between predicted distance profiles and structure profiles extracted from PDB and AlphaFold DB, to improve the recognition accuracy of remote templates. Second, we improve the performance of AlphaFold2 using the templates identified by PAthreader. Third, we explore protein folding pathways based on our conjecture that the dynamic folding information of a protein is implicitly contained in its remote homologs. The results show that the average accuracy of PAthreader templates is 11.6% higher than that of HHsearch. In terms of structure modelling, PAthreader outperforms AlphaFold2 and ranks first on the CAMEO blind test for the last three months. Furthermore, we predict protein folding pathways for 37 proteins; for 7 of these the results are largely consistent with biological experiments, while the other 30 human proteins have yet to be verified experimentally, revealing that folding information can be exploited from remote homologous structures.


Subjects
Protein Folding, Recognition (Psychology), Humans
9.
IEEE Trans Pattern Anal Mach Intell; 45(7): 8342-8357, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37018279

ABSTRACT

Detecting digital face manipulation has attracted extensive attention due to fake media's potential risks to the public. However, recent advances have been able to reduce the forgery signals to a low magnitude. Decomposition, which reversibly decomposes an image into several constituent elements, is a promising way to highlight hidden forgery details. In this paper, we investigate a novel 3D-decomposition-based method that considers a face image as the product of the interaction between 3D geometry and the lighting environment. Specifically, we disentangle a face image into four graphics components, including 3D shape, lighting, common texture, and identity texture, which are constrained by the 3D morphable model, harmonic reflectance illumination, and a PCA texture model, respectively. Meanwhile, we build a fine-grained morphing network to predict 3D shapes with pixel-level accuracy to reduce the noise in the decomposed elements. Moreover, we propose a composition search strategy that enables the automatic construction of an architecture to mine forgery clues from forgery-relevant components. Extensive experiments validate that the decomposed components highlight forgery artifacts and that the searched architecture extracts discriminative forgery features. Thus, our method achieves state-of-the-art performance.

10.
Article in English | MEDLINE | ID: mdl-37079406

ABSTRACT

Graph neural networks (GNNs) have recently achieved remarkable success on a variety of graph-related tasks, yet such success relies heavily on a given graph structure that may not always be available in real-world applications. To address this problem, graph structure learning (GSL) is emerging as a promising research topic where a task-specific graph structure and GNN parameters are jointly learned in an end-to-end unified framework. Despite their great progress, existing approaches mostly focus on the design of similarity metrics or graph construction, but directly default to adopting downstream objectives as supervision, which lacks deep insight into the power of supervision signals. More importantly, these approaches struggle to explain how GSL helps GNNs, and when and why this help fails. In this article, we conduct a systematic experimental evaluation revealing that GSL and GNNs share consistent optimization goals in terms of improving graph homophily. Furthermore, we demonstrate theoretically and experimentally that task-specific downstream supervision may be insufficient to support the learning of both the graph structure and GNN parameters, especially when the labeled data are extremely limited. Therefore, as a complement to downstream supervision, we propose homophily-enhanced self-supervision for GSL (HES-GSL), a method that provides more supervision for learning the underlying graph structure. A comprehensive experimental study demonstrates that HES-GSL scales well to various datasets and outperforms other leading methods. Our code will be available at https://github.com/LirongWu/Homophily-Enhanced-Self-supervision.
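The homophily notion at the center of this analysis is commonly measured as edge homophily, the fraction of edges whose endpoints share a class label; a short illustrative computation is given below.

```python
# Short sketch of the standard edge-homophily measure: the fraction of edges
# whose endpoints share a class label.
import torch

def edge_homophily(edge_index: torch.Tensor, labels: torch.Tensor) -> float:
    """edge_index: (2, E) tensor of source/target node indices."""
    src, dst = edge_index
    same = (labels[src] == labels[dst]).float()
    return same.mean().item()

edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])   # toy 4-node cycle
labels = torch.tensor([0, 0, 1, 1])
print(edge_homophily(edge_index, labels))                  # 0.5 for this toy graph
```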

11.
Adv Sci (Weinh); 10(31): e2301544, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37749875

ABSTRACT

Self-assembly of peptides is essential for a variety of biological and medical applications. However, it is challenging to investigate the self-assembling properties of peptides across the complete sequence space due to the enormous number of sequences. Here, it is demonstrated that a transformer-based deep learning model is effective in predicting the aggregation propensity (AP) of peptide systems, even for decapeptide and mixed-pentapeptide systems comprising over 10 trillion sequences. Based on the predicted AP values, not only are aggregation laws for designing self-assembling peptides derived, but the transferability relation among the APs of pentapeptides, decapeptides, and mixed pentapeptides is also revealed, leading to the discovery of self-assembling peptides by concatenation or mixing, as confirmed by experiments. This deep learning approach enables the speedy, accurate, and thorough search and design of self-assembling peptides within the complete sequence space of oligopeptides, advancing peptide science by inspiring new biological and medical applications.


Subjects
Deep Learning, Peptides/chemistry, Oligopeptides
12.
Commun Biol; 6(1): 369, 2023 Apr 4.
Article in English | MEDLINE | ID: mdl-37016133

ABSTRACT

Dimensionality reduction and visualization play an important role in biological data analysis, such as the interpretation of single-cell RNA sequencing (scRNA-seq) data. A desirable visualization method should not only be applicable to various application scenarios, including cell clustering and trajectory inference, but also satisfy a variety of technical requirements, especially the ability to preserve the inherent structure of the data and to handle batch effects. However, no existing method accommodates these requirements in a unified framework. In this paper, we propose a general visualization method, deep visualization (DV), that preserves the inherent structure of the data, handles batch effects, and is applicable to a variety of datasets from different application domains and dataset scales. The method embeds a given dataset into a 2- or 3-dimensional visualization space, with either a Euclidean or a hyperbolic metric, depending on whether the specified task involves static (single time point) or dynamic (sequence of time points) scRNA-seq data, respectively. Specifically, DV learns a structure graph to describe the relationships between data samples and transforms the data into the visualization space while preserving the geometric structure of the data and correcting batch effects in an end-to-end manner. Experimental results on nine datasets of complex tissues from human patients or animal development demonstrate the competitiveness of DV in discovering complex cellular relations, uncovering temporal trajectories, and addressing complex batch factors. We also provide a preliminary attempt to pre-train a DV model for the visualization of new incoming data.
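For the hyperbolic case mentioned above, a common choice is the Poincaré-ball model; the sketch below computes its geodesic distance. Whether DV uses this exact hyperbolic model is an assumption here.

```python
# Hedged sketch of the Poincaré-ball (hyperbolic) distance commonly used for
# hierarchy- and trajectory-aware embeddings.
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Geodesic distance between points inside the unit Poincaré ball."""
    sq = torch.sum((u - v) ** 2, dim=-1)
    nu = torch.clamp(torch.sum(u ** 2, dim=-1), max=1 - eps)
    nv = torch.clamp(torch.sum(v ** 2, dim=-1), max=1 - eps)
    x = 1 + 2 * sq / ((1 - nu) * (1 - nv))
    return torch.acosh(x)

u = torch.tensor([[0.1, 0.2], [0.0, 0.0]])
v = torch.tensor([[0.5, -0.3], [0.9, 0.0]])
print(poincare_distance(u, v))      # distances grow quickly near the ball boundary
```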


Subjects
Single-Cell Analysis, Single-Cell Gene Expression Analysis, Animals, Humans, Sequence Analysis (RNA)/methods, Single-Cell Analysis/methods, Cluster Analysis
13.
Article in English | MEDLINE | ID: mdl-37018566

ABSTRACT

Graph neural networks (GNNs) have been playing important roles in various graph-related tasks. However, most existing GNNs are based on the assumption of homophily, so they cannot be directly generalized to heterophily settings where connected nodes may have different features and class labels. Moreover, real-world graphs often arise from highly entangled latent factors, but the existing GNNs tend to ignore this and simply denote the heterogeneous relations between nodes as binary-valued homogeneous edges. In this article, we propose a novel relation-based frequency adaptive GNN (RFA-GNN) to handle both heterophily and heterogeneity in a unified framework. RFA-GNN first decomposes an input graph into multiple relation graphs, each representing a latent relation. More importantly, we provide detailed theoretical analysis from the perspective of spectral signal processing. Based on this, we propose a relation-based frequency adaptive mechanism that adaptively picks up signals of different frequencies in each corresponding relation space in the message-passing process. Extensive experiments on synthetic and real-world datasets show qualitatively and quantitatively that RFA-GNN yields truly encouraging results for both the heterophily and heterogeneity settings. Codes are publicly available at: https://github.com/LirongWu/RFA-GNN.

14.
IEEE Trans Image Process; 31: 3224-3235, 2022.
Article in English | MEDLINE | ID: mdl-35412980

ABSTRACT

In our daily life, a large number of activities require identity verification, e.g., at ePassport gates. Most of these verification systems recognize who you are by matching your ID document photo (ID face) to your live face image (spot face). ID vs. Spot (IvS) face recognition differs from general face recognition, where each dataset usually contains a small number of subjects and sufficient images per subject. In IvS face recognition, the datasets usually contain a massive number of classes (a million or more) while each class has only two image samples (one ID face and one spot face), which makes it very challenging to train an effective model (e.g., excessive GPU-memory demands when classifying over so many classes, and difficulty capturing effective features from the bi-sample data of each identity). To avoid the excessive demand on GPU memory, a two-stage training method is developed, where we first train the model on a general face recognition dataset (e.g., MS-Celeb-1M) and then employ metric learning losses (e.g., triplet and quadruplet losses) to learn features on IvS data with millions of classes. To extract more effective features for IvS face recognition, we propose two novel algorithms that enhance the network by selecting harder samples for training. First, Cross-Batch Hard Example Mining (CB-HEM) is proposed to select hard triplets not only from the current mini-batch but also from dozens of past mini-batches (for convenience, we use batch to denote a mini-batch in the following), which significantly expands the space of sample selection. Second, a Pseudo Large Batch (PLB) is proposed to virtually increase the batch size with a fixed GPU memory budget. The proposed PLB and CB-HEM can be employed simultaneously to train the network, which expands the selection space by hundreds of times, so that very hard sample pairs, especially hard negative pairs, can be selected for training to enhance the discriminative capability. Extensive comparative evaluations conducted on multiple IvS benchmarks demonstrate the effectiveness of the proposed method.
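The cross-batch mining idea can be sketched as below: features from past batches are kept in a queue, and each anchor picks its hardest negative from the current batch plus the queue before a standard triplet loss is applied. The queue size, selection rule, and loss are illustrative assumptions, not the paper's exact CB-HEM or PLB procedures.

```python
# Hedged sketch of cross-batch hard-negative mining with a feature queue
# and a hardest-negative triplet loss.
import torch
import torch.nn.functional as F

class FeatureQueue:
    def __init__(self, dim, size=4096):
        self.feats = torch.empty(0, dim)
        self.labels = torch.empty(0, dtype=torch.long)
        self.size = size

    def push(self, feats, labels):
        self.feats = torch.cat([self.feats, feats.detach()])[-self.size:]
        self.labels = torch.cat([self.labels, labels])[-self.size:]

def hard_triplet_loss(anchor, positive, labels, queue, margin=0.3):
    cand_feats = torch.cat([positive, queue.feats])          # negatives pool
    cand_labels = torch.cat([labels, queue.labels])
    d = torch.cdist(anchor, cand_feats)                      # (B, B + Q)
    d_pos = F.pairwise_distance(anchor, positive)            # ID-vs-spot pairs
    d_neg = d.masked_fill(labels.unsqueeze(1) == cand_labels.unsqueeze(0), float("inf"))
    hardest_neg = d_neg.min(dim=1).values                    # closest wrong identity
    return F.relu(d_pos - hardest_neg + margin).mean()

queue = FeatureQueue(dim=128)
id_feats, spot_feats = torch.randn(32, 128), torch.randn(32, 128)
labels = torch.arange(32)
loss = hard_triplet_loss(spot_feats, id_feats, labels, queue)
queue.push(id_feats, labels)                                 # enlarge the mining pool
```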


Subjects
Biometric Identification, Facial Recognition, Algorithms, Benchmarking, Biometric Identification/methods, Face/anatomy & histology, Face/diagnostic imaging, Humans
15.
Adv Sci (Weinh); 9(33): e2203796, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36202759

ABSTRACT

The latest biological findings observe that the motionless "lock-and-key" theory is not generally applicable and that changes in atomic sites and binding pose can provide important information for understanding drug binding. However, the computational expense limits the growth of protein trajectory-related studies, thus hindering the possibility of supervised learning. A spatial-temporal pre-training method based on modified equivariant graph matching networks, dubbed ProtMD, is presented. It has two specially designed self-supervised learning tasks, an atom-level prompt-based denoising generative task and a conformation-level snapshot ordering task, to capture the flexibility information inside molecular dynamics (MD) trajectories at very fine temporal resolution. ProtMD grants the encoder network the capacity to capture the time-dependent geometric mobility of conformations along MD trajectories. Two downstream tasks are chosen to verify the effectiveness of ProtMD through linear detection and task-specific fine-tuning. A large improvement over current state-of-the-art methods is observed, with a 4.3% decrease in root mean square error for the binding affinity problem and an average 13.8% increase in the area under the receiver operating characteristic curve and the area under the precision-recall curve for the ligand efficacy problem. The results demonstrate a strong correlation between the magnitude of a conformation's motion in 3D space and the strength with which the ligand binds to its receptor.
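The conformation-level snapshot ordering task can be pictured roughly as below: two MD snapshots of the same complex are encoded and a small head predicts which comes earlier in the trajectory. The encoder, feature dimensionality, and binary formulation are placeholders, not ProtMD's architecture.

```python
# Hedged sketch of a snapshot-ordering self-supervised objective: predict
# whether snapshot A precedes snapshot B in the MD trajectory.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU(), nn.Linear(128, 64))
order_head = nn.Linear(2 * 64, 1)                      # scores "A precedes B"

def ordering_loss(snap_a, snap_b, a_before_b):
    """snap_*: (B, 300) flattened conformation features; a_before_b: (B,) in {0,1}."""
    za, zb = encoder(snap_a), encoder(snap_b)
    logit = order_head(torch.cat([za, zb], dim=-1)).squeeze(-1)
    return nn.functional.binary_cross_entropy_with_logits(logit, a_before_b.float())

snap_t, snap_t_plus = torch.randn(16, 300), torch.randn(16, 300)
# In practice the pair order is shuffled so the model cannot exploit argument position.
loss = ordering_loss(snap_t, snap_t_plus, torch.ones(16))
```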


Subjects
Molecular Dynamics Simulation, Proteins, Ligands, Protein Conformation
16.
Article in English | MEDLINE | ID: mdl-36409811

ABSTRACT

Dimension reduction (DR) is commonly utilized to capture the intrinsic structure of high-dimensional data and transform it into a low-dimensional space while retaining meaningful properties of the original data. It is used in various applications, such as image recognition, single-cell sequencing analysis, and biomarker discovery. However, contemporary parameter-free and parametric DR techniques suffer from several significant shortcomings, such as the inability to preserve global and local features and poor generalization performance. Regarding explainability, it is crucial to comprehend the embedding process, especially the contribution of each part to that process; understanding how each feature affects the embedding results identifies critical components and helps diagnose the embedding process. To address these problems, we have developed a deep neural network method called EVNet, which provides not only excellent structure preservation but also explainability of the resulting DR. EVNet starts with data augmentation and a manifold-based loss function to improve embedding performance. The explanation is based on saliency maps and aims to examine the trained EVNet parameters and the contributions of components during the embedding process. The proposed techniques are integrated with a visual interface that helps users adjust EVNet to achieve better DR performance and explainability. The interactive visual interface makes it easier to illustrate the data features, compare different DR techniques, and investigate DR. An in-depth experimental comparison shows that EVNet consistently outperforms state-of-the-art methods in both performance measures and explainability.

17.
IEEE Trans Cybern; 52(5): 3422-3433, 2022 May.
Article in English | MEDLINE | ID: mdl-32816685

ABSTRACT

The ChaLearn large-scale gesture recognition challenge has run twice, in two workshops held in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and the International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world. The challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This article describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations for gesture recognition and provide a detailed analysis of the current methods for large-scale isolated and continuous gesture recognition. In addition to the recognition rate and mean Jaccard index (MJI) used as evaluation metrics in previous challenges, we introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) method that determines video division points based on skeleton points. Experiments show that the proposed Bi-LSTM outperforms state-of-the-art methods with an absolute CSR improvement of 8.1% (from 0.8917 to 0.9639).
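The boundary-detection idea behind the proposed method can be sketched as a bidirectional LSTM over per-frame skeleton coordinates that emits a per-frame logit for a division point; the number of joints, hidden size, and threshold below are assumptions.

```python
# Hedged sketch: a Bi-LSTM over per-frame skeleton coordinates predicts
# per-frame "gesture boundary" logits for temporal segmentation.
import torch
import torch.nn as nn

class BoundaryBiLSTM(nn.Module):
    def __init__(self, n_joints=25, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_joints * 3, hidden_size=hidden,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, skeletons):                  # (B, T, n_joints * 3)
        h, _ = self.lstm(skeletons)
        return self.head(h).squeeze(-1)            # (B, T) boundary logits

model = BoundaryBiLSTM()
logits = model(torch.randn(2, 200, 75))            # two 200-frame clips
division_points = (torch.sigmoid(logits) > 0.5).nonzero()
```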


Subjects
Gestures, Pattern Recognition (Automated), Algorithms, Humans, Pattern Recognition (Automated)/methods
18.
IEEE Trans Pattern Anal Mach Intell; 43(11): 4008-4020, 2021 Nov.
Article in English | MEDLINE | ID: mdl-32750774

ABSTRACT

Face detection has achieved significant progress in recent years. However, high-performance face detection remains a very challenging problem, especially when many tiny faces are present. In this paper, we present a single-shot refinement face detector, named RefineFace, to achieve high performance. Specifically, it consists of five modules: selective two-step regression (STR), selective two-step classification (STC), scale-aware margin loss (SML), a feature supervision module (FSM) and receptive field enhancement (RFE). To enhance the regression ability for high location accuracy, STR coarsely adjusts the locations and sizes of anchors from high-level detection layers to provide better initialization for the subsequent regressor. To improve the classification ability for high recall efficiency, STC first filters out most simple negatives from low-level detection layers to reduce the search space for the subsequent classifier, then SML is applied to better distinguish faces from background at various scales and FSM is introduced to let the backbone learn more discriminative features for classification. Besides, RFE is presented to provide more diverse receptive fields to better capture faces in some extreme poses. Extensive experiments conducted on WIDER FACE, AFW, PASCAL Face, FDDB and MAFA demonstrate that our method achieves state-of-the-art results and runs at 37.3 FPS with ResNet-18 for VGA-resolution images.

19.
IEEE Trans Pattern Anal Mach Intell; 43(9): 3005-3023, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33166249

ABSTRACT

Face anti-spoofing (FAS) plays a vital role in securing face recognition systems. Existing methods rely heavily on expert-designed networks, which may lead to sub-optimal solutions for the FAS task. Here we propose the first FAS method based on neural architecture search (NAS), called NAS-FAS, to discover well-suited task-aware networks. Unlike previous NAS works, which mainly focus on developing efficient search strategies for generic object classification, we pay more attention to the search spaces for the FAS task. The challenges of utilizing NAS for FAS are twofold: networks searched on 1) a specific acquisition condition might perform poorly under unseen conditions, and 2) particular spoofing attacks might generalize badly to unseen attacks. To overcome these two issues, we develop a novel search space consisting of central difference convolution and pooling operators. Moreover, an efficient static-dynamic representation is exploited to fully mine the FAS-aware spatio-temporal discrepancy. Besides, we propose Domain/Type-aware Meta-NAS, which leverages cross-domain/type knowledge for robust searching. Finally, in order to evaluate NAS transferability across datasets and unknown attack types, we release a large-scale 3D mask dataset, namely CASIA-SURF 3DMask, to support the new 'cross-dataset cross-type' testing protocol. Experiments demonstrate that the proposed NAS-FAS achieves state-of-the-art performance on nine FAS benchmark datasets with four testing protocols.
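The central difference convolution operator that this search space is built from is commonly implemented as a vanilla convolution blended with a central-difference term weighted by theta; the compact 2-D sketch below follows that general published formulation rather than this paper's code.

```python
# Hedged sketch of a 2-D central difference convolution (CDC): vanilla
# convolution minus theta times a central-difference term, the latter
# computed as a 1x1 convolution with the spatially-summed kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)                                   # vanilla convolution
        if self.theta == 0:
            return out
        # Central-difference term: equivalent to convolving (x(p_n) - x(p_0)).
        kernel_diff = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_diff = F.conv2d(x, kernel_diff)
        return out - self.theta * out_diff

layer = CDConv2d(3, 16)
feat = layer(torch.randn(1, 3, 64, 64))                      # (1, 16, 64, 64)
```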

20.
IEEE Trans Image Process; 30: 5626-5640, 2021.
Article in English | MEDLINE | ID: mdl-34125676

ABSTRACT

Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore the synergies among spatio-temporal modalities for gesture recognition. This is partly because existing manually designed network architectures are inefficient at jointly learning multiple modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which is able to capture rich temporal context by aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective for understanding the relationship between RGB and depth modalities and their temporal dynamics. Comprehensive experiments on three benchmark datasets (IsoGD, NvGesture, and EgoGesture) demonstrate state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS.
