Results 1 - 8 of 8
1.
Article in English | MEDLINE | ID: mdl-37494175

ABSTRACT

Gesture recognition has drawn considerable attention from many researchers owing to its wide range of applications. Although significant progress has been made in this field, previous works have mainly focused on distinguishing between different gesture classes, ignoring the inner-class divergence caused by gesture-irrelevant factors. Meanwhile, for multimodal gesture recognition, feature or score fusion in the final stage is the usual way to combine the information of different modalities. Consequently, the gesture-relevant features in different modalities may be redundant, whereas the complementarity of the modalities is not exploited sufficiently. To handle these problems, in this article we propose a hierarchical gesture prototype framework to highlight gesture-relevant features such as poses and motions. This framework consists of a sample-level prototype and a modal-level prototype. The sample-level gesture prototype is established with the structure of a memory bank, which avoids the distraction of gesture-irrelevant factors in each sample, such as illumination, background, and the performers' appearances. The modal-level prototype is then obtained via a generative adversarial network (GAN)-based subnetwork, in which the modal-invariant features are extracted and pulled together. Meanwhile, the modal-specific attribute features are used to synthesize the features of the other modalities, and this circulation of modality information helps to leverage their complementarity. Extensive experiments on three widely used gesture datasets demonstrate that our method effectively highlights gesture-relevant features and outperforms state-of-the-art methods.
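For illustration only, here is a minimal sketch of how a sample-level prototype memory bank of the kind described above might work; the class name, momentum update, and pull loss are assumptions, not the authors' code.

```python
# Hypothetical sample-level prototype memory: one prototype per gesture class,
# updated by an exponential moving average so that per-sample factors such as
# background or performer appearance are averaged out.
import torch

class PrototypeMemory:
    def __init__(self, num_classes: int, feat_dim: int, momentum: float = 0.9):
        self.momentum = momentum
        self.bank = torch.zeros(num_classes, feat_dim)

    @torch.no_grad()
    def update(self, features: torch.Tensor, labels: torch.Tensor) -> None:
        # features: (batch, feat_dim), labels: (batch,)
        for cls in labels.unique():
            cls_mean = features[labels == cls].mean(dim=0)
            self.bank[cls] = self.momentum * self.bank[cls] + (1 - self.momentum) * cls_mean

    def pull_loss(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Pull each sample toward the prototype of its own gesture class.
        return ((features - self.bank[labels]) ** 2).sum(dim=1).mean()

# Usage sketch: memory = PrototypeMemory(num_classes=20, feat_dim=512)
# loss = ce_loss + memory.pull_loss(feats, labels); memory.update(feats.detach(), labels)
```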

2.
Article in English | MEDLINE | ID: mdl-37440377

ABSTRACT

Accurately extracting buildings from aerial images is essential for the timely understanding of human intervention on the land. The distribution discrepancies between diversified unlabeled remote sensing images (changes in imaging sensor, location, and environment) and labeled historical images significantly degrade the generalization performance of deep learning algorithms. Unsupervised domain adaptation (UDA) algorithms have recently been proposed to eliminate these distribution discrepancies without re-annotating training data for new domains. Nevertheless, due to the limited information provided by a single-source domain, single-source UDA (SSUDA) is not an optimal choice when multitemporal and multiregion remote sensing images are available. We propose a multisource UDA (MSUDA) framework, SPENet, for building extraction, which selects, purifies, and exchanges information from multiple source domains to better adapt the model to the target domain. Specifically, the framework effectively utilizes richer knowledge by extracting target-relevant information from the multiple source domains, purifying target-domain information with low-level building features, and exchanging target-domain information in an interactive learning manner. Extensive experiments and ablation studies conducted on 12 city datasets demonstrate the effectiveness of our method against existing state-of-the-art methods; e.g., it achieves 59.1% intersection over union (IoU) on Austin and Kitsap → Potsdam, surpassing the target-domain supervised method by 2.2%. The code is available at https://github.com/QZangXDU/SPENet.
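As a small illustration of the metric quoted above, the following sketch computes intersection over union (IoU) for binary building masks; it is not the SPENet evaluation code.

```python
# Minimal IoU computation for binary building-extraction masks (illustrative only).
import numpy as np

def building_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred, gt: binary masks of shape (H, W) with 1 = building pixel."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union > 0 else 1.0

# An IoU of 0.591 corresponds to the 59.1% reported for Austin and Kitsap -> Potsdam.
```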

3.
Front Neurorobot ; 17: 1181598, 2023.
Article in English | MEDLINE | ID: mdl-37283784

ABSTRACT

Speech emotion recognition is challenging due to the subjectivity and ambiguity of emotion. In recent years, multimodal methods for speech emotion recognition have achieved promising results. However, due to the heterogeneity of data from different modalities, effectively integrating the information from different modalities remains a key difficulty of this research. Moreover, because of the limitations of feature-level and decision-level fusion methods, fine-grained modal interactions have often been neglected in previous studies. We propose a method named multimodal transformer augmented fusion that uses a hybrid fusion strategy, combining feature-level fusion and model-level fusion, to perform fine-grained information interaction within and between modalities. A model-fusion module composed of three Cross-Transformer Encoders is proposed to generate a multimodal emotional representation for modal guidance and information fusion. Specifically, the multimodal features obtained by feature-level fusion and the text features are used to enhance the speech features. Our proposed method outperforms existing state-of-the-art approaches on the IEMOCAP and MELD datasets.
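As a rough illustration of the cross-modal blocks described above, the sketch below shows one possible Cross-Transformer encoder in which speech features attend to text features; module and parameter names are assumptions, not the authors' implementation.

```python
# Hypothetical cross-modal Transformer block: queries come from speech,
# keys and values come from text, so text information enhances speech features.
import torch
import torch.nn as nn

class CrossTransformerEncoder(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, speech: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # speech: (B, T_s, dim) queries; text: (B, T_t, dim) keys/values.
        attended, _ = self.cross_attn(query=speech, key=text, value=text)
        speech = self.norm1(speech + attended)        # residual + layer norm
        return self.norm2(speech + self.ffn(speech))  # position-wise feed-forward

# enhanced_speech = CrossTransformerEncoder()(speech_feats, text_feats)
```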

4.
Article in English | MEDLINE | ID: mdl-36301787

ABSTRACT

This article addresses the problem of building an out-of-the-box deep detector, motivated by the need to perform anomaly detection across multiple hyperspectral images (HSIs) without repeated training. To solve this challenging task, we propose AUD-Net, a unified anomaly detection network inspired by few-shot learning. The crucial issues AUD-Net solves are how to improve the generalization of the model on various HSIs that contain different categories of land cover, and how to unify the different spectral sizes across HSIs. To achieve this, we first build a series of subtasks to classify the relations between the center of a dual window and its surroundings. Through relation learning, AUD-Net generalizes more easily to unseen HSIs, as the relations of pixel pairs are shared among different HSIs. Second, to handle HSIs with various spectral sizes, we propose a pooling layer based on the vector of locally aggregated descriptors, which maps variable-sized features into the same space and yields fixed-size relation embeddings. To determine whether the center of the dual window is an anomaly, we build a transformer-based memory model, which integrates the contextual relation embeddings in the dual window and estimates the relation embedding of the center. By computing the difference between the estimated relation embeddings of the centers and the corresponding real ones, centers with large differences are detected as anomalies, as they are harder to estimate from their surroundings. Extensive experiments on both a simulated dataset and 13 real HSIs demonstrate that AUD-Net generalizes well across various HSIs and achieves significant advantages over detectors trained specifically for each HSI.
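The following sketch illustrates, under assumptions, the VLAD-style pooling idea described above: a variable number of per-band descriptors is soft-assigned to learned centers so that HSIs with different spectral sizes map to a fixed-size embedding. It is illustrative, not the AUD-Net code.

```python
# Hypothetical NetVLAD-style pooling: aggregate residuals of a variable-length
# set of descriptors against K learned centers, giving a fixed-size output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADPooling(nn.Module):
    def __init__(self, descriptor_dim: int = 32, num_centers: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, descriptor_dim))
        self.assign = nn.Linear(descriptor_dim, num_centers)

    def forward(self, descriptors: torch.Tensor) -> torch.Tensor:
        # descriptors: (num_bands, descriptor_dim); num_bands may differ per HSI.
        weights = F.softmax(self.assign(descriptors), dim=1)              # (N, K) soft assignment
        residuals = descriptors.unsqueeze(1) - self.centers.unsqueeze(0)  # (N, K, D)
        vlad = (weights.unsqueeze(-1) * residuals).sum(dim=0)             # (K, D)
        return F.normalize(vlad.flatten(), dim=0)                         # fixed size K*D

# Embeddings from HSIs with different band counts now share one feature space.
```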

5.
IEEE Trans Image Process ; 31: 6440-6454, 2022.
Article in English | MEDLINE | ID: mdl-36215361

ABSTRACT

Outlier detection aims to separate anomalous data from the inliers in a dataset. Recently, most deep learning methods for outlier detection have leveraged an auxiliary reconstruction task, assuming that outliers are more difficult to reconstruct than normal samples (inliers). However, this assumption does not always hold in deep auto-encoder (AE) based models. Auto-encoder based detectors may reconstruct certain outliers even when no outliers appear in the training data, because they do not constrain the feature learning. Instead, we argue that outlier detection can be performed in the feature space by measuring the distance between an outlier's features and the consistency feature of the inliers. To achieve this, we propose an unsupervised outlier detection method using a memory module and a contrastive learning module (MCOD). The memory module constrains the consistency of the features so that they represent only the normal data. The contrastive learning module learns more discriminative features, which boosts the distinction between outliers and inliers. Extensive experiments on four benchmark datasets show that our proposed MCOD performs well and outperforms eleven state-of-the-art methods.


Subject(s)
Algorithms, Learning
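As an illustration of the feature-space scoring idea in entry 5, the sketch below computes an outlier score as the distance from a sample's feature to its nearest inlier memory item; the function and its normalization are assumptions, not the released MCOD code.

```python
# Hypothetical feature-space outlier scoring against a memory of inlier features.
import torch
import torch.nn.functional as F

def outlier_scores(features: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    """features: (batch, dim) sample features; memory: (num_items, dim) inlier features."""
    features = F.normalize(features, dim=1)
    memory = F.normalize(memory, dim=1)
    dists = torch.cdist(features, memory)   # (batch, num_items) pairwise distances
    return dists.min(dim=1).values          # far from every memory item => likely outlier

# scores = outlier_scores(encoder(x), memory_bank); flag samples above a chosen threshold.
```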
6.
Article in English | MEDLINE | ID: mdl-35939475

ABSTRACT

This article focuses on end-to-end image matching through joint key-point detection and descriptor extraction. To find repeatable and highly discriminative key points, we improve the deep matching network from the perspectives of network structure and network optimization. First, we propose a concurrent multiscale detector (CS-det) network, which consists of several parallel convolutional networks to extract multiscale features and multilevel discriminative information for key-point detection. Moreover, we introduce an attention module to fuse the response maps of various features adaptively. Importantly, we propose two novel rank-consistent losses (RC-losses) for network optimization, which significantly improve image matching performance. On the one hand, we propose a score rank-consistent loss (RC-S-loss) to ensure that key points have high repeatability. Unlike the score difference loss, which merely focuses on the absolute score of an individual key point, the proposed RC-S-loss pays more attention to the relative scores of key points in the image. On the other hand, we propose a score-discrimination RC-loss to ensure that each key point is highly discriminative, which reduces confusion with other key points in subsequent matching and further enhances the accuracy of image matching. Extensive experimental results demonstrate that the proposed CS-det improves the mean matching result of the deep detector by 1.4%-2.1%, and the proposed RC-losses boost matching performance by 2.7%-3.4% over the score difference loss. Our source code is available at https://github.com/iquandou/CS-Net.
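The exact form of the RC-losses is not given in the abstract; the sketch below shows one plausible pairwise formulation of a score rank-consistency loss, offered only as an assumption-laden illustration.

```python
# Hypothetical pairwise rank-consistency loss: corresponding key points should
# keep the same score ordering across two views of the same scene.
import torch

def rank_consistency_loss(scores_a: torch.Tensor, scores_b: torch.Tensor,
                          margin: float = 0.0) -> torch.Tensor:
    """scores_a, scores_b: (N,) scores of the same N key points in two views."""
    diff_a = scores_a.unsqueeze(1) - scores_a.unsqueeze(0)  # (N, N) pairwise differences
    diff_b = scores_b.unsqueeze(1) - scores_b.unsqueeze(0)
    order_a = torch.sign(diff_a).detach()                   # ordering in the first view
    # Hinge penalty on pairs whose ordering flips in the second view.
    return torch.clamp(margin - order_a * diff_b, min=0).mean()

# loss = rank_consistency_loss(det_scores_view1[matches], det_scores_view2[matches])
```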

7.
IEEE Trans Neural Netw Learn Syst ; 33(8): 3372-3386, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33544676

ABSTRACT

Recently, the majority of successful matching approaches have been based on convolutional neural networks, which focus on learning invariant and discriminative features for individual image patches based on image content. However, the image patch matching task is essentially to predict the matching relationship of patch pairs, i.e., matching (similar) or non-matching (dissimilar). Therefore, we consider feature relation (FR) learning to be more important than individual feature learning for the image patch matching problem. Motivated by this, we propose an element-wise FR learning network for image patch matching, which transforms the image patch matching task into an image relationship-based pattern classification problem and dramatically improves generalization performance on image matching. Meanwhile, the proposed element-wise learning methods encourage full interaction between the feature information and can naturally learn the FR. Moreover, we propose to aggregate the FR from multiple levels, which integrates multiscale FR for more precise matching. Experimental results demonstrate that our proposal achieves superior performance on cross-spectral and single-spectral image patch matching, and generalizes well to image patch retrieval.
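A minimal sketch of the element-wise feature-relation idea described above follows; the specific relations (absolute difference and product) and the classifier sizes are assumptions for illustration, not the authors' network.

```python
# Hypothetical element-wise relation head: build the relation of two patch
# features element by element, then classify matching vs. non-matching.
import torch
import torch.nn as nn

class ElementwiseRelationHead(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 2)             # logits for {non-matching, matching}
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        relation = torch.cat([(feat_a - feat_b).abs(), feat_a * feat_b], dim=1)
        return self.classifier(relation)

# logits = ElementwiseRelationHead()(cnn(patch_a), cnn(patch_b))
```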

8.
IEEE Trans Image Process ; 30: 7127-7142, 2021.
Article in English | MEDLINE | ID: mdl-34351861

ABSTRACT

Deep convolutional neural networks have attracted increasing attention in image patch matching. However, most of them rely on a single similarity learning model, such as feature distance or the correlation of concatenated features. Their performance degrades because of the complex relations between matching patches caused by various imagery changes. To tackle this challenge, we propose a multi-relation attention learning network (MRAN) for image patch matching. Specifically, we propose to fuse multiple feature relations (MR) for matching, which benefits from the complementary advantages of different feature relations and achieves significant improvements on matching tasks. Furthermore, we propose a relation attention learning module to learn the fused relation adaptively. With this module, meaningful feature relations are emphasized and the others are suppressed. Extensive experiments show that our MRAN achieves the best matching performance and generalizes well to multi-modal image patch matching, multi-modal remote sensing image patch matching, and image retrieval tasks.
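To make the multi-relation attention idea above concrete, the sketch below fuses three hypothetical feature relations with learned attention weights; the choice of relations and the layer sizes are assumed for illustration and are not the MRAN implementation.

```python
# Hypothetical multi-relation attention fusion: compute several relation maps,
# re-weight them with a learned attention vector, then classify the pair.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiRelationAttention(nn.Module):
    def __init__(self, feat_dim: int = 128, num_relations: int = 3):
        super().__init__()
        self.attn = nn.Linear(num_relations * feat_dim, num_relations)
        self.classifier = nn.Linear(num_relations * feat_dim, 2)

    def forward(self, fa: torch.Tensor, fb: torch.Tensor) -> torch.Tensor:
        # Three complementary relations: absolute difference, product, and sum.
        relations = [(fa - fb).abs(), fa * fb, fa + fb]          # each (B, feat_dim)
        stacked = torch.cat(relations, dim=1)                    # (B, 3 * feat_dim)
        weights = F.softmax(self.attn(stacked), dim=1)           # (B, 3) relation attention
        weighted = torch.cat([w.unsqueeze(1) * r for w, r in
                              zip(weights.unbind(dim=1), relations)], dim=1)
        return self.classifier(weighted)                         # matching logits

# logits = MultiRelationAttention()(cnn(patch_a), cnn(patch_b))
```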
