Results 1 - 4 of 4
1.
Proc Conf Empir Methods Nat Lang Process ; 2020: 1823-1833, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33969363

ABSTRACT

Human language can be expressed through multiple sources of information known as modalities, including tone of voice, facial gestures, and spoken words. Recent multimodal learning models achieve strong performance on human-centric tasks such as sentiment analysis and emotion recognition, but they are often black boxes with very limited interpretability. In this paper, we propose Multimodal Routing, which dynamically adjusts weights between input modalities and output representations differently for each input sample. Multimodal Routing can identify the relative importance of both individual modalities and cross-modality features. Moreover, the weight assignment by routing allows us to interpret modality-prediction relationships not only globally (i.e., general trends over the whole dataset) but also locally for each single input sample, while keeping competitive performance compared to state-of-the-art methods.
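
For intuition, here is a minimal NumPy sketch of the per-sample routing idea: each input (unimodal or cross-modality) feature votes for each output concept, and an iterative agreement step yields interpretable routing weights. This is an illustrative sketch, not the authors' implementation; all names, dimensions, and the iteration count are assumptions.

```python
# Illustrative routing-by-agreement between modality features and output
# concepts. NOT the paper's code; shapes and projections are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route(features, W, n_iters=3):
    """features: (n_in, d_in) per-sample modality/cross-modality features.
    W: (n_in, n_out, d_in, d_out) projections from inputs to output concepts.
    Returns output concept vectors and per-sample routing weights."""
    votes = np.einsum('id,iodk->iok', features, W)   # each input votes per concept
    logits = np.zeros(votes.shape[:2])               # (n_in, n_out) routing logits
    for _ in range(n_iters):
        c = softmax(logits, axis=1)                  # current routing weights
        out = np.einsum('io,iok->ok', c, votes)      # weighted output concepts
        # Raise the weight of inputs whose votes agree with the concept.
        logits += np.einsum('iok,ok->io', votes, out)
    return out, softmax(logits, axis=1)

# Three unimodal features (e.g., text, audio, vision) for one sample, routed
# to two output concepts (e.g., positive/negative sentiment).
feats = rng.normal(size=(3, 8))
W = rng.normal(scale=0.1, size=(3, 2, 8, 4))
concepts, weights = route(feats, W)
print(weights)   # per-sample importance of each input for each concept
```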

2.
Article in English | MEDLINE | ID: mdl-31056497

ABSTRACT

Heterogeneous domain adaptation (HDA) addresses the task of associating data across dissimilar domains that are additionally described by different types of features. Inspired by recent advances in neural networks and deep learning, we propose a deep learning model of Transfer Neural Trees (TNT), which jointly solves cross-domain feature mapping, adaptation, and classification in a unified architecture. As the prediction layer in TNT, we introduce the Transfer Neural Decision Forest (Transfer-NDF), which learns the neurons in TNT for adaptation by stochastic pruning. To handle semi-supervised HDA, a unique embedding loss term is introduced to TNT for preserving prediction and structural consistency between labeled and unlabeled target-domain data. We further show that our TNT can be extended to zero-shot learning for associating image and attribute data with promising performance. Finally, experiments on different classification tasks across features, datasets, and modalities verify the effectiveness of our TNT.
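
As background for the prediction layer, the sketch below implements a generic soft (neural) decision tree of the kind a neural decision forest aggregates: each internal node makes a sigmoid decision, and a leaf's reaching probability is the product of the decisions along its root-to-leaf path. This is a hypothetical illustration under those standard assumptions; the paper's stochastic pruning and transfer-specific terms are not shown.

```python
# Generic soft decision tree prediction, a standard building block of neural
# decision forests. Illustrative only; not Transfer-NDF itself.
import numpy as np

rng = np.random.default_rng(0)

def soft_tree_predict(x, W, b, leaf_dist, depth):
    """x: (d,) feature vector. W, b: (n_nodes, d), (n_nodes,) decision params
    for a full binary tree with n_nodes = 2**depth - 1 internal nodes.
    leaf_dist: (2**depth, n_classes) class distributions at the leaves."""
    d = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # P(go right) at each internal node
    n_leaves = leaf_dist.shape[0]
    probs = np.zeros(n_leaves)
    for leaf in range(n_leaves):
        p, node = 1.0, 0
        for level in range(depth):
            right = (leaf >> (depth - 1 - level)) & 1   # path bit at this level
            p *= d[node] if right else (1.0 - d[node])
            node = 2 * node + 1 + right                  # descend to child
        probs[leaf] = p                                  # P(reaching this leaf)
    return probs @ leaf_dist                             # class probabilities

depth, d_in, n_classes = 3, 16, 4
W = rng.normal(size=(2**depth - 1, d_in))
b = rng.normal(size=2**depth - 1)
leaf = rng.random((2**depth, n_classes))
leaf_dist = leaf / leaf.sum(axis=1, keepdims=True)       # rows sum to 1
x = rng.normal(size=d_in)
print(soft_tree_predict(x, W, b, leaf_dist, depth))      # sums to 1
```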

3.
Proc Conf Assoc Comput Linguist Meet ; 2019: 6558-6569, 2019 Jul.
Article in English | MEDLINE | ID: mdl-32362720

ABSTRACT

Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent non-alignment of the data due to variable sampling rates of the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address both issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is directional pairwise crossmodal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that the proposed crossmodal attention mechanism in MulT is able to capture correlated crossmodal signals.
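
The core mechanism lends itself to a short sketch: directional crossmodal attention is scaled dot-product attention in which queries come from the target modality and keys/values from the source modality, so the two sequences may have different lengths and need no alignment. The dimensions and random weights below are illustrative assumptions, not the authors' code.

```python
# Directional crossmodal attention: adapt a source-modality stream to the
# time steps of a target modality. Shapes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def crossmodal_attention(target, source, Wq, Wk, Wv):
    """target: (T_t, d_t) sequence receiving information.
    source: (T_s, d_s) sequence providing information. T_t != T_s is fine."""
    Q = target @ Wq                             # (T_t, d_k) queries
    K = source @ Wk                             # (T_s, d_k) keys
    V = source @ Wv                             # (T_s, d_k) values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (T_t, T_s) crossmodal scores
    return softmax(scores, axis=-1) @ V         # (T_t, d_k) adapted stream

# Language at 50 steps vs. audio at 120 steps: unaligned sampling rates.
lang = rng.normal(size=(50, 300))
audio = rng.normal(size=(120, 74))
d_k = 40
Wq = rng.normal(scale=0.05, size=(300, d_k))
Wk = rng.normal(scale=0.05, size=(74, d_k))
Wv = rng.normal(scale=0.05, size=(74, d_k))
adapted = crossmodal_attention(lang, audio, Wq, Wk, Wv)
print(adapted.shape)   # (50, 40): audio adapted to language time steps
```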

4.
IEEE Trans Image Process ; 25(12): 5552-5562, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27654485

ABSTRACT

Unsupervised domain adaptation deals with scenarios in which labeled data are available in the source domain but only unlabeled data can be observed in the target domain. Since classifiers trained on source-domain data cannot be expected to generalize well in the target domain, how to transfer label information from source- to target-domain data is a challenging task. A common technique for unsupervised domain adaptation is to match cross-domain data distributions so that domain and distribution differences are suppressed. In this paper, we propose to utilize the label information inferred from the source domain while jointly exploiting the structural information of the unlabeled target-domain data for adaptation. Our proposed model not only reduces the distribution mismatch between domains but also simultaneously improves recognition of target-domain data. In the experiments, we show that our approach performs favorably against state-of-the-art unsupervised domain adaptation methods on benchmark data sets. We also provide convergence, sensitivity, and robustness analyses, which support the use of our model for cross-domain classification.
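
The "match cross-domain data distributions" step mentioned above is commonly instantiated with the Maximum Mean Discrepancy (MMD); the sketch below shows that standard estimator as one plausible example, not necessarily this paper's exact objective.

```python
# Squared MMD with a Gaussian RBF kernel: a standard way to measure (and then
# minimize) the mismatch between source and target feature distributions.
# One common instantiation, not necessarily this paper's exact objective.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, gamma):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=0.05):
    """Biased estimate of squared MMD between source and target samples.
    gamma is a bandwidth choice, set here roughly to 1/(2*d) for d=10."""
    return (rbf_kernel(Xs, Xs, gamma).mean()
            + rbf_kernel(Xt, Xt, gamma).mean()
            - 2 * rbf_kernel(Xs, Xt, gamma).mean())

# Source and target features drawn from shifted distributions: the larger the
# shift, the larger the discrepancy an adaptation method must suppress.
Xs = rng.normal(loc=0.0, size=(200, 10))
Xt = rng.normal(loc=0.5, size=(200, 10))
print(mmd2(Xs, Xt))                          # shifted target: larger MMD
print(mmd2(Xs, rng.normal(size=(200, 10))))  # same distribution: near zero
```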
