M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis.

Wang, Xingbo; He, Jianben; Jin, Zhihua; Yang, Muqiao; Wang, Yong; Qu, Huamin.

IEEE Trans Vis Comput Graph ; 28(1): 802-812, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34587037

ABSTRACT

Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current multimodal models with strong performance are often deep-learning-based techniques and work like black boxes. It is not clear how models utilize multimodal information for sentiment predictions. Despite recent advances in techniques for enhancing the explainability of machine learning models, they often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present an interactive visual analytics system, M2 Lens, to visualize and explain multimodal models for sentiment analysis. M2 Lens provides explanations on intra- and inter-modal interactions at the global, subset, and local levels. Specifically, it summarizes the influence of three typical interaction types (i.e., dominance, complement, and conflict) on the model predictions. Moreover, M2 Lens identifies frequent and influential multimodal features and supports the multi-faceted exploration of model behaviors from language, acoustic, and visual modalities. Through two case studies and expert interviews, we demonstrate our system can help users gain deep insights into the multimodal models for sentiment analysis.

Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis.

Tsai, Yao-Hung Hubert; Ma, Martin Q; Yang, Muqiao; Salakhutdinov, Ruslan; Morency, Louis-Philippe.

Proc Conf Empir Methods Nat Lang Process ; 2020: 1823-1833, 2020 Nov.

Article in English | MEDLINE | ID: mdl-33969363

ABSTRACT

Human language can be expressed through multiple sources of information known as modalities, including tone of voice, facial gestures, and spoken language. Recent multimodal learning models with strong performance on human-centric tasks such as sentiment analysis and emotion recognition are often black boxes, with very limited interpretability. In this paper we propose Multimodal Routing, which dynamically adjusts weights between input modalities and output representations differently for each input sample. Multimodal Routing can identify the relative importance of both individual modalities and cross-modality features. Moreover, the weight assignment by routing allows us to interpret modality-prediction relationships not only globally (i.e., general trends over the whole dataset), but also locally for each single input sample, while keeping performance competitive with state-of-the-art methods.
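The idea of per-sample weights between input modalities and output representations can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single routing step in which each feature vector is scored against each output concept by a dot product and the scores are normalized with a softmax. The names `routing_weights`, `features`, and `concepts`, and all the numbers, are hypothetical.

```python
import math


def softmax(scores):
    """Normalize a list of scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def routing_weights(features, concepts):
    """Return weights[name][j]: how strongly feature `name` routes to concept j.

    Assumption: agreement is a plain dot product, computed once.
    Published routing methods may iterate and use other normalizations.
    """
    weights = {}
    for name, vec in features.items():
        agreement = [sum(a * b for a, b in zip(vec, c)) for c in concepts]
        weights[name] = softmax(agreement)
    return weights


# Hypothetical features for one input sample: two unimodal, one cross-modal.
features = {
    "language": [1.0, 0.0],
    "acoustic": [0.0, 1.0],
    "language+acoustic": [0.5, 0.5],
}
# Hypothetical output concepts (e.g., one per sentiment class).
concepts = [[1.0, 0.0], [0.0, 1.0]]

weights = routing_weights(features, concepts)
for name, row in weights.items():
    print(name, [round(w, 3) for w in row])
```

Because the weights are recomputed for every sample, inspecting one row gives a local explanation, while averaging rows over a dataset would give a global trend.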
