Search | VHL Regional Portal

1.

Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning.

Qu, Liangqiong; Zhou, Yuyin; Liang, Paul Pu; Xia, Yingda; Wang, Feifei; Adeli, Ehsan; Fei-Fei, Li; Rubin, Daniel.

Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit ; 2022: 10051-10061, 2022 Jun.

Article in English | MEDLINE | ID: mdl-36624800

ABSTRACT

Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.

2.

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning.

Liang, Paul Pu; Lyu, Yiwei; Fan, Xiang; Wu, Zetian; Cheng, Yun; Wu, Jason; Chen, Leslie; Wu, Peter; Lee, Michelle A; Zhu, Yuke; Salakhutdinov, Ruslan; Morency, Louis-Philippe.

Adv Neural Inf Process Syst ; 2021(DB1): 1-20, 2021 Dec.

Article in English | MEDLINE | ID: mdl-38774625

ABSTRACT

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning spanning innovations in fusion paradigms, optimization objectives, and training approaches. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal machine learning research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. 
MultiBench, our standardized implementations, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.

3.

CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French.

Zadeh, Amir; Cao, Yan Sheng; Hessner, Simon; Liang, Paul Pu; Poria, Soujanya; Morency, Louis-Philippe.

Proc Conf Empir Methods Nat Lang Process ; 2020: 1801-1812, 2020 Nov.

Article in English | MEDLINE | ID: mdl-33969362

ABSTRACT

Modeling multimodal language is a core research area in natural language processing. While languages such as English have relatively large multimodal language resources, other widely spoken languages across the globe have few or no large-scale datasets in this area. This disproportionately affects native speakers of languages other than English. As a step towards building more equitable and inclusive multimodal systems, we introduce the first large-scale multimodal language dataset for Spanish, Portuguese, German and French. The proposed dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), is the largest of its kind with 40,000 total labelled sentences. It covers a diverse set of topics and speakers, and carries supervision of 20 labels including sentiment (and subjectivity), emotions, and attributes. Our evaluations on a state-of-the-art multimodal model demonstrate that CMU-MOSEAS enables further research for multilingual studies in multimodal language.

4.

Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors.

Wang, Yansen; Shen, Ying; Liu, Zhun; Liang, Paul Pu; Zadeh, Amir; Morency, Louis-Philippe.

Proc AAAI Conf Artif Intell ; 33(1): 7216-7223, 2019 Jul.

Article in English | MEDLINE | ID: mdl-32219010

ABSTRACT

Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
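The idea of shifting a word representation by its nonverbal context can be illustrated with a small sketch. This is not the RAVEN implementation: the function name, the fixed gate values, and the norm-based scaling rule are all assumptions made for illustration, and a trained model would learn such quantities from data. The sketch only shows how a combined visual and acoustic vector could displace a word vector while keeping the displacement bounded.

```python
import math


def shift_word_embedding(word_vec, visual_vec, acoustic_vec,
                         visual_gate=0.5, acoustic_gate=0.5, max_ratio=1.0):
    """Return word_vec displaced by a nonverbal shift vector.

    Hypothetical illustration: the gates and max_ratio are fixed numbers
    here, whereas a trained model would learn or compute them.
    """
    if not (len(word_vec) == len(visual_vec) == len(acoustic_vec)):
        raise ValueError("all vectors must share one dimensionality")

    # Combine the two nonverbal vectors into a single shift vector.
    shift = [visual_gate * v + acoustic_gate * a
             for v, a in zip(visual_vec, acoustic_vec)]

    word_norm = math.sqrt(sum(x * x for x in word_vec))
    shift_norm = math.sqrt(sum(x * x for x in shift))
    if shift_norm == 0.0:
        # No nonverbal signal: the word representation is left unchanged.
        return list(word_vec)

    # Bound the shift so its length never exceeds max_ratio times the
    # length of the word vector (assumed scaling rule).
    scale = min(1.0, max_ratio * word_norm / shift_norm)
    return [w + scale * s for w, s in zip(word_vec, shift)]


if __name__ == "__main__":
    # The same word vector under two different nonverbal contexts.
    word = [3.0, 4.0]
    print(shift_word_embedding(word, [10.0, 0.0], [10.0, 0.0]))
    print(shift_word_embedding(word, [0.0, 0.0], [0.0, 0.0]))
```

Bounding the shift by the word vector's norm is one simple way to keep the nonverbal signal from overwhelming the lexical meaning; other choices are equally plausible.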

5.

Multi-attention Recurrent Network for Human Communication Comprehension.

Zadeh, Amir; Liang, Paul Pu; Poria, Soujanya; Vij, Prateek; Cambria, Erik; Morency, Louis-Philippe.

Proc AAAI Conf Artif Intell ; 2018: 5642-5649, 2018 Feb.

Article in English | MEDLINE | ID: mdl-32257595

ABSTRACT

Human face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality) and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must understand each modality and the interactions between them that shape the communication. In this paper, we present a novel neural architecture for understanding human communication called the Multi-attention Recurrent Network (MARN). The main strength of our model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent component called the Long-short Term Hybrid Memory (LSTHM). We perform extensive comparisons on six publicly available datasets for multimodal sentiment analysis, speaker trait recognition and emotion recognition. MARN shows state-of-the-art performance on all the datasets.
