Pesquisa | Portal Regional da BVS

Are transformer-based models more robust than CNN-based models?

Liu, Zhendong; Qian, Shuwei; Xia, Changhong; Wang, Chongjun.

Neural Netw ; 172: 106091, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38266475

RESUMO

As the deployment of artificial intelligence (AI) models in real-world settings grows, their open-environment robustness becomes increasingly critical. This study aims to dissect the robustness of deep learning models, particularly comparing transformer-based models against CNN-based models. We focus on unraveling the sources of robustness from two key perspectives: structural and process robustness. Our findings suggest that transformer-based models generally outperform convolution-based models in robustness across multiple metrics. However, we contend that these metrics may not wholly represent true model robustness, such as the mean of corruption error. To better understand the underpinnings of this robustness advantage, we analyze models through the lens of Fourier transform and game interaction. From our insights, we propose a calibrated evaluation metric for robustness against real-world data, and a blur-based method to enhance robustness performance. Our approach achieves state-of-the-art results, with mCE scores of 2.1% on CIFAR-10-C, 12.4% on CIFAR-100-C, and 24.9% on TinyImageNet-C.

Assuntos

Inteligência Artificial , Benchmarking

COM: Contrastive Masked-attention model for incomplete multimodal learning.

Qian, Shuwei; Wang, Chongjun.

Neural Netw ; 162: 443-455, 2023 May.

Artigo em Inglês | MEDLINE | ID: mdl-36965274

RESUMO

Most multimodal learning methods assume that all modalities are always available in data. However, in real-world applications, the assumption is often violated due to privacy protection, sensor failure etc. Previous works for incomplete multimodal learning often suffer from one of the following drawbacks: introducing noise, lacking flexibility to missing patterns and failing to capture interactions between modalities. To overcome these challenges, we propose a COntrastive Masked-attention model (COM). The framework performs cross-modal contrastive learning with GAN-based augmentation to reduce modality gap, and employs a masked-attention model to capture interactions between modalities. The augmentation adapts cross-modal contrastive learning to suit incomplete case by a two-player game, improving the effectiveness of multimodal representations. Interactions between modalities are modeled by stacking self-attention blocks, and attention masks limit them on the observed modalities to avoid extra noise. All kinds of modality combinations share a unified architecture, so the model is flexible to different missing patterns. Extensive experiments on six datasets demonstrate the effectiveness and robustness of the proposed method for incomplete multimodal learning.

Assuntos

Aprendizagem , Privacidade

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA