CrossFormer++: A Versatile Vision Transformer Hinging on Cross-Scale Attention.
IEEE Trans Pattern Anal Mach Intell; 46(5): 3123-3136, 2024 May.
Article in En | MEDLINE | ID: mdl-38113150
ABSTRACT
While features of different scales are perceptually important to visual inputs, existing vision transformers do not yet take advantage of them explicitly. To this end, we first propose a cross-scale vision transformer, CrossFormer. It introduces a cross-scale embedding layer (CEL) and a long-short distance attention (LSDA). On the one hand, CEL blends each token with multiple patches of different scales, providing the self-attention module itself with cross-scale features. On the other hand, LSDA splits the self-attention module into a short-distance one and a long-distance counterpart, which not only reduces the computational burden but also keeps both small-scale and large-scale features in the tokens. Moreover, through experiments on CrossFormer, we observe two other issues that affect vision transformers' performance, i.e., enlarging self-attention maps and amplitude explosion. Thus, we further propose a progressive group size (PGS) paradigm and an amplitude cooling layer (ACL) to alleviate these two issues, respectively. The CrossFormer incorporating PGS and ACL is called CrossFormer++. Extensive experiments show that CrossFormer++ outperforms other vision transformers on image classification, object detection, instance segmentation, and semantic segmentation tasks.
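As a rough illustration of the cross-scale embedding idea summarized above, the sketch below builds one token from several convolution kernels of different sizes applied with a shared stride, then concatenates their outputs along the channel dimension. It is only a minimal sketch: the kernel sizes, stride, and channel split are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class CrossScaleEmbedding(nn.Module):
    """Minimal sketch of a cross-scale embedding layer (CEL).

    Each output token mixes patches of several scales: convolutions with
    different kernel sizes but the same stride are applied in parallel and
    concatenated along channels. Hyperparameters are assumptions for
    illustration, not the published configuration.
    """
    def __init__(self, in_chans=3, embed_dim=96,
                 kernel_sizes=(4, 8, 16, 32), stride=4):
        super().__init__()
        # split the embedding dimension evenly across scales (assumed split)
        dims = [embed_dim // len(kernel_sizes)] * len(kernel_sizes)
        dims[0] += embed_dim - sum(dims)  # absorb any rounding remainder
        self.projs = nn.ModuleList(
            nn.Conv2d(in_chans, d, kernel_size=k, stride=stride,
                      padding=(k - stride) // 2)
            for k, d in zip(kernel_sizes, dims)
        )

    def forward(self, x):  # x: (B, C, H, W)
        # each projection downsamples by the same stride, so outputs align
        feats = [proj(x) for proj in self.projs]  # each: (B, d_i, H/s, W/s)
        return torch.cat(feats, dim=1)            # (B, embed_dim, H/s, W/s)

# usage: CrossScaleEmbedding()(torch.randn(1, 3, 224, 224)) -> (1, 96, 56, 56)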
Collections: 01-internacional
Database: MEDLINE
Language: En
Journal: IEEE Trans Pattern Anal Mach Intell
Journal subject: Medical Informatics
Publication year: 2024
Document type: Article
Country of publication: United States