SGLFormer: Spiking Global-Local-Fusion Transformer with high performance.

Zhang, Han; Zhou, Chenlin; Yu, Liutao; Huang, Liwei; Ma, Zhengyu; Fan, Xiaopeng; Zhou, Huihui; Tian, Yonghong

Affiliation

Zhang H; AI Department, Peng Cheng Laboratory, Shenzhen, China.
Zhou C; Faculty of Computing, Harbin Institute of Technology, Harbin, China.
Yu L; AI Department, Peng Cheng Laboratory, Shenzhen, China.
Huang L; AI Department, Peng Cheng Laboratory, Shenzhen, China.
Ma Z; AI Department, Peng Cheng Laboratory, Shenzhen, China.
Fan X; National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China.
Zhou H; AI Department, Peng Cheng Laboratory, Shenzhen, China.
Tian Y; AI Department, Peng Cheng Laboratory, Shenzhen, China.

Front Neurosci ; 18: 1371290, 2024.

Article in English | MEDLINE | ID: mdl-38550564

ABSTRACT

Introduction:

Spiking Neural Networks (SNNs), inspired by brain science, offer low energy consumption and high biological plausibility thanks to their event-driven nature. However, current SNNs still suffer from insufficient performance.

Methods:

The brain processes information adeptly across varied scenarios, relying on complex neuronal connections within and across regions as well as specialized neuronal architectures for specific functions. Drawing on this, we propose a Spiking Global-Local-Fusion Transformer (SGLFormer) that significantly improves the performance of SNNs. This novel architecture enables efficient information processing on both global and local scales by integrating transformer and convolution structures in SNNs. In addition, we uncover the problem of inaccurate gradient backpropagation caused by Maxpooling in SNNs and address it by developing a new Maxpooling module. Furthermore, we adopt a spatio-temporal block (STB) in the classification head instead of global average pooling, facilitating the aggregation of spatial and temporal features.
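The abstract does not say how Maxpooling makes gradients inaccurate, so the following is only an illustrative sketch of one plausible cause, not the authors' analysis or their new module. It assumes that pooling is applied to binary spike maps and that ties inside a window are broken by taking the first maximum. The function names `max_pool_1d` and `max_pool_backward` are hypothetical helpers written for this example.

```python
def max_pool_1d(spikes, window):
    """Max-pool non-overlapping windows of a binary spike train.

    Returns the pooled outputs and, for each window, the index of the
    element that was selected (the first maximum when several tie).
    """
    outputs, argmax = [], []
    for start in range(0, len(spikes) - window + 1, window):
        chunk = spikes[start:start + window]
        # Python's max() returns the first maximal element, so ties
        # between several spikes are resolved in favour of the earliest.
        best = max(range(window), key=lambda i: chunk[i])
        outputs.append(chunk[best])
        argmax.append(start + best)
    return outputs, argmax


def max_pool_backward(grad_out, argmax, length):
    """Route each output gradient to the single selected input position."""
    grad_in = [0.0] * length
    for g, idx in zip(grad_out, argmax):
        grad_in[idx] += g
    return grad_in


# Hypothetical spike train: several windows contain more than one spike.
spikes = [1, 1, 0, 1, 0, 0, 1, 1]
outputs, argmax = max_pool_1d(spikes, window=4)
grad_in = max_pool_backward([1.0, 1.0], argmax, len(spikes))

print(outputs)  # pooled spikes
print(argmax)   # positions that receive gradient
print(grad_in)  # gradient per input position
```

In this toy case the inputs at positions 1, 3, and 7 fired but receive zero gradient, because only one of the tied spikes per window is credited. With real-valued activations such ties are rare, whereas with binary spikes they are common, which is one way pooling over spikes could distort backpropagation.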

Results:

SGLFormer demonstrates superior performance on static datasets such as CIFAR10, CIFAR100, and ImageNet, as well as on dynamic vision sensor (DVS) datasets including CIFAR10-DVS and DVS128-Gesture. Notably, on ImageNet, SGLFormer achieves a top-1 accuracy of 83.73% with 64 M parameters, outperforming the current SOTA directly trained SNNs by a margin of 6.66%.

Discussion:

With its high performance, SGLFormer can support more computer vision tasks in the future. The code for this study is available at https://github.com/ZhangHanN1/SGLFormer.

Keywords

Global-Local-Fusion; Maxpooling; Spiking Neural Network; high performance; spatio-temporal; spiking transformer

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: Front Neurosci / Front. neurosci. (Online) / Frontiers in neuroscience (Print) Year: 2024 Document type: Article Country of affiliation: China
