
FPGA-Based High-Throughput CNN Hardware Accelerator With High Computing Resource Utilization Ratio.

Huang, Wenjin; Wu, Huangtao; Chen, Qingkun; Luo, Conghui; Zeng, Shihao; Li, Tianrui; Huang, Yihua.

IEEE Trans Neural Netw Learn Syst ; 33(8): 4069-4083, 2022 Aug.

Article in English | MEDLINE | ID: mdl-33587711

ABSTRACT

Field-programmable gate array (FPGA)-based CNN hardware accelerators that adopt either a single-computing-engine (single-CE) or a multi-CE architecture have attracted great attention in recent years. Although the actual throughput of these accelerators keeps increasing, it still falls far below the theoretical throughput because of inefficient computing-resource mapping mechanisms, data supply problems, and other issues. To solve these problems, this article proposes a novel composite CNN hardware accelerator architecture. To execute convolution layers (CLs) efficiently, a novel multi-CE architecture based on a row-level pipelined streaming strategy is proposed. For each CE, an optimized mapping mechanism is proposed to improve its computing resource utilization ratio, and an efficient data system with continuous data supply is designed to keep the CE from idling. In addition, a weight data allocation strategy is proposed to relieve off-chip bandwidth pressure. To execute fully connected layers (FCLs), a single-CE architecture based on a batch-based computing method is proposed. Using these design methods and strategies, both Visual Geometry Group network-16 (VGG-16) and ResNet-101 are implemented on the XC7VX980T FPGA platform. The VGG-16 accelerator uses 3395 multipliers and achieves a throughput of 1 TOPS at 150 MHz, about 98.15% of the theoretical throughput (2 × 3395 × 150 MOPS). Similarly, the ResNet-101 accelerator achieves 600 GOPS at 100 MHz, about 96.12% of the theoretical throughput (2 × 3121 × 100 MOPS).
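The utilization figures above follow from the theoretical-throughput formula 2 × multipliers × frequency (in MOPS). The sketch below reproduces that arithmetic. It assumes the usual convention that each multiplier performs one multiply-accumulate per clock cycle, counted as 2 operations; the function names are hypothetical and are not taken from the article.

```python
def theoretical_gops(multipliers: int, freq_mhz: float) -> float:
    """Peak throughput in GOPS: 2 ops (multiply + add) per multiplier per cycle.

    Assumption: every multiplier issues one MAC per clock cycle.
    """
    mops = 2 * multipliers * freq_mhz  # millions of operations per second
    return mops / 1000.0               # convert MOPS to GOPS


def utilization(actual_gops: float, multipliers: int, freq_mhz: float) -> float:
    """Fraction of the theoretical throughput that is actually achieved."""
    return actual_gops / theoretical_gops(multipliers, freq_mhz)


def implied_actual_gops(ratio: float, multipliers: int, freq_mhz: float) -> float:
    """Actual throughput implied by a reported utilization ratio."""
    return ratio * theoretical_gops(multipliers, freq_mhz)


if __name__ == "__main__":
    # VGG-16: 3395 multipliers at 150 MHz -> peak of 1018.5 GOPS
    print(f"VGG-16 peak: {theoretical_gops(3395, 150):.1f} GOPS")
    # A 98.15% ratio implies about 999.7 GOPS, which the abstract rounds to 1 TOPS
    print(f"VGG-16 implied actual: {implied_actual_gops(0.9815, 3395, 150):.1f} GOPS")
    # ResNet-101: 600 GOPS against a 624.2 GOPS peak (3121 multipliers, 100 MHz)
    print(f"ResNet-101 utilization: {utilization(600, 3121, 100):.2%}")
```

Working backward from the reported ratio shows that the "1 TOPS" figure for VGG-16 is rounded: 1000 / 1018.5 would give about 98.18%, whereas 98.15% corresponds to roughly 999.7 GOPS.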
