A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation.
Sui, Xuefu; Lv, Qunbo; Zhi, Liangjie; Zhu, Baoyu; Yang, Yuanbo; Zhang, Yu; Tan, Zheng.
Affiliation
  • Sui X; Aerospace Information Research Institute, Chinese Academy of Sciences, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China.
  • Lv Q; School of Optoelectronics, University of Chinese Academy of Sciences, No. 19(A) Yuquan Road, Shijingshan District, Beijing 100049, China.
  • Zhi L; Department of Key Laboratory of Computational Optical Imaging Technology, CAS, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China.
  • Zhu B; Aerospace Information Research Institute, Chinese Academy of Sciences, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China.
  • Yang Y; School of Optoelectronics, University of Chinese Academy of Sciences, No. 19(A) Yuquan Road, Shijingshan District, Beijing 100049, China.
  • Zhang Y; Department of Key Laboratory of Computational Optical Imaging Technology, CAS, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China.
  • Tan Z; Aerospace Information Research Institute, Chinese Academy of Sciences, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China.
Sensors (Basel) ; 23(2)2023 Jan 11.
Article in En | MEDLINE | ID: mdl-36679624
ABSTRACT
To address the large storage requirements, computational pressure, untimely off-chip data supply, and low computational efficiency that the large number of convolutional neural network (CNN) parameters causes during hardware deployment, we developed an innovative hardware-friendly CNN pruning method called KRP, which prunes convolutional kernels at the row scale. A new retraining method based on LR tracking was used to obtain a CNN model with both a high pruning rate and high accuracy. Furthermore, we designed a high-performance convolutional computation module on the FPGA platform to support the deployment of KRP-pruned models. Comparative experiments on CNNs such as VGG and ResNet showed that KRP achieves higher accuracy than most pruning methods. At the same time, the KRP method, together with the GSNQ quantization method developed in our previous study, forms a high-precision, hardware-friendly network compression framework that achieves "lossless" CNN compression with a 27× reduction in network model storage. Comparative experiments on the FPGA showed that the KRP pruning method not only requires much less storage space but also reduces on-chip hardware resource consumption by more than half and effectively improves model parallelism on FPGAs, demonstrating strong hardware friendliness. This study provides further ideas for applying CNNs in the field of edge computing.
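The abstract describes pruning convolutional kernels "on a row scale", i.e. zeroing whole rows of each k×k kernel so the sparsity pattern stays regular for hardware. The paper's exact row-selection criterion is not given here; the sketch below is a minimal illustration that assumes a hypothetical L1-norm rule (keep the rows with the largest absolute-value sums), with the function name `krp_prune_rows` invented for this example.

```python
import numpy as np

def krp_prune_rows(kernel, keep_rows=1):
    """Row-scale pruning sketch for one k x k conv kernel.

    Hypothetical criterion (not from the paper): rank rows by L1 norm
    and zero out all but the top `keep_rows` rows. Returns the pruned
    kernel and the boolean row mask, which a hardware implementation
    could store compactly per kernel.
    """
    row_norms = np.abs(kernel).sum(axis=1)          # L1 norm of each row
    keep = np.argsort(row_norms)[-keep_rows:]        # indices of strongest rows
    mask = np.zeros(kernel.shape[0], dtype=bool)
    mask[keep] = True
    pruned = kernel * mask[:, None]                  # zero the pruned rows
    return pruned, mask

# Toy 3x3 kernel: the middle row clearly dominates in L1 norm.
kernel = np.array([[0.10, -0.20,  0.05],
                   [0.90,  0.80, -0.70],
                   [0.02,  0.10, -0.03]])
pruned, mask = krp_prune_rows(kernel, keep_rows=1)
# mask → [False, True, False]; only the middle row survives.
```

Because every kernel retains the same number of full rows, the pruned model keeps a regular memory layout, which is consistent with the abstract's claims about reduced storage and improved parallelism on FPGAs.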
Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Neural Networks, Computer / Data Compression Language: En Journal: Sensors (Basel) Year: 2023 Type: Article Affiliation country: China