Results 1 - 3 of 3
1.
Sensors (Basel) ; 22(22)2022 Nov 11.
Article in English | MEDLINE | ID: mdl-36433316

ABSTRACT

The die-stacking structure of 3D networks-on-chip (3D NoCs) leads to high power density and unequal thermal conductance between layers, which lowers reliability and degrades performance. Congestion-aware adaptive routing, which balances the network's traffic load, can alleviate both congestion and thermal problems and thereby improve network performance. In this study, we propose a traffic- and thermal-aware Q-routing algorithm (TTQR) based on Q-learning, a reinforcement learning method. The algorithm stores local traffic status and global temperature information in a Q1-table and a Q2-table, respectively. Both tables are updated via the packet header and kept small, reducing hardware overhead. The packet's output port is selected according to the ratio of the Q1-value to the Q2-value for each direction, so packets follow paths that alleviate thermal problems and achieve more balanced inter-layer traffic. Using the Access Noxim simulation platform, we compare the proposed algorithm with the TAAR routing algorithm. In experiments with synthetic traffic patterns, our method outperforms TAAR by an average of 63.6% in average latency and 41.4% in throughput.
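To make the dual-table idea concrete, the following minimal Python sketch mimics how a TTQR-style router might keep a traffic Q1-table and a thermal Q2-table per output port and pick the port with the best Q1/Q2 ratio. The learning rate, table layout, and feedback fields are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical sketch of TTQR-style port selection: Q1 tracks traffic
# cost (lower is better), Q2 tracks thermal headroom (higher is better);
# the admissible output port minimizing Q1/Q2 wins. Constants assumed.
ALPHA = 0.5  # learning rate (assumption)
PORTS = ["north", "south", "east", "west", "up", "down"]  # 3D NoC ports

class Router:
    def __init__(self):
        self.q1 = {p: 1.0 for p in PORTS}  # estimated traffic cost
        self.q2 = {p: 1.0 for p in PORTS}  # estimated thermal headroom

    def update(self, port, traffic_feedback, thermal_feedback):
        """Q-learning-style update driven by values carried in packet headers."""
        self.q1[port] += ALPHA * (traffic_feedback - self.q1[port])
        self.q2[port] += ALPHA * (thermal_feedback - self.q2[port])

    def select_port(self, candidates):
        """Among the admissible output ports, pick the best traffic/thermal ratio."""
        return min(candidates, key=lambda p: self.q1[p] / self.q2[p])
```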

2.
Sensors (Basel) ; 22(11)2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35684919

ABSTRACT

Convolutional Neural Networks (CNNs) are popular models that are widely used in image classification, target recognition, and other fields. Model compression is a common step when deploying neural networks on embedded devices, and it is usually paired with retraining; retraining the weights to compensate for the lost precision, however, is time-consuming. Unlike prior designs, we propose a novel model compression approach based on Simonk-means that is specifically designed to support a hardware acceleration scheme. First, we propose an extension algorithm named Simonk-means, based on simple k-means, and use it to cluster the trained weights of the convolutional and fully connected layers. Second, we reduce the hardware resources consumed by data movement and storage through a data storage and indexing approach. Finally, we provide a hardware implementation of the compressed CNN accelerator. Our evaluations on several classification tasks show that our design achieves 5.27× compression and reduces the multiply-accumulate (MAC) operations by 74.3% for AlexNet on the FASHION-MNIST dataset.
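The weight-sharing step can be illustrated with plain k-means: the sketch below clusters a weight tensor into a small codebook plus per-weight indices, which is the storage-and-indexing scheme this kind of compression relies on. It uses simple k-means rather than the paper's Simonk-means extension, and the cluster count is an assumption.

```python
# Illustrative k-means weight sharing (not the paper's Simonk-means):
# replace full-precision weights with a k-entry codebook + indices.
import numpy as np

def compress_weights(weights, k=16, iters=20):
    """Cluster a weight tensor into k centroids; return (codebook, indices)."""
    flat = weights.ravel()
    codebook = np.linspace(flat.min(), flat.max(), k)  # linear init
    for _ in range(iters):
        # assign each weight to its nearest centroid
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # move each centroid to the mean of its assigned weights
        for c in range(k):
            if np.any(idx == c):
                codebook[c] = flat[idx == c].mean()
    return codebook, idx.reshape(weights.shape)

def decompress_weights(codebook, indices):
    """Rebuild an approximate weight tensor from codebook + indices."""
    return codebook[indices]
```

With k = 16, each weight is stored as a 4-bit index plus a shared 16-entry codebook, which is where this family of methods obtains its compression ratio.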


Subject(s)
Data Compression , Neural Networks, Computer , Acceleration , Algorithms , Physical Phenomena
3.
Sensors (Basel) ; 20(19)2020 Sep 28.
Article in English | MEDLINE | ID: mdl-32998366

ABSTRACT

Because of the high throughput and high computing capability that convolutional neural networks (CNNs) demand, researchers are paying increasing attention to the design of CNN hardware accelerator architectures. Accordingly, in this paper, we propose a block parallel computing algorithm based on the matrix transformation computing algorithm (MTCA) that realizes the convolution expansion and resolves the blocking problem of the intermediate matrix, enabling highly parallel hardware implementations. We also provide a specific method for computing the optimal partition of the matrix multiplication to optimize performance. In our evaluation, the proposed method saves more than 60% of hardware storage space compared with the im2col (image-to-column) approach; for large-scale convolutions, it saves nearly 82%. Under the accelerator architecture framework designed in this paper, we achieve 26.7-33.4 GFLOPS (depending on convolution type) on an FPGA (Field-Programmable Gate Array) by reducing bandwidth requirements and improving data reusability. The design is 1.2×-4.0× faster than memory-efficient convolution (MEC) and im2col, and represents an effective solution for a large-scale convolution accelerator.
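For context, the im2col baseline that the MTCA is compared against lowers convolution to a single matrix multiply by unfolding every input patch into a row of a large intermediate matrix; the MTCA's contribution is to block that matrix so it never has to be stored whole. The sketch below shows only the standard im2col lowering, not the paper's blocking scheme.

```python
# Standard im2col lowering of a 2D convolution to one matrix multiply.
# This is the memory-hungry baseline; MTCA blocks the `cols` matrix.
import numpy as np

def im2col_conv2d(x, w):
    """x: (H, W) input, w: (kh, kw) kernel; valid convolution via matmul."""
    H, W = x.shape
    kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    # unfold each kh x kw patch into one row of the intermediate matrix
    cols = np.empty((oh * ow, kh * kw))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    # a single matrix-vector product yields all output pixels at once
    return (cols @ w.ravel()).reshape(oh, ow)
```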
