Results 1 - 4 of 4
1.
Article in English | MEDLINE | ID: mdl-38019632

ABSTRACT

Analog resistive random access memory (RRAM) devices enable parallelized, nonvolatile in-memory vector-matrix multiplications for neural networks, eliminating the bottleneck posed by the von Neumann architecture. While RRAMs improve accelerator performance and enable deployment at the edge, the long tuning time needed to update RRAM conductance states adds significant burden and latency to real-time system training. In this article, we develop an in-memory discrete Fourier transform (DFT)-based convolution methodology that reduces system latency and input regeneration. By storing the static DFT/inverse DFT (IDFT) coefficients within the analog arrays, we keep computations in digital circuits to a minimum. By performing the convolution in reciprocal (Fourier) space, our approach minimizes connection weight updates, which significantly accelerates both neural network training and inference. Moreover, by minimizing the RRAM conductance update frequency, we mitigate the endurance limitations of resistive nonvolatile memories. We show that by leveraging the symmetry and linearity of the DFT/IDFT, we reduce convolution power by 1.57× over conventional execution. The designed hardware-aware deep neural network (DNN) inference accelerator enhances peak power efficiency by 28.02× and area efficiency by 8.7× over state-of-the-art accelerators. This article paves the way for ultrafast, low-power, compact hardware accelerators.
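The key property exploited above is the convolution theorem: a convolution in the spatial domain becomes an elementwise multiplication in the Fourier domain, so fixed DFT/IDFT coefficient matrices can stay resident in the analog arrays while only cheap elementwise products change. A minimal NumPy sketch of that equivalence (illustrative only; plain FFTs stand in for the RRAM-stored transforms, and this is not the authors' code):

```python
import numpy as np

# Circular convolution of an input x with a kernel h computed two ways:
# directly, and via the DFT (convolution theorem: DFT(x*h) = DFT(x)·DFT(h)).
rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # input vector
h = rng.standard_normal(8)   # convolution kernel (fixed weights)

# Direct circular convolution: (x * h)[n] = sum_k x[k] h[(n-k) mod N]
direct = np.array([sum(x[k] * h[(n - k) % 8] for k in range(8))
                   for n in range(8)])

# Fourier-space route: one elementwise multiply between two transforms
via_dft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

assert np.allclose(direct, via_dft)
```

Because the DFT and IDFT matrices never change, mapping them to RRAM conductances avoids the repeated conductance updates that a directly stored, trainable convolution kernel would require.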

2.
IEEE Trans Neural Netw Learn Syst; 34(8): 4416-4427, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34669580

ABSTRACT

Enhancing ubiquitous sensors and connected devices with the computational abilities needed to realize the vision of the Internet of Things (IoT) requires robust, compact, low-power deep neural network accelerators. Analog in-memory matrix-matrix multiplications enabled by emerging memories can significantly reduce the accelerator energy budget while yielding compact designs. In this article, we design a hardware-aware deep neural network (DNN) accelerator that combines a planar-staircase resistive random access memory (RRAM) array with a variation-tolerant in-memory compute methodology to enhance peak power efficiency by 5.64× and area efficiency by 4.7× over state-of-the-art DNN accelerators. Pulse application at the bottom electrodes of the staircase array generates a concurrent input shift, which eliminates the input unfolding and regeneration required for convolution execution within typical crossbar arrays. Our in-memory compute method operates in the charge domain and facilitates high-accuracy floating-point computations with low RRAM-state and device requirements. This work provides a path toward fast hardware accelerators that use low power and low area.
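The analog in-memory multiplication underlying such accelerators follows from Ohm's and Kirchhoff's laws: weights are stored as conductances, input voltages drive the rows, and the column currents sum to a vector-matrix product. A common differential encoding uses a conductance pair per weight so negative values are representable. An idealized sketch (device nonidealities such as wire resistance, variation, and ADC quantization are omitted; the names are illustrative):

```python
import numpy as np

def crossbar_vmm(v, g_pos, g_neg):
    """Column currents from an ideal differential crossbar: I = V @ (Gp - Gn)."""
    return v @ (g_pos - g_neg)

w = np.array([[0.5, -1.0],
              [2.0,  0.25]])          # target weight matrix
g_pos = np.clip(w, 0, None)          # positive-path conductances
g_neg = np.clip(-w, 0, None)         # negative-path conductances
v = np.array([1.0, 2.0])             # input voltage vector

# The differential column currents reproduce the digital product v @ w
assert np.allclose(crossbar_vmm(v, g_pos, g_neg), v @ w)
```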

3.
Nat Commun; 13(1): 3037, 2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35650181

ABSTRACT

Realizing high-density, reliable resistive random access memories based on two-dimensional semiconductors is crucial to their development for next-generation information storage and neuromorphic computing. Here, wafer-scale integration of solution-processed two-dimensional MoS2 memristor arrays is reported. The MoS2 memristors achieve excellent endurance, long memory retention, low device variation, and a high analog on/off ratio with linear conductance-update characteristics. The two-dimensional nanosheets appear to enable a unique way to modulate the switching characteristics through inter-flake sulfur vacancy diffusion, which can be controlled via the flake size distribution. Furthermore, MNIST handwritten-digit recognition shows that the MoS2 memristors can operate with a high accuracy of >98.02%, demonstrating their feasibility for future analog memory applications. Finally, a monolithic three-dimensional memory cube is demonstrated by stacking the two-dimensional MoS2 layers, paving the way for integrating two-dimensional memristors into high-density neuromorphic computing systems.
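Linear conductance updates matter because they give roughly equally spaced programmable states, so trained weights can be mapped onto the array by uniform quantization with little accuracy loss. A sketch of that mapping step (a hypothetical illustration of uniform level quantization, not the paper's procedure; `n_levels` is an assumed parameter):

```python
import numpy as np

def quantize_to_levels(w, n_levels=64):
    """Snap each weight to the nearest of n_levels evenly spaced conductance states."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (n_levels - 1)
    return lo + np.round((w - lo) / step) * step

w = np.linspace(-1.0, 1.0, 10)           # toy trained-weight vector
wq = quantize_to_levels(w, n_levels=5)   # 5 states over [-1, 1]

# With 5 levels over [-1, 1], the available states are {-1, -0.5, 0, 0.5, 1}
assert np.allclose(np.unique(wq), [-1.0, -0.5, 0.0, 0.5, 1.0])
```

With nonlinear devices the levels bunch toward one end of the conductance range, which is why linearity correlates with high network accuracy on tasks like MNIST.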

4.
Nat Commun; 10(1): 5201, 2019 Nov 15.
Article in English | MEDLINE | ID: mdl-31729375

ABSTRACT

3D monolithic integration of logic and memory has been the most sought-after solution for surpassing the von Neumann bottleneck, for which a low-temperature-processed material system is inevitable. Two-dimensional materials, with their excellent electrical properties and low thermal budget, are potential candidates. Here, we demonstrate low-temperature hybrid co-integration of a one-transistor-one-resistor memory cell, comprising a surface-functionalized 2D WSe2 p-FET and a solution-processed WSe2 resistive random access memory. The employed plasma-oxidation technique yields a low Schottky barrier height of 25 meV and a mobility of 230 cm2 V-1 s-1, leading to a 100× performance-enhanced WSe2 p-FET, while the defective WSe2 resistive random access memory exhibits a switching energy of 2.6 pJ per bit. Furthermore, guided by our device-circuit modelling, we propose vertically stacked channel FETs for high-density sub-0.01 µm2 memory cells, offering a new beyond-Si solution to enable 3D embedded memories for future computing systems.
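To put the quoted 2.6 pJ-per-bit switching energy in perspective, a back-of-the-envelope calculation scales it to a full write of one mebibyte (simple arithmetic on the abstract's figure; the block size is an arbitrary example):

```python
# Energy to switch every bit in one MiB at the quoted per-bit energy.
E_BIT = 2.6e-12           # joules per bit, from the abstract
bits = 8 * 1024 * 1024    # one mebibyte in bits

energy_j = E_BIT * bits   # total write energy in joules
# about 2.18e-5 J, i.e. roughly 22 microjoules per MiB written
assert abs(energy_j - 2.181e-5) < 1e-8
```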
