Search | BVS Regional Portal

1.

Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators.

Rasch, Malte J; Mackin, Charles; Le Gallo, Manuel; Chen, An; Fasoli, Andrea; Odermatt, Frédéric; Li, Ning; Nandakumar, S R; Narayanan, Pritish; Tsai, Hsinyu; Burr, Geoffrey W; Sebastian, Abu; Narayanan, Vijay.

Nat Commun ; 14(1): 5282, 2023 Aug 30.

Article in English | MEDLINE | ID: mdl-37648721

ABSTRACT

Analog in-memory computing, a promising approach for energy-efficient acceleration of deep learning workloads, computes matrix-vector multiplications, but only approximately, due to nonidealities that are often non-deterministic or nonlinear. This can adversely impact the achievable inference accuracy. Here, we develop a hardware-aware retraining approach to systematically examine the accuracy of analog in-memory computing across multiple network topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a realistic crossbar model, we improve significantly on earlier retraining approaches. We show that many larger-scale deep neural networks (including convnets, recurrent networks, and transformers) can in fact be successfully retrained to show iso-accuracy with the floating point implementation. Our results further suggest that nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on accuracy, and that recurrent networks are particularly robust to all nonidealities.

2.

Optimised weight programming for analogue memory-based deep neural networks.

Mackin, Charles; Rasch, Malte J; Chen, An; Timcheck, Jonathan; Bruce, Robert L; Li, Ning; Narayanan, Pritish; Ambrogio, Stefano; Le Gallo, Manuel; Nandakumar, S R; Fasoli, Andrea; Luquin, Jose; Friz, Alexander; Sebastian, Abu; Tsai, Hsinyu; Burr, Geoffrey W.

Nat Commun ; 13(1): 3765, 2022 06 30.

Article in English | MEDLINE | ID: mdl-35773285

ABSTRACT

Analogue memory-based deep neural networks provide energy-efficiency and per-area throughput gains relative to state-of-the-art digital counterparts such as graphics processing units. Recent advances focus largely on hardware-aware algorithmic training and improvements to circuits, architectures, and memory devices. Optimal translation of software-trained weights into analogue hardware weights, given the plethora of complex memory non-idealities, represents an equally important task. We report a generalised computational framework that automates the crafting of complex weight programming strategies to minimise accuracy degradations during inference, particularly over time. The framework is agnostic to network structure and generalises well across recurrent, convolutional, and transformer neural networks. As a highly flexible numerical heuristic, the approach accommodates arbitrary device-level complexity, making it potentially relevant for a variety of analogue memories. By quantifying the limit of achievable inference accuracy, it also enables analogue memory-based deep neural network accelerators to reach their full inference potential.

Subjects

Neural Networks, Computer; Software; Computers

3.

Toward Software-Equivalent Accuracy on Transformer-Based Deep Neural Networks With Analog Memory Devices.

Spoon, Katie; Tsai, Hsinyu; Chen, An; Rasch, Malte J; Ambrogio, Stefano; Mackin, Charles; Fasoli, Andrea; Friz, Alexander M; Narayanan, Pritish; Stanisavljevic, Milos; Burr, Geoffrey W.

Front Comput Neurosci ; 15: 675741, 2021.

Article in English | MEDLINE | ID: mdl-34290595

ABSTRACT

Recent advances in deep learning have been driven by ever-increasing model sizes, with networks growing to millions or even billions of parameters. Such enormous models call for fast and energy-efficient hardware accelerators. We study the potential of Analog AI accelerators based on Non-Volatile Memory, in particular Phase Change Memory (PCM), for software-equivalent accurate inference of natural language processing applications. We demonstrate a path to software-equivalent accuracy for the GLUE benchmark on BERT (Bidirectional Encoder Representations from Transformers), by combining noise-aware training to combat inherent PCM drift and noise sources, together with reduced-precision digital attention-block computation down to INT6.

4.

A role for optics in AI hardware.

Burr, Geoffrey W.

Nature ; 569(7755): 199-200, 2019 05.

Article in English | MEDLINE | ID: mdl-31068716

Subjects

Artificial Intelligence; Optics and Photonics; Synapses

5.

Training fully connected networks with resistive memories: impact of device failures.

Romero, Louis P; Ambrogio, Stefano; Giordano, Massimo; Cristiano, Giorgio; Bodini, Martina; Narayanan, Pritish; Tsai, Hsinyu; Shelby, Robert M; Burr, Geoffrey W.

Faraday Discuss ; 213(0): 371-391, 2019 02 18.

Article in English | MEDLINE | ID: mdl-30357183

ABSTRACT

Hardware accelerators based on two-terminal non-volatile memories (NVMs) can potentially provide competitive speed and accuracy for the training of fully connected deep neural networks (FC-DNNs), with respect to GPUs and other digital accelerators. We recently proposed [S. Ambrogio et al., Nature, 2018] novel neuromorphic crossbar arrays, consisting of a pair of phase-change memory (PCM) devices combined with a pair of 3-Transistor 1-Capacitor (3T1C) circuit elements, so that each weight was implemented using multiple conductances of varying significance, and then showed that this weight element can train FC-DNNs to software-equivalent accuracies. Unfortunately, however, real arrays of emerging NVMs such as PCM typically include some failed devices (e.g., <100% yield), either due to fabrication issues or early endurance failures, which can degrade DNN training accuracy. This paper explores the impact of device failures, i.e., NVM conductances that may contribute read current but which cannot be programmed, on DNN training and test accuracy. Results show that "stuck-on" and "dead" devices, exhibiting high and low read conductances, respectively, do in fact degrade accuracy performance to some degree. We find that the presence of the CMOS-based and thus highly-reliable 3T1C devices greatly increases system robustness. After studying the inherent mechanisms, we study the dependence of DNN accuracy on the number of functional weights, the number of neurons in the hidden layer, and the number and type of damaged devices. Finally, we describe conditions under which making the network larger or adjusting the network hyperparameters can still improve the network accuracy, even in the presence of failed devices.

6.

Equivalent-accuracy accelerated neural-network training using analogue memory.

Ambrogio, Stefano; Narayanan, Pritish; Tsai, Hsinyu; Shelby, Robert M; Boybat, Irem; di Nolfo, Carmelo; Sidler, Severin; Giordano, Massimo; Bodini, Martina; Farinha, Nathan C P; Killeen, Benjamin; Cheng, Christina; Jaoudi, Yassine; Burr, Geoffrey W.

Nature ; 558(7708): 60-67, 2018 06.

Article in English | MEDLINE | ID: mdl-29875487

ABSTRACT

Neural-network training can be slow and energy intensive, owing to the need to transfer the weight data for the network between conventional digital memory chips and processor chips. Analogue non-volatile memory can accelerate the neural-network training algorithm known as backpropagation by performing parallelized multiply-accumulate operations in the analogue domain at the location of the weight data. However, the classification accuracies of such in situ training using non-volatile-memory hardware have generally been less than those of software-based training, owing to insufficient dynamic range and excessive weight-update asymmetry. Here we demonstrate mixed hardware-software neural-network implementations that involve up to 204,900 synapses and that combine long-term storage in phase-change memory, near-linear updates of volatile capacitors and weight-data transfer with 'polarity inversion' to cancel out inherent device-to-device variations. We achieve generalization accuracies (on previously unseen data) equivalent to those of software-based training on various commonly used machine-learning test datasets (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100). The computational energy efficiency of 28,065 billion operations per second per watt and throughput per area of 3.6 trillion operations per second per square millimetre that we calculate for our implementation exceed those of today's graphical processing units by two orders of magnitude. This work provides a path towards hardware accelerators that are both fast and energy efficient, particularly on fully connected neural-network layers.

7.

Enhancing Magnetic Light Emission with All-Dielectric Optical Nanoantennas.

Sanz-Paz, Maria; Ernandes, Cyrine; Esparza, Juan Uriel; Burr, Geoffrey W; van Hulst, Niek F; Maitre, Agnès; Aigouy, Lionel; Gacoin, Thierry; Bonod, Nicolas; Garcia-Parajo, Maria F; Bidault, Sébastien; Mivelle, Mathieu.

Nano Lett ; 18(6): 3481-3487, 2018 06 13.

Article in English | MEDLINE | ID: mdl-29701991

ABSTRACT

Electric and magnetic optical fields carry the same amount of energy. Nevertheless, the efficiency with which matter interacts with electric optical fields is commonly accepted to be at least 4 orders of magnitude higher than with magnetic optical fields. Here, we experimentally demonstrate that properly designed photonic nanoantennas can selectively manipulate the magnetic versus electric emission of luminescent nanocrystals. In particular, we show selective enhancement of magnetic emission from trivalent europium-doped nanoparticles in the vicinity of a nanoantenna tailored to exhibit a magnetic resonance. Specifically, by controlling the spatial coupling between emitters and an individual nanoresonator located at the edge of a near-field optical scanning tip, we record with nanoscale precision local distributions of both magnetic and electric radiative local densities of states (LDOS). The map of the radiative LDOS reveals the modification of both the magnetic and electric quantum environments induced by the presence of the nanoantenna. This manipulation and enhancement of magnetic light-matter interaction by means of nanoantennas opens up new possibilities for the research fields of optoelectronics, chiral optics, nonlinear and nano-optics, spintronics, and metamaterials, among others.

8.

Light funneling from a photonic crystal laser cavity to a nano-antenna: overcoming the diffraction limit in optical energy transfer down to the nanoscale.

Mivelle, Mathieu; Viktorovitch, Pierre; Baida, Fadi I; El Eter, Ali; Xie, Zhihua; Vo, Than-Phong; Atie, Elie; Burr, Geoffrey W; Nedeljkovic, Dusan; Rauch, Jean-Yves; Callard, Ségolène; Grosjean, Thierry.

Opt Express ; 22(12): 15075-87, 2014 Jun 16.

Article in English | MEDLINE | ID: mdl-24977600

ABSTRACT

We show that the near-field coupling between a photonic crystal microlaser and a nano-antenna can enable hybrid photonic systems that are both physically compact (free from bulky optics) and efficient at transferring optical energy into the nano-antenna. Up to 19% of the laser power from a micron-scale photonic crystal laser cavity is experimentally transferred to a bowtie aperture nano-antenna (BNA) whose area is 400-fold smaller than the overall emission area of the microlaser. Instead of a direct deposition of the nano-antenna onto the photonic crystal, it is fabricated at the apex of a fiber tip to be accurately placed in the microlaser near-field. Such light funneling within a hybrid structure provides a path for overcoming the diffraction limit in optical energy transfer to the nanoscale and should thus open promising avenues in the nanoscale enhancement and confinement of light in compact architectures, impacting applications such as biosensing, optical trapping, local heating, spectroscopy, and nanoimaging.

9.

Observation of the role of subcritical nuclei in crystallization of a glassy solid.

Lee, Bong-Sub; Burr, Geoffrey W; Shelby, Robert M; Raoux, Simone; Rettner, Charles T; Bogle, Stephanie N; Darmawikarta, Kristof; Bishop, Stephen G; Abelson, John R.

Science ; 326(5955): 980-4, 2009 Nov 13.

Article in English | MEDLINE | ID: mdl-19965508

ABSTRACT

Phase transformation generally begins with nucleation, in which a small aggregate of atoms organizes into a different structural symmetry. The thermodynamic driving forces and kinetic rates have been predicted by classical nucleation theory, but observation of nanometer-scale nuclei has not been possible, except on exposed surfaces. We used a statistical technique called fluctuation transmission electron microscopy to detect nuclei embedded in a glassy solid, and we used a laser pump-probe technique to determine the role of these nuclei in crystallization. This study provides a convincing proof of the time- and temperature-dependent development of nuclei, information that will play a critical role in the development of advanced materials for phase-change memories.

10.

Experimental demonstration of gray-scale sparse modulation codes in volume holographic storage.

King, Brian M; Burr, Geoffrey W; Neifeld, Mark A.

Appl Opt ; 42(14): 2546-59, 2003 May 10.

Article in English | MEDLINE | ID: mdl-12749567

ABSTRACT

We discuss experimental results of a versatile nonbinary modulation and channel code appropriate for two-dimensional page-oriented holographic memories. An enumerative permutation code is used to provide a modulation code that permits a simple maximum-likelihood detection scheme. Experimental results from the IBM Demon testbed are used to characterize the performance and feasibility of the proposed modulation and channel codes. A reverse coding technique is introduced to combat the effects of error propagation on the modulation-code performance. We find experimentally that level-3 pixels achieve the best practical result, offering an 11-35% improvement in capacity and a 12% increase in readout rate as compared with local binary thresholding techniques.

11.

Density implications of shift compensation postprocessing in holographic storage systems.

Menetrier, Laure; Burr, Geoffrey W.

Appl Opt ; 42(5): 845-60, 2003 Feb 10.

Article in English | MEDLINE | ID: mdl-12593488

ABSTRACT

We investigate the effect of data page misregistration, and its subsequent correction in postprocessing, on the storage density of holographic data storage systems. A numerical simulation is used to obtain the bit-error rate as a function of hologram aperture, page misregistration, pixel fill factors, and Gaussian additive intensity noise. Postprocessing of simulated data pages is performed by a nonlinear pixel shift compensation algorithm [Opt. Lett. 26, 542 (2001)]. The performance of this algorithm is analyzed in the presence of noise by determining the achievable areal density. The impact of inaccurate measurements of page misregistration is also investigated. Results show that the shift-compensation algorithm can provide almost complete immunity to page misregistration, although at some penalty to the baseline areal density offered by a system with zero tolerance to misalignment.

12.

Holographic data storage with arbitrarily misaligned data pages.

Burr, Geoffrey W.

Opt Lett ; 27(7): 542-4, 2002 Apr 01.

Article in English | MEDLINE | ID: mdl-18007859

ABSTRACT

An improved postprocessing algorithm that can compensate for arbitrary misregistrations between a detector array and the coherent image of a pixelated two-dimensional data page is described. Previously [Opt. Lett. 26, 542 (2001)], an algorithm was reported in which both linear and quadratic interpixel cross-talk contributions are reallocated to the appropriate neighboring pixels. However, page misalignments close to +/-0.5 pixels could not be corrected to an acceptable bit-error rate because of propagation in the iterative procedure. An improved algorithm is reported in which an intentional magnification error is introduced optically and then corrected during postprocessing. Experimental results from a pixel-matched megapel volume holographic system are presented, showing that the dependence of bit-error rate on transverse detector alignment is entirely removed. This improved procedure can completely bypass constraints on page registration, optical distortion, and material shrinkage that currently hamper page-oriented holographic storage systems.
