Results 1 - 2 of 2
1.
Neural Netw; 165: 1050-1057, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37478527

ABSTRACT

In-memory computing techniques are used to accelerate artificial neural network (ANN) training and inference tasks. Memory technology and architectural innovations allow efficient matrix-vector multiplications, gradient calculations, and updates to network weights. However, on-chip learning for edge devices is quite challenging due to the frequent updates. Here, we propose using an analog and temporary on-chip memory (ATOM) cell with controllable retention timescales for implementing the weights of an on-chip training task. Measurement results for Read-Write timescales are presented for an ATOM cell fabricated in GlobalFoundries' 45 nm RFSOI technology. The effect of limited retention and its variability is evaluated for training a fully connected neural network with a variable number of layers for the MNIST hand-written digit recognition task. Our studies show that weight decay due to temporary memory can have benefits equivalent to regularization, achieving a ∼33% reduction in the validation error (from 3.6% to 2.4%). We also show that the controllability of the decay timescale can be advantageous in achieving a further ∼26% reduction in the validation error. This strongly suggests the utility of temporary memory during learning before on-chip non-volatile memories can take over for the storage and inference tasks using the neural network weights. We thus propose an algorithm-circuit codesign in the form of temporary analog memory for high-performing on-chip learning of ANNs.
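The following is a minimal sketch (not the authors' code) of the central idea in this abstract: weights stored in a temporary analog memory leak toward zero between updates, and that leakage behaves like weight-decay regularization. The network size, data, learning rate, and retention timescale below are illustrative assumptions, and a synthetic dataset stands in for MNIST.

```python
# Sketch: training a small fully connected network while the stored weights
# decay with a controllable retention timescale, mimicking an ATOM-like cell.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for MNIST: 1000 samples of 784-dim inputs, 10 classes.
X = rng.normal(size=(1000, 784)).astype(np.float32)
y = rng.integers(0, 10, size=1000)

# One hidden layer, standing in for the paper's fully connected networks.
W1 = rng.normal(0, 0.05, size=(784, 128)).astype(np.float32)
W2 = rng.normal(0, 0.05, size=(128, 10)).astype(np.float32)

lr, tau, dt = 0.1, 50.0, 1.0      # learning rate, retention timescale, time step (assumed values)
decay = np.exp(-dt / tau)         # per-step retention factor of the temporary memory

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for step in range(200):
    idx = rng.integers(0, len(X), size=64)      # mini-batch
    h = np.maximum(X[idx] @ W1, 0.0)            # ReLU hidden layer
    p = softmax(h @ W2)

    # Cross-entropy gradient: p - one_hot(y)
    grad_out = p
    grad_out[np.arange(64), y[idx]] -= 1.0
    grad_out /= 64
    gW2 = h.T @ grad_out
    gh = (grad_out @ W2.T) * (h > 0)
    gW1 = X[idx].T @ gh

    # Gradient update followed by retention decay: the stored weights leak
    # toward zero, acting like an implicit weight-decay regularizer whose
    # strength is set by the controllable timescale tau.
    W1 = (W1 - lr * gW1) * decay
    W2 = (W2 - lr * gW2) * decay
```

Sweeping `tau` in such a simulation is one way to explore the paper's claim that the decay timescale itself is a useful knob for reducing validation error.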


Subject(s)
Algorithms; Neural Networks, Computer; Learning; Recognition, Psychology; Cognition
2.
IEEE Trans Neural Netw Learn Syst; 33(10): 5095-5115, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33882004

ABSTRACT

CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL) workloads in systems ranging from mobile devices to extreme-end servers. In this article, we present a survey of techniques for optimizing DL applications on CPUs. We include the methods proposed for both inference and training and those offered in the context of mobile, desktop/server, and distributed systems. We identify the areas of strength and weakness of CPUs in the field of DL. This article will interest practitioners and researchers in the areas of artificial intelligence, computer architecture, mobile systems, and parallel computing.
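One recurring theme in CPU-side DL optimization is replacing naive loops with vectorized kernels that map onto SIMD units and optimized BLAS libraries. The sketch below (not from the survey; sizes and timing method are illustrative assumptions) contrasts a pure-Python matrix multiply with the BLAS-backed one that `numpy` dispatches to.

```python
# Sketch: naive triple-loop matmul vs. a BLAS-backed call on the CPU.
import time
import numpy as np

A = np.random.rand(128, 128).astype(np.float32)
B = np.random.rand(128, 128).astype(np.float32)

def naive_matmul(A, B):
    # Triple loop: no SIMD, no cache blocking, no multithreading.
    n, k, m = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((n, m), dtype=np.float32)
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

t0 = time.perf_counter(); C1 = naive_matmul(A, B); t1 = time.perf_counter()
C2 = A @ B; t2 = time.perf_counter()   # dispatches to an optimized BLAS kernel
print(f"naive: {t1 - t0:.3f}s  BLAS: {t2 - t1:.5f}s  max diff: {np.abs(C1 - C2).max():.2e}")
```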
