Improved deep neural network hardware-accelerators based on non-volatile-memory: The local gains technique

Irem Boybat; Carmelo Di Nolfo; Stefano Ambrogio; Martina Bodini; Nathan C. P. Farinha; Robert M. Shelby; Pritish Narayanan; S. Sidler; Hsinyu Tsai; Yusuf Leblebici; Geoffrey W. Burr

doi:10.1109/ICRC.2017.8123642

ICRC 2017

Conference paper

28 Nov 2017

Abstract

Cognitive computing, which learns to do useful computational tasks from data rather than by being explicitly programmed, represents a fundamentally new form of computing. However, training Deep Neural Networks (DNNs) calls for repeated exposure to huge datasets, requiring extensive computational resources (such as many GPUs) and days or weeks of time. One potential approach to accelerating this process is to use hardware accelerators for backpropagation training based on analog Non-Volatile Memory (NVM). This paper describes a novel Local Gains (LG) method that can increase network accuracy, extend the range of acceptable learning rates, and reduce overall weight-update activity (and thus the corresponding energy consumption). We first analyze the impact of different activation functions and the corresponding dynamic range of input and output neurons. We then show that using non-negative neuron activations offers advantages within a crossbar implementation (without degrading accuracy): because each weight update is proportional to the product of the input activation and the backpropagated error, non-negative activations make the sign of the weight update depend only on the sign of the backpropagated error. Next we introduce LG: a neuron-centric (not synapse-centric) modulation of the learning rate based on the sign of successive weight updates. We also introduce the concept of Safety Margin (SM), the margin by which the correct output neuron exceeded (or failed to exceed) the strongest incorrect neuron, which provides a novel way to gauge the robustness of DNN classification performance. We use device-aware DNN simulations to demonstrate higher accuracy, reduced sensitivity to network hyperparameters, and an overall improved training process, as well as lower network activity and reduced energy consumption.
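The abstract does not spell out the exact LG update rule or an SM formula, so the Python sketch below is only a minimal illustration under stated assumptions, not the authors' implementation. It assumes a plain outer-product update dw[i][j] = lr * gain[j] * x[i] * delta[j] with one hypothetical gain per post-synaptic neuron j. It also borrows a delta-bar-delta-style rule (additive increase when the sign of delta[j] repeats, multiplicative decrease when it flips). The constants GAIN_STEP, GAIN_DECAY, GAIN_MIN, and GAIN_MAX are invented for illustration. Because activations are assumed non-negative, tracking the sign of delta[j] is equivalent to tracking the sign of every nonzero update into neuron j. The safety margin follows the abstract's definition directly: the correct output minus the strongest incorrect output.

```python
# Minimal sketch, NOT the paper's exact algorithm.
# Assumptions (hypothetical, for illustration only):
#   - one local gain per post-synaptic neuron j
#   - dw[i][j] = lr * gains[j] * x[i] * delta[j]  (outer-product update)
#   - delta-bar-delta-style gain rule; all constants below are made up

GAIN_STEP = 0.05                # hypothetical additive increase
GAIN_DECAY = 0.95               # hypothetical multiplicative decrease
GAIN_MIN, GAIN_MAX = 0.1, 10.0  # hypothetical clipping range


def sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def update_local_gains(gains, prev_delta, delta):
    """Adapt each neuron's gain from the signs of successive errors.

    With non-negative activations, sign(delta[j]) is the sign of every
    nonzero weight update into neuron j, so this is neuron-centric.
    """
    new_gains = []
    for g, d_prev, d in zip(gains, prev_delta, delta):
        agreement = sign(d) * sign(d_prev)
        if agreement > 0:    # same sign twice: take larger steps
            g = g + GAIN_STEP
        elif agreement < 0:  # sign flipped: take smaller steps
            g = g * GAIN_DECAY
        # agreement == 0 (a zero error): leave the gain unchanged
        new_gains.append(min(max(g, GAIN_MIN), GAIN_MAX))
    return new_gains


def weight_updates(x, delta, gains, lr):
    """Return dw[i][j] = lr * gains[j] * x[i] * delta[j].

    Requires x[i] >= 0, so every nonzero dw[i][j] has the sign of delta[j].
    """
    assert all(xi >= 0 for xi in x), "sketch assumes non-negative activations"
    return [[lr * gains[j] * xi * delta[j] for j in range(len(delta))]
            for xi in x]


def safety_margin(outputs, correct):
    """Correct output minus the strongest incorrect output.

    Positive: classified correctly by that margin.
    Negative: misclassified by that margin.
    """
    strongest_wrong = max(v for k, v in enumerate(outputs) if k != correct)
    return outputs[correct] - strongest_wrong
```

For example, safety_margin([0.125, 0.75, 0.25], correct=1) returns 0.5 (a correct classification), while safety_margin([0.5, 0.25, 0.75], correct=0) returns -0.25 (a misclassification). Keeping the gain per neuron rather than per synapse means only one extra value per neuron has to be stored and updated, which fits the neuron-centric framing in the abstract.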