Perspective on training fully connected networks with resistive memories: Device requirements for multiple conductances of varying significance

Giorgio Cristiano; Massimo Giordano; Stefano Ambrogio; Louis P. Romero; C. W. Cheng; Pritish Narayanan; Hsinyu Tsai; Robert M. Shelby; Geoffrey W. Burr

doi:10.1063/1.5042462

Journal of Applied Physics

Paper

21 Oct 2018



Abstract

Novel Deep Neural Network (DNN) accelerators based on crossbar arrays of non-volatile memories (NVMs) - such as Phase-Change Memory or Resistive Memory - can implement multiply-accumulate operations in a highly parallelized fashion. In such systems, computation occurs in the analog domain at the location of weight data encoded into the conductances of the NVM devices. This allows DNN training of fully-connected layers to be performed faster and with less energy. Using a mixed-hardware-software experiment, we recently showed that by encoding each weight into four distinct physical devices - a "Most Significant Conductance" pair (MSP) and a "Least Significant Conductance" pair (LSP) - we can train DNNs to software-equivalent accuracy despite the imperfections of real analog memory devices. We surmised that, by dividing the task of updating and maintaining weight values between the two conductance pairs, this approach should significantly relax the otherwise quite stringent device requirements. In this paper, we quantify these relaxed requirements for analog memory devices exhibiting a saturating conductance response, assuming either an immediate or a delayed steep initial slope in conductance change. We discuss requirements on the LSP imposed by the "Open Loop Tuning" performed after each training example and on the MSP due to the "Closed Loop Tuning" performed periodically for weight transfer between the conductance pairs. Using simulations to evaluate the final generalization accuracy of a trained four-neuron-layer fully-connected network, we quantify the required dynamic range (as controlled by the size of the steep initial jump), the tolerable device-to-device variability in both maximum conductance and maximum conductance change, the tolerable pulse-to-pulse variability in conductance change, and the tolerable device yield, for both the LSP and MSP devices. We also investigate various Closed Loop Tuning strategies and describe the impact of the MSP/LSP approach on device endurance.
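
As a rough illustration of the weight encoding described in the abstract, the following Python sketch models one weight as an MSP plus an LSP, with blind (open-loop) pulses applied to the LSP and a periodic read-and-verify (closed-loop) transfer into the MSP. It is a simplified sketch and not the authors' simulator: the gain factor `GAIN`, the saturating step model in `pulse`, the constants `G_MAX` and `DG0`, the full LSP reset during transfer, and all class and method names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass

G_MAX = 1.0   # assumed maximum conductance (arbitrary units)
DG0 = 0.05    # assumed conductance step at zero conductance
GAIN = 3.0    # assumed significance factor applied to the MSP


def pulse(g: float) -> float:
    """Apply one potentiation pulse using an assumed saturating response:
    the step shrinks linearly as g approaches G_MAX."""
    return min(G_MAX, g + DG0 * (1.0 - g / G_MAX))


@dataclass
class Weight:
    """One weight stored in four conductances (hypothetical model)."""
    G_plus: float = 0.0   # MSP, positive device
    G_minus: float = 0.0  # MSP, negative device
    g_plus: float = 0.0   # LSP, positive device
    g_minus: float = 0.0  # LSP, negative device

    def value(self) -> float:
        # The MSP difference is scaled by GAIN; the LSP difference is not.
        return GAIN * (self.G_plus - self.G_minus) + (self.g_plus - self.g_minus)

    def open_loop_update(self, n_pulses: int) -> None:
        """Fire pulses at the LSP without reading back the result.
        Positive counts raise g_plus, negative counts raise g_minus."""
        for _ in range(abs(n_pulses)):
            if n_pulses > 0:
                self.g_plus = pulse(self.g_plus)
            else:
                self.g_minus = pulse(self.g_minus)

    def closed_loop_transfer(self, tol: float = GAIN * DG0 / 2,
                             max_iters: int = 200) -> None:
        """Read the total weight, clear the LSP, then pulse the MSP with
        read-verify until the weight is within tol of the value read."""
        target = self.value()
        self.g_plus = self.g_minus = 0.0  # assumed full LSP reset
        for _ in range(max_iters):
            err = target - self.value()
            if abs(err) <= tol:
                break
            if err > 0:
                self.G_plus = pulse(self.G_plus)
            else:
                self.G_minus = pulse(self.G_minus)


w = Weight()
w.open_loop_update(10)       # training updates accumulate in the LSP
before = w.value()
w.closed_loop_transfer()     # periodic transfer into the MSP
after = w.value()
```

The default tolerance is set to half of the largest possible MSP step (`GAIN * DG0 / 2`), so a single pulse cannot overshoot the target by more than the tolerance and the verify loop does not oscillate. In this toy model the residual error left after transfer is simply discarded; how a real scheme handles it is outside what the abstract specifies.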
