Impact of Phase-Change Memory Nonidealities on Analog In-Memory Computing Deep Learning Accuracy

Ning Li

MRS Fall Meeting 2023

Talk

26 Nov 2023


Abstract

Among the emerging approaches to deep neural network (DNN) acceleration, compute-in-memory (CIM) in crossbar arrays of non-volatile memories (NVMs) is very promising for achieving high execution speed and high energy efficiency. Phase-change memory (PCM) is one of the most promising candidates for analog CIM, particularly for inference using previously trained DNN weights. However, when NVMs are used in analog mode, the devices exhibit nonidealities relative to a perfect analog resistor, which limits computing accuracy. For PCM devices, these nonidealities mainly include resistance drift, read noise, limited memory window, and device failures caused by imperfect fabrication yield and limited device endurance and retention. In this presentation, I will discuss our study of the impact of these nonidealities on the deep learning accuracy of analog CIM. We systematically varied the PCM memory window, resistance drift, and read noise, and studied their impact on the accuracy of large neural networks of various types with tens of millions of weights [1]. We show that DNN accuracy can be improved by using PCM with reduced read noise, even if the memory window is also reduced. However, there is a lower limit on the memory window, below which the accuracy drops significantly. DNN inference accuracy decreases over time because of PCM resistance drift, but the long-term accuracy loss can be minimized by an optimized mapping of weights to unit cells with multiple PCMs representing weights of varying significance. The dependence of energy efficiency on resistance drift was also studied [2]. There are tradeoffs and correlations between PCM device characteristics, which we used to identify the device optimization space for better short-term and long-term accuracy. In addition, NVM CIM chips typically contain some failed devices of various types, due to imperfect fabrication yield and device failure over time.
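To make the drift, read-noise, and weight-mapping terms above concrete, the following is a minimal sketch and not the simulation tool used in the study. It assumes the commonly used empirical power-law model for PCM conductance drift, G(t) = G(t0) * (t / t0)^(-nu), additive Gaussian read noise, and a unit cell built from two differential conductance pairs. The function names, the default drift exponent, and the significance factor are all hypothetical and chosen only for illustration.

```python
import random


def drifted_conductance(g0, t, t0=1.0, nu=0.05):
    """Assumed power-law drift: G(t) = G(t0) * (t / t0) ** (-nu).

    g0 is the conductance measured at reference time t0; nu is the
    drift exponent (the default value here is illustrative only).
    """
    if t < t0:
        raise ValueError("t must be >= t0")
    return g0 * (t / t0) ** (-nu)


def noisy_read(g, noise_sigma, rng):
    """One read of a device: additive Gaussian noise, clipped at zero."""
    return max(0.0, g + rng.gauss(0.0, noise_sigma))


def unit_cell_weight(g_hi_plus, g_hi_minus, g_lo_plus, g_lo_minus, significance):
    """Weight encoded by two differential pairs of different significance.

    The higher-significance pair is scaled by `significance` (hypothetical
    parameter); the lower-significance pair contributes directly.
    """
    return significance * (g_hi_plus - g_hi_minus) + (g_lo_plus - g_lo_minus)


if __name__ == "__main__":
    rng = random.Random(0)
    g_programmed = 10.0
    for t in (1.0, 1e2, 1e4, 1e6):
        g_t = drifted_conductance(g_programmed, t)
        print(t, g_t, noisy_read(g_t, noise_sigma=0.1, rng=rng))
```

In this toy model, a smaller drift exponent or a smaller noise sigma both reduce the error in the read-back weight, which is the kind of device tradeoff the abstract refers to.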
We studied the impact of these different types of failed devices on the analog CIM accuracy for various DNNs [3]. We find that larger networks with fewer reused layers are more tolerant of failed devices, and that devices stuck at high resistance states are better tolerated than devices stuck at low resistance states. To improve the robustness of DNNs to defective devices, we developed training methods that add noise and corrupt devices in the weight matrices during network training, and we show that this can increase network accuracy in the presence of failed devices. Our approach is based on a single generic training run and a subsequent mapping of the weights to any chip with unknown failed device locations. In summary, we systematically studied the impact of PCM nonidealities on DNN accuracy, using PCM devices fabricated at IBM [1] and the hardware-aware simulation tool [4] developed at IBM Research. We proposed several methods to enhance DNN performance, including optimization of device specifications, optimized weight mapping, and specialized hardware-aware training.

[1] N. Li, C. Mackin, A. Chen, K. Brew, T. Philip, G. W. Burr, M. Rasch, A. Sebastian, V. Narayanan, N. Saulnier, Advanced Electronic Materials, 2201190, 2023

[2] M. M. Frank, N. Li, M. J. Rasch, S. Jain, C.-T. Chen, R. Muralidhar, J.-P. Han, V. Narayanan, T. M. Philip, K. Brew, A. Simon, I. Saraf, N. Saulnier, I. Boybat, S. Wozniak, A. Sebastian, P. Narayanan, C. Mackin, A. Chen, H. Tsai, G. W. Burr, IEEE International Reliability Physics Symposium, 2023

[3] N. Li, H. Tsai, V. Narayanan, M. Rasch, APL Machine Learning, 016104, 2023

[4] M. J. Rasch, C. Mackin, M. Le Gallo, A. Chen, A. Fasoli, F. Odermatt, N. Li, S. R. Nandakumar, P. Narayanan, H. Tsai, G. W. Burr, A. Sebastian, V. Narayanan, arXiv, 2302.08469, 2023
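The training approach described in the abstract, adding noise and corrupting devices in the weight matrices during training, can be illustrated with a small fault-injection helper. This is a sketch under simplifying assumptions and not the authors' training code: a device stuck at high resistance is modeled as a weight of zero, a device stuck at low resistance is modeled as a weight saturated at plus or minus `w_max`, and all other weights receive Gaussian noise. The function name and all parameters are hypothetical.

```python
import random


def corrupt_weights(weights, noise_sigma, p_stuck_high_r, p_stuck_low_r, w_max, rng):
    """Return a corrupted copy of a weight matrix given as a list of rows.

    Each weight is independently replaced by:
      * 0.0 with probability p_stuck_high_r (assumed model of a device
        stuck at high resistance, i.e. contributing almost no current);
      * +w_max or -w_max with probability p_stuck_low_r (assumed model of
        a device stuck at low resistance, i.e. a saturated weight);
      * the original weight plus Gaussian noise otherwise.
    """
    corrupted = []
    for row in weights:
        new_row = []
        for w in row:
            u = rng.random()
            if u < p_stuck_high_r:
                new_row.append(0.0)
            elif u < p_stuck_high_r + p_stuck_low_r:
                new_row.append(w_max if rng.random() < 0.5 else -w_max)
            else:
                new_row.append(w + rng.gauss(0.0, noise_sigma))
        corrupted.append(new_row)
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[0.2, -0.5, 0.1], [0.7, 0.0, -0.3]]
    # Draw a fresh corruption pattern for each training step so the network
    # does not adapt to one particular set of failed-device locations.
    for step in range(3):
        print(step, corrupt_weights(clean, 0.02, 0.05, 0.01, 1.0, rng))
```

In a training loop, the forward pass would use the corrupted copy while the update is applied to the clean weights. Resampling the fault pattern at every step matches the stated goal of a single generic training run that can later be mapped to any chip with unknown failed device locations.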

Conference paper