Publication
IPDPS 2024
Workshop paper

Architecture and Programming of Analog In-Memory-Computing Accelerators for Deep Neural Networks

Abstract

Deep Neural Networks (DNNs) have demonstrated revolutionary capabilities in AI, such as machine vision, natural language processing, and content generation. However, the growing energy usage caused by excessive data movement between compute and memory units highlights the need to address the “Von Neumann bottleneck.” In-memory computing can achieve high throughput and energy efficiency by computing multiply-accumulate (MAC) operations using Ohm’s law and Kirchhoff’s current law on arrays of resistive memory devices [1]. In recent years, analog non-volatile memory (NVM)-based accelerators with energy-efficient, weight-stationary MAC operations in analog NVM array “Tiles” have been demonstrated in hardware using Phase Change Memory (PCM) devices integrated in the back end of 14-nm CMOS [2, 3, 4]. Building on these hardware demonstrations, we propose a highly heterogeneous and programmable accelerator architecture that takes advantage of a dense and efficient circuit-switched 2D mesh [5, 6]. This flexible architecture can accelerate Transformer, Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) models while keeping data communication local and massively parallel. We show that by co-optimizing memory devices, DNN algorithms, and specialized digital circuits, competitive end-to-end DNN accuracies can be obtained with the help of hardware-aware training [7, 8]. The author would like to thank all colleagues at IBM Research Almaden, Yorktown, Albany NanoTech, Zurich, and Tokyo for their contributions to this work, as well as the IBM Research AI HW Center.
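
As an illustrative sketch (not taken from the paper), the analog MAC described above maps weights onto device conductances and activations onto read voltages: per Ohm’s law each device contributes a current proportional to its conductance times the applied voltage, and Kirchhoff’s current law sums those currents along each column wire, producing one MAC result per column in a single step. The minimal NumPy model below assumes an idealized tile; the tile size, conductance range, and voltage range are hypothetical and chosen only for illustration.

```python
import numpy as np

# Idealized analog tile: weights stored as conductances (siemens),
# activations applied as read voltages (volts). Dimensions and value
# ranges are assumptions for illustration only.
rng = np.random.default_rng(0)
n_rows, n_cols = 512, 512                      # tile size (assumption)
G = rng.uniform(0.0, 25e-6, (n_rows, n_cols))  # device conductances
V = rng.uniform(0.0, 0.2, n_rows)              # input read voltages

# Ohm's law: each device passes I_ij = G_ij * V_i.
# Kirchhoff's current law: currents on each column wire sum,
# so every column yields one weight-stationary MAC result.
I_out = V @ G   # shape (n_cols,), column output currents in amperes
print(I_out[:4])
```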
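Hardware-aware training [7, 8] makes the network tolerant of analog non-idealities by exposing it to device-like perturbations during training. The sketch below is a simplified PyTorch example using an assumed multiplicative Gaussian weight-noise model; the noise level, layer sizes, and the `NoisyLinear` helper are hypothetical, and the actual PCM noise models used in this line of work are more detailed.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights in the forward pass to
    mimic analog conductance noise (simplified Gaussian model)."""
    def __init__(self, in_features, out_features, noise_std=0.05):
        super().__init__(in_features, out_features)
        self.noise_std = noise_std  # relative noise level (assumption)

    def forward(self, x):
        if self.training:
            # Multiplicative weight noise, re-drawn every forward pass,
            # so the learned weights remain accurate under device variation.
            noise = torch.randn_like(self.weight) * self.noise_std
            w = self.weight * (1.0 + noise)
        else:
            w = self.weight
        return nn.functional.linear(x, w, self.bias)

# Drop-in use inside a small model (sizes are illustrative).
model = nn.Sequential(NoisyLinear(784, 256), nn.ReLU(), NoisyLinear(256, 10))
```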