Architectures and Circuits for Analog-memory-based Hardware Accelerators for Deep Neural Networks

Sidney Tsai

MRS Fall Meeting 2023

Invited talk

26 Nov 2023


Abstract

Analog non-volatile memory (NVM)-based accelerators for Deep Neural Networks (DNNs) can achieve high throughput and energy efficiency by computing multiply-accumulate (MAC) operations using Ohm’s law and Kirchhoff’s current law on arrays of resistive memory devices [1]. In recent years, energy-efficient, weight-stationary MAC operations in analog NVM memory-array “Tiles” have been demonstrated in hardware with Phase Change Memory (PCM) devices integrated in the back end of 14-nm CMOS [2, 3]. Competitive end-to-end DNN accuracies can be obtained with the help of hardware-aware training, accurate weight programming, and sufficiently linear MAC operations in the analog domain [4]. In this paper, I describe architectural and circuit advances for such analog NVM-based accelerators and for specialized digital compute units, designed to accelerate Transformers, Long Short-Term Memory (LSTM) networks, and Convolutional Neural Networks (CNNs). I present a highly heterogeneous and programmable accelerator architecture that uses a dense and efficient circuit-switched 2D mesh to exchange vectors of neuron activations over short distances in a massively parallel fashion [5]. Using a 14-nm inference chip containing multiple arrays of PCM devices, I also discuss the impact of memory materials on the accuracy and performance of these systems.

The author would like to thank colleagues at IBM Research Almaden, Yorktown, Albany, Zurich, and Tokyo, as well as the IBM Research AI HW Center, for their contributions to this work.

[1] G. W. Burr et al. “Ohm’s Law + Kirchhoff’s Current Law = Better AI: Neural-Network Processing Done in Memory with Analog Circuits will Save Energy”. In: IEEE Spectrum 58.12 (2021), pp. 44–49.

[2] P. Narayanan et al. “Fully on-chip MAC at 14nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration format”. In: Symposium on VLSI Technology (2021).

[3] M. Le Gallo et al. “A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference”. In: arXiv preprint arXiv.02872 (2022).

[4] M. J. Rasch et al. “Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators”. In: arXiv preprint arXiv.08469 (2023).

[5] S. Jain et al. “A Heterogeneous and Programmable Compute-In-Memory Accelerator Architecture for Analog-AI Using Dense 2-D Mesh”. In: IEEE Trans. VLSI 31.1 (2023), pp. 114–127.
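The Ohm’s-law/Kirchhoff’s-law MAC described in the abstract can be illustrated with a minimal, idealized Python sketch. It is not IBM’s implementation: it assumes perfectly linear, noise-free devices, maps each signed weight onto a hypothetical pair of conductances (G+, G-) whose difference encodes the weight, and applies activations as read-voltage amplitudes. The function names `crossbar_mac`, `map_to_conductance_pairs`, and `analog_matvec`, and the values of `g_max` and `v_read`, are illustrative assumptions rather than parameters of the 14-nm chips in [2, 3].

```python
from typing import List, Tuple

Matrix = List[List[float]]


def crossbar_mac(g: Matrix, v: List[float]) -> List[float]:
    """Ideal crossbar read: column current I_j = sum_i G[i][j] * V[i].

    Each device contributes G * V (Ohm's law); each column wire sums the
    currents of all devices attached to it (Kirchhoff's current law).
    """
    if len(g) != len(v):
        raise ValueError("need one input voltage per row")
    currents = [0.0] * len(g[0])
    for i, v_i in enumerate(v):
        for j, g_ij in enumerate(g[i]):
            currents[j] += g_ij * v_i
    return currents


def map_to_conductance_pairs(w: Matrix, g_max: float) -> Tuple[Matrix, Matrix, float]:
    """Hypothetical mapping of signed weights onto (G+, G-) device pairs.

    Assumption: weights are scaled linearly so the largest |w| uses g_max;
    positive parts are stored on G+, negative parts on G-.
    """
    w_abs_max = max(abs(x) for row in w for x in row) or 1.0
    scale = g_max / w_abs_max  # siemens per unit weight
    g_plus = [[max(x, 0.0) * scale for x in row] for row in w]
    g_minus = [[max(-x, 0.0) * scale for x in row] for row in w]
    return g_plus, g_minus, scale


def analog_matvec(w: Matrix, x: List[float],
                  g_max: float = 25e-6, v_read: float = 0.2) -> List[float]:
    """Compute y_j = sum_i w[i][j] * x[i] through the ideal crossbar model.

    g_max (25 uS) and v_read (0.2 V) are illustrative assumptions. Activations
    are encoded here as voltage amplitudes; hardware may instead encode them
    as pulse durations (cf. the duration format in [2]).
    """
    g_plus, g_minus, scale = map_to_conductance_pairs(w, g_max)
    volts = [xi * v_read for xi in x]
    i_plus = crossbar_mac(g_plus, volts)
    i_minus = crossbar_mac(g_minus, volts)
    # Differential column currents, rescaled back into weight units.
    return [(ip - im) / (scale * v_read) for ip, im in zip(i_plus, i_minus)]


if __name__ == "__main__":
    w = [[1.0, -2.0], [0.5, 1.0]]  # w[i][j]: input i -> output j
    x = [1.0, 0.5]
    digital = [sum(w[i][j] * x[i] for i in range(len(x))) for j in range(len(w[0]))]
    print(analog_matvec(w, x), digital)
```

Running the script prints the analog result next to a digital reference; in this ideal model the two agree up to floating-point rounding. Real PCM tiles depart from this picture through programming error, conductance drift, and read noise, which is why the abstract pairs analog MAC with accurate weight programming and hardware-aware training [4].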

