IBM J. Res. Dev.

Deep learning acceleration based on in-memory computing

Abstract

Performing computations on conventional von Neumann computing systems results in a significant amount of data being moved back and forth between the physically separated memory and processing units. This movement costs time and energy and constitutes an inherent performance bottleneck. In-memory computing is a novel non-von Neumann approach in which certain computational tasks are performed within the memory itself. It is enabled by the physical attributes and state dynamics of memory devices, in particular resistance-based nonvolatile memory technology. Several computational tasks, such as logical operations, arithmetic operations, and even certain machine learning tasks, can be implemented in such a computational memory unit. In this article, we first introduce the general notion of in-memory computing and then focus on mixed-precision deep learning training with in-memory computing. The efficacy of this approach is demonstrated by training a multilayer perceptron network on the MNIST dataset to high accuracy. Moreover, we show how the precision of in-memory computing can be further improved through architectural and device-level innovations. Finally, we present system aspects, including a high-level system architecture with core-to-core interconnect technologies, as well as ideas and concepts for the software stack.
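
To make the mixed-precision idea concrete, the following is a minimal, illustrative Python/NumPy sketch of the general scheme the abstract describes: matrix-vector multiplications for the forward and backward passes are carried out on a (here, simulated) low-precision, noisy in-memory crossbar, while weight updates are accumulated in a high-precision digital variable and transferred to the analog devices only in multiples of the smallest programmable conductance change. The class and function names (AnalogLayer, analog_matvec), the noise model, and the granularity value epsilon are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class AnalogLayer:
    """One fully connected layer whose matrix-vector products are
    performed on a simulated analog in-memory crossbar (assumed model)."""

    def __init__(self, n_in, n_out, epsilon=1e-3, noise=0.02):
        self.W = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
        self.chi = np.zeros_like(self.W)  # high-precision digital accumulator
        self.epsilon = epsilon            # smallest programmable weight change (assumed)
        self.noise = noise                # relative read noise of the devices (assumed)

    def analog_matvec(self, W, x):
        # Crossbar read: the result is perturbed by device/read noise.
        y = W @ x
        return y + self.noise * np.abs(y) * rng.standard_normal(y.shape)

    def forward(self, x):
        self.x = x
        return self.analog_matvec(self.W, x)

    def backward(self, grad_out, lr):
        # Backward pass also uses the (noisy) in-memory transpose read.
        grad_in = self.analog_matvec(self.W.T, grad_out)
        # Accumulate the outer-product weight update digitally, in high precision.
        self.chi -= lr * np.outer(grad_out, self.x)
        # Transfer only whole multiples of epsilon to the analog devices.
        pulses = np.trunc(self.chi / self.epsilon)
        self.W += self.epsilon * pulses
        self.chi -= self.epsilon * pulses
        return grad_in

# Toy usage: a two-layer perceptron on random data (a stand-in for MNIST).
layer1, layer2 = AnalogLayer(64, 32), AnalogLayer(32, 10)
for step in range(100):
    x = rng.standard_normal(64)
    target = rng.integers(10)
    h = np.maximum(layer1.forward(x), 0.0)           # ReLU activation
    logits = layer2.forward(h)
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax
    grad_logits = p.copy(); grad_logits[target] -= 1.0
    grad_h = layer2.backward(grad_logits, lr=0.01)
    layer1.backward(grad_h * (h > 0), lr=0.01)
```

The key design point this sketch tries to capture is the division of labor: the energy-intensive matrix-vector products stay in the (imprecise) computational memory, while only the small, precision-critical bookkeeping of accumulated weight updates is done digitally.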