IBM NorthPole: An Architecture for Neural Network Inference with a 12nm Chip

Andrew S. Cassidy; John V. Arthur; Filipp Akopyan; Alexander Andreopoulos; Kumar Appuswamy; Pallab Datta; Michael Debole; Steven Esser; Carlos Ortega Otero; Jun Sawada; Brian Taba; Arnon Amir; Deepika Bablani; Peter Carlson; Flick Flickner; Raj Gandhasri; Guillaume Garreau; Megumi Ito; Jennifer Klamo; Jeff Kusnitz; Nathaniel McClatchey; Jeffrey L. McKinstry; Yutaka Nakamura; Tapan Nayak; Bill Risk; Kai Schleupen; Ben Shaw; Jay Sivagnaname; Daniel Smith; Ignacio Terrizzano; Takanori Ueda; Dharmendra S. Modha

ISSCC 2024

Invited talk

18 Feb 2024

Abstract

The NorthPole Architecture achieves high performance and high efficiency by placing memory locally within a parallel, distributed core array. Networks-on-chip link the cores to ensure data availability, and prescheduled, distributed local control orchestrates execution. A 12nm NorthPole Inference Chip (22B transistors, 795 mm²) includes a 256-core array with 192MB of distributed SRAM. At a nominal clock frequency of 400MHz, it delivers more than 200, 400, and 800 TOPS at 8b, 4b, and 2b precision, respectively, with very high utilization.
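The headline figures in the abstract can be turned into rough per-core numbers with simple arithmetic. The Python sketch below is illustrative only and rests on assumptions the abstract does not state: that the 192MB of SRAM is divided evenly across the 256 cores, that 1MB means 1024KB, and that the quoted TOPS figures are evenly spread over all cores at the nominal clock. The function and constant names are hypothetical. Because the abstract says only that throughput exceeds these values, the results are lower bounds rather than exact core specifications.

```python
# Figures quoted in the abstract.
CORES = 256          # cores in the array
SRAM_MB = 192        # total distributed SRAM, in MB
CLOCK_HZ = 400e6     # nominal clock frequency
MIN_TOPS = {8: 200, 4: 400, 2: 800}  # precision (bits) -> TOPS lower bound


def sram_per_core_kb(sram_mb=SRAM_MB, cores=CORES):
    """SRAM per core in KB, ASSUMING an even split and 1 MB = 1024 KB."""
    return sram_mb * 1024 / cores


def min_ops_per_core_per_cycle(tops, cores=CORES, clock_hz=CLOCK_HZ):
    """Lower bound on operations per core per clock cycle.

    ASSUMES the quoted TOPS is spread evenly over all cores
    running at the nominal clock frequency.
    """
    return tops * 1e12 / (cores * clock_hz)


if __name__ == "__main__":
    print(f"SRAM per core: {sram_per_core_kb():.0f} KB")
    for bits, tops in MIN_TOPS.items():
        ops = min_ops_per_core_per_cycle(tops)
        print(f"{bits}b: at least {ops:.1f} ops/core/cycle")
```

Under these assumptions each core holds 768KB of SRAM and performs at least about 1953 operations per cycle at 8b precision. The bound doubles each time the precision is halved, mirroring the 200/400/800 TOPS progression in the abstract.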
