Publication
IEDM 2024
Invited talk

Heterogeneous Embedded Neural Processing Units Utilizing PCM-based Analog In-Memory Computing

Abstract

We propose an embedded Neural Processing Unit (NPU) architecture for deep learning inference to address the stringent energy, area, and cost requirements of edge AI. This heterogeneous architecture integrates a variety of digital and analog accelerator nodes to cater to diverse operation types and precision requirements. To achieve high energy efficiency while maintaining substantial non-volatile on-chip weight capacity, we utilize Analog In-Memory Computing (AIMC) tiles based on Phase-Change Memory (PCM) for Matrix-Vector Multiplications (MVMs). Additionally, a digital data path and a programmable software cluster facilitate end-to-end inference across multiple precision levels. The NPU is projected to deliver competitive throughput for transformer Neural Networks (NNs), rivaling high-end System-on-Chips (SoCs) for mobile devices and edge accelerators fabricated at more advanced technology nodes.
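The key idea behind AIMC is that an MVM is computed in place on the memory array, with each stored weight behaving as an analog conductance that is subject to device non-idealities. A minimal sketch of this behavior is below; the noise model, the `sigma` value, and the function name are illustrative assumptions, not details from the talk.

```python
import random

def aimc_mvm(weights, x, sigma=0.02, seed=0):
    """Toy model of an analog in-memory MVM.

    Each weight is perturbed by Gaussian noise standing in for PCM
    conductance variations (programming error, drift). The noise level
    `sigma` is a hypothetical parameter chosen for illustration.
    """
    rng = random.Random(seed)
    return [
        # Accumulation along the column happens "for free" in analog,
        # here modeled as an ordinary dot product over noisy weights.
        sum((w + rng.gauss(0.0, sigma)) * xi for w, xi in zip(row, x))
        for row in weights
    ]

# Ideal result would be [0.0, 2.5]; the analog result deviates slightly.
W = [[0.5, -0.25], [1.0, 0.75]]
x = [1.0, 2.0]
y = aimc_mvm(W, x)
print(y)
```

In a real PCM tile, the input vector is applied as voltages or pulse durations on the rows and the per-column current sums are digitized by ADCs; the digital data path mentioned in the abstract would then handle the operations that do not map well onto the analog arrays.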