Fully on-chip MAC at 14nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format
We report on ARES, a 14nm Phase Change Memory (PCM)-based test chip comprising multiple crossbar tiles, each capable of parallel Multiply-ACCumulate (MAC) inference on 512x512 unique weights. A massively parallel 2-D mesh transports Deep Neural Network (DNN) excitations in duration format across the chip, between tiles and the integrated Landing Pads (LPs) where digital data enters and leaves the chip. For accurate weight programming (<3% weight error), we employ a row-wise programming scheme that efficiently programs the 4 PCM devices in each analog weight with minimal overshoot. We implement two DNNs at near-software-equivalent accuracy. A fully on-chip 2-layer network demonstrates tile-to-tile transport, and a recurrent LSTM network tests resilience to error propagation, with activation functions applied off-chip before the data loops back to the next on-chip MAC.
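As an illustration of a MAC on duration-format excitations, the following minimal Python sketch models one crossbar tile in idealized form. It is not the ARES implementation: the full-scale duration `MAX_TICKS`, the significance factor `GAIN_F`, and the way four conductances are combined into one signed weight (a scaled major pair plus an unscaled minor pair) are all assumptions made for this example, as are the function names.

```python
# Hypothetical sketch of an analog MAC with duration-encoded inputs.
# All names and constants are illustrative assumptions, not ARES parameters.

MAX_TICKS = 255  # assumed full-scale pulse duration, in clock ticks
GAIN_F = 3.0     # assumed significance factor between the two device pairs


def encode_duration(x, max_ticks=MAX_TICKS):
    """Map an activation, clipped to [0, 1], to an integer pulse duration."""
    x = min(max(x, 0.0), 1.0)
    return round(x * max_ticks)


def effective_weight(g_major_plus, g_major_minus,
                     g_minor_plus, g_minor_minus, gain=GAIN_F):
    """Combine four device conductances into one signed weight (assumed scheme)."""
    return gain * (g_major_plus - g_major_minus) + (g_minor_plus - g_minor_minus)


def mac_column(durations, weights):
    """Idealized charge integrated on one column: sum of weight * duration."""
    return sum(w * t for w, t in zip(weights, durations))


def tile_mac(activations, weight_rows):
    """Compute all column outputs of one tile.

    weight_rows[i][j] is the 4-tuple of conductances at row i, column j.
    """
    durations = [encode_duration(x) for x in activations]
    n_rows = len(weight_rows)
    n_cols = len(weight_rows[0])
    outputs = []
    for j in range(n_cols):
        column = [effective_weight(*weight_rows[i][j]) for i in range(n_rows)]
        outputs.append(mac_column(durations, column))
    return outputs


# Toy 2x2 tile: only the first row is driven, at full scale.
toy_weights = [
    [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0)],
]
toy_outputs = tile_mac([1.0, 0.0], toy_weights)
```

In this idealized model each row contributes in proportion to how long its pulse stays high, so every column accumulates its full dot product in parallel. Device noise, read-voltage scaling, and output conversion back to a duration are deliberately left out.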