A 16nm 25mm² SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators

Paul N. Whatmough; Saekyu Lee; Marco Donato; Hsea Ching Hsueh; Sam Likun Xi; Udit Gupta; Lillian Pentecost; Glenn G. Ko; David Brooks; Gu-Yeon Wei

doi:10.23919/VLSIC.2019.8778002

VLSI Circuits 2019

Conference paper

01 Jun 2019

Abstract

This paper presents a 25mm² SoC in 16nm FinFET technology that targets flexible acceleration of compute-intensive kernels in DNN, DSP and security algorithms. The SoC includes an always-on sub-system, a dual-core Arm Cortex-A53 CPU cluster, an embedded FPGA (eFPGA) array, and a quad-core cache-coherent accelerator (CCA) cluster. Measurement results show that: 1) moving DSP/cryptography kernels from the A53 to the eFPGA increases energy efficiency by 5.5x to 28.9x; 2) using cache coherency for datapath accelerators increases throughput by 2.94x; and 3) the accelerator flexibility-efficiency (GOPS/W) range spans 3.1x (A53+SIMD), 16.5x (eFPGA) and 54.5x (CCA) relative to the dual-core CPU baseline on comparable tasks. Energy per inference on the MobileNet-128 CNN improves by up to 47.6x.
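Because every efficiency figure in the abstract is normalized to the dual-core A53 baseline, the gain between neighbouring tiers can be read off by dividing one normalized figure by the next. The minimal Python sketch below does exactly that, using only the abstract's ratios. The helper names `step_gains` and `energy_fraction` are hypothetical and exist only for illustration. It assumes the "comparable tasks" behind each figure are close enough that dividing them is meaningful, and that a GOPS/W gain of Nx for the same amount of work leaves 1/N of the baseline energy per operation.

```python
# Normalized efficiency (GOPS/W) relative to the dual-core A53 baseline,
# taken from the abstract. Assumption: the "comparable tasks" are close
# enough that these figures can be divided against one another.
NORMALIZED_EFFICIENCY = {
    "A53 baseline": 1.0,
    "A53+SIMD": 3.1,
    "eFPGA": 16.5,
    "CCA": 54.5,
}


def step_gains(table):
    """Hypothetical helper: efficiency gain of each tier over the previous one."""
    names = list(table)  # dicts keep insertion order (Python 3.7+)
    return {
        f"{prev} -> {cur}": table[cur] / table[prev]
        for prev, cur in zip(names, names[1:])
    }


def energy_fraction(gain):
    """Hypothetical helper: energy per op left after a `gain`x GOPS/W increase,
    assuming the same amount of work is done."""
    return 1.0 / gain


if __name__ == "__main__":
    for step, gain in step_gains(NORMALIZED_EFFICIENCY).items():
        print(f"{step}: {gain:.2f}x")
    # Bounds of the eFPGA energy-efficiency gain for DSP/crypto kernels.
    for gain in (5.5, 28.9):
        print(f"{gain}x efficiency -> {energy_fraction(gain):.1%} of A53 energy")
```

Under these assumptions, the eFPGA sits about 5.3x above A53+SIMD and the CCA cluster about 3.3x above the eFPGA. The 5.5x to 28.9x eFPGA gain corresponds to roughly 18.2% down to 3.5% of the A53's energy per operation.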