About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
VLSI Circuits 2020
Conference paper
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference
Abstract
A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14nm AI core achieves fp16 peak performance of 3.0 TFLOPS at 0.62V and 1.4 TFLOPS/W at 0.54V.