Power-Limited Inference Performance Optimization Using a Software-Assisted Peak Current Regulation Scheme in a 5-nm AI SoC
Abstract
Discrete AI inference cards operate under peak power constraints set by the form factor and the host system, yet they must serve diverse inference requests whose power consumption varies widely. A peak current-limiting scheme is proposed to maximize inference performance across practical use cases. The peak current management block consists of a card-level current sensing circuit combined with an AI inference-aware feed-forward and feedback control mechanism. Card-level sensing improves performance by removing the need to reserve additional margin for power consumed by off-chip components. Compiler-assisted feed-forward control exploits the predictability of AI inference workloads to manage peak currents proactively, without a static reduction in operating frequency. Measurements from an AI system on chip (SoC) fabricated in 5-nm technology show up to a 41% improvement in BERT-Large inference throughput when the peak current control is engaged.
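The interplay between compiler-assisted feed-forward control and sensor-driven feedback can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not the controller described in the paper. It assumes that card current scales linearly with clock frequency at a fixed voltage, that the compiler supplies a predicted card current per layer at the maximum frequency `f_max`, and that feedback throttles in fixed steps of `f_step`. The names `select_frequency`, `run_inference`, and `correction` are hypothetical.

```python
def select_frequency(predicted_i_at_fmax: float, i_limit: float,
                     f_max: float, f_min: float, correction: float) -> float:
    """Feed-forward step (hypothetical): choose the highest frequency whose
    predicted current stays within the limit, before the layer runs.

    Assumes current scales linearly with frequency, so I(f) = I(f_max) * f / f_max.
    `correction` rescales the compiler's prediction using sensor feedback.
    """
    expected = predicted_i_at_fmax * correction
    if expected <= i_limit:
        return f_max  # light layer: run at full speed, no throttling
    return max(f_min, f_max * i_limit / expected)


def run_inference(layers: list[tuple[float, float]], i_limit: float,
                  f_max: float, f_min: float, f_step: float) -> list[float]:
    """Simulate per-layer control (hypothetical).

    Each entry of `layers` is (predicted_i_at_fmax, actual_i_at_fmax); the
    "actual" value stands in for the card-level current sensor, which already
    includes off-chip components, so no extra margin is added for them.
    Returns the frequency each layer finally ran at.
    """
    correction = 1.0  # ratio of sensed to predicted current, from the last layer
    freqs = []
    for predicted, actual in layers:
        f = select_frequency(predicted, i_limit, f_max, f_min, correction)
        measured = actual * f / f_max  # simulated card-level sensor reading
        # Feedback step: throttle while the sensed current exceeds the limit.
        while measured > i_limit and f > f_min:
            f = max(f_min, f - f_step)
            measured = actual * f / f_max
        # Update the prediction correction from the observed sensor reading.
        correction = measured * f_max / (f * predicted)
        freqs.append(f)
    return freqs


# Units assumed: current in A, frequency in MHz.
freqs = run_inference([(80, 80), (125, 125), (125, 200), (60, 60)],
                      i_limit=100, f_max=1000, f_min=200, f_step=50)
print(freqs)
```

In this toy run, the light first and last layers stay at 1000 MHz, the second layer is throttled proactively to 800 MHz by the feed-forward step, and the third layer, whose current the compiler underpredicted, is pulled down to 500 MHz by feedback. A static scheme that is safe for the worst-case layer would run every layer at 500 MHz, which is the kind of blanket frequency reduction that feed-forward control is meant to avoid.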