Publication
IEEE TCADIS
Paper
Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs
Abstract
The energy efficiency of GPU architectures has emerged as an essential aspect of computer system design. In this article, we explore the energy benefits of reducing the GPU chip's voltage to the safe limit, i.e., the V_min point, using predictive software techniques. We perform this study on several commercial off-the-shelf GPU cards. We find that a voltage guardband of about 20% exists on those GPUs, which span two architectural generations; if 'eliminated' entirely, it can yield up to 25% energy savings on one of the studied GPU cards. Our measurement results reveal program-dependent V_min behavior across the studied applications, and the exact magnitude of the improvement depends on the program's available guardband. We make fundamental observations about this program-dependent V_min behavior. We experimentally determine that voltage noise has a more substantial impact on V_min than process and temperature variation, and that the activities during kernel execution cause large voltage droops. From these findings, we show how to use a kernel's microarchitectural performance counters to predict its V_min value accurately. The average and maximum prediction errors are 0.5% and 3%, respectively. Accurate V_min prediction opens up new possibilities for a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband, while functional correctness is ensured by a hardware safety-net mechanism.
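To make the idea of software-managed guardbanding concrete, the following is a minimal sketch rather than the authors' model. It assumes a single hypothetical performance-counter feature (`activity`), a plain least-squares line as the predictor, and entirely synthetic training data. The 3% margin reuses the maximum prediction error reported above as an illustrative safety margin, and the savings estimate assumes the first-order CMOS approximation that dynamic energy scales with the square of the supply voltage at fixed frequency. None of these choices is taken from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_vmin(model, counter_value):
    """Predict V_min (volts) for a kernel from its counter value."""
    intercept, slope = model
    return intercept + slope * counter_value


def operating_voltage(predicted_vmin, v_nominal, margin=0.03):
    """Add a relative margin to the prediction; never exceed nominal.

    The 3% default is an assumption borrowed from the reported maximum
    prediction error, not a value prescribed by the paper.
    """
    return min(v_nominal, predicted_vmin * (1.0 + margin))


def dynamic_energy_saving(v_nominal, v_operating):
    """Fractional dynamic-energy saving, assuming energy scales with V^2."""
    return 1.0 - (v_operating / v_nominal) ** 2


# SYNTHETIC data for illustration only: hypothetical per-kernel activity
# counter values and the V_min (volts) measured for each kernel.
activity = [0.2, 0.4, 0.6, 0.8]
measured_vmin = [0.82, 0.84, 0.86, 0.88]

model = fit_line(activity, measured_vmin)
v_pred = predict_vmin(model, 0.5)              # prediction for a new kernel
v_op = operating_voltage(v_pred, v_nominal=1.0)
saving = dynamic_energy_saving(1.0, v_op)

print(f"predicted V_min: {v_pred:.4f} V")
print(f"operating voltage with margin: {v_op:.4f} V")
print(f"estimated dynamic-energy saving: {saving:.1%}")
```

Under the same quadratic assumption, removing a full 20% guardband would cut dynamic energy by 1 - 0.8^2 = 36%. The measured saving of up to 25% is lower, plausibly because not all of a card's energy scales with the core voltage. A real predictor would use several counters and measured data, and would rely on the hardware safety net for the cases where the prediction is too optimistic.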