Fine-tuning the active timing margin (ATM) control loop for maximizing multi-core efficiency on an IBM POWER server
Active Timing Margin (ATM) is a technology that improves processor efficiency by reducing the pipeline timing margin with a control loop that adjusts voltage and frequency based on real-time chip environment monitoring. Although ATM has already been shown to yield substantial performance benefits, its full potential has yet to be unlocked. In this paper, we investigate how to maximize ATM's efficiency gain with a new means of exposing the inter-core speed variation: Finetuning the ATM control loop. We conduct our analysis and evaluation on a production-grade POWER7+ system. On the POWER7+ server platform, we fine-tune the ATM control loop by programming its Critical Path Monitors, a key component of its ATM design that measures the cores' timing margins. With a robust stress-test procedure, we expose over 200 MHz of inherent inter-core speed differential by fine-tuning the percore ATM control loop. Exploiting this differential, we manage to double the ATM frequency gain over the static timing margin; this is not possible using conventional means, i.e. by setting fixed <v, f> points for each core, because the corelevel <v, f> must account for chip-wide worst-case voltage variation. To manage the significant performance heterogeneity of fine-tuned systems, we propose application scheduling and throttling to manage the chip's process and voltage variation. Our proposal improves application performance by more than 10% over the static margin, almost doubling the 6% improvement of the default, unmanaged ATM system. Our technique is general enough that it can be adopted by any system that employs an active timing margin control loop.