Proceedings of the ACM on Programming Languages

Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics

Download paper


Efficient management of shared resources is a critical problem in high-performance computing (HPC) environments. Existing workload management systems often promote non-sharing of resources among different co-executing applications to achieve performance isolation. Such schemes lead to poor resource utilization and suboptimal process throughput, adversely affecting user productivity. Tackling this problem in a scalable fashion is extremely challenging, since it requires the workload scheduler to possess an in-depth knowledge about various application resource requirements and runtime phases at fine granularities within individual applications. In this work, we show that applications' resource requirements and execution phase behaviour can be captured in a scalable and lightweight manner at runtime by estimating important program artifacts termed as "dynamic loop characteristics". Specifically, we propose a solution to the problem of efficient workload scheduling by designing a compiler and runtime cooperative framework that leverages novel loop-based compiler analysis for resource allocation. We present Beacons Framework, an end-to-end compiler and scheduling framework, that estimates dynamic loop characteristics, encapsulates them in compiler-instrumented beacons in an application, and broadcasts them during application runtime, for proactive workload scheduling. We focus on estimating four important loop characteristics: loop trip-count, loop timing, loop memory footprint, and loop data-reuse behaviour, through a combination of compiler analysis and machine learning. The novelty of the Beacons Framework also lies in its ability to tackle irregular loops that exhibit complex control flow with indeterminate loop bounds involving structure fields, aliased variables and function calls, which are highly prevalent in modern workloads. At the backend, Beacons Framework entails a proactive workload scheduler that leverages the runtime information to orchestrate aggressive process co-locations, for maximizing resource concurrency, without causing cache thrashing. Our results show that Beacons Framework can predict different loop characteristics with an accuracy of 85% to 95% on average, and the proactive scheduler obtains an average throughput improvement of 1.9x (up to 3.2x) over the state-of-the-art schedulers on an Amazon Graviton2 machine on consolidated workloads involving 1000-10000 co-executing processes, across 51 benchmarks.