Conference paper

Modeling and analysis of dynamic coscheduling in parallel and distributed environments

View publication


Scheduling in large-scale parallel systems has been and continues to be an important and challenging research problem. Several key factors, including the increasing use of off-the-shelf clusters of workstations to build such parallel systems, have resulted in the emergence of a new class of scheduling strategies, broadly referred to as dynamic coscheduling. Unfortunately, the size of both the design and performance spaces of these emerging scheduling strategies is quite large, due in part to the numerous dynamic interactions among the different components of the parallel computing environment as well as the wide range of applications and systems that can comprise the parallel environment. This in turn makes it difficult to fully explore the benefits and limitations of the various proposed dynamic coscheduling approaches for large-scale systems solely with the use of simulation and/or experimentation. To gain a better understanding of the fundamental properties of different dynamic coscheduling methods, we formulate a general mathematical model of this class of scheduling strategies within a unified framework that allows us to investigate a wide range of parallel environments. We derive a matrix-analytic analysis based on a stochastic decomposition and a fixed-point iteration. A large number of numerical experiments are performed in part to examine the accuracy of our approach. These numerical results are in excellent agreement with detailed simulation results. Our mathematical model and analysis is then used to explore several fundamental design and performance tradeoffs associated with the class of dynamic coscheduling policies across a broad spectrum of parallel computing environments.