Effects of checkpointing and queueing on program performance

V.G. Kulkarni; Victor Nicola; K.S. Trivedi

doi:10.1080/15326349908807166

Communications in Statistics. Stochastic Models

Paper

03 Apr 2007

Effects of checkpointing and queueing on program performance

View publication

Abstract

Checkpointing is a technique for reducing the completion (execution) time of long-running batch programs in the presence of failures. It consists of intermittently saving the current status of the program under execution so that if a failure occurs, the program needs to be restarted from the most recent checkpoint rather than from the beginning. Because of occasional long reprocessing times and the possibility of failures during recovery, checkpoints during reprocessing is guaranteed to increase the effectiveness of any checkpointing technique. In all models considered thus far, checkpointing is allowed only during useful processing of the program. In this paper, we carry out the analysis of a model with checkpointing during reprocessing following failures. Closed-form expressions for the Laplace transform of the program completion time, its mean and variance are derived. It is shown that, asymptotically, the expected completion time increases linearly (exponentially) with the service requirement in the presence (absence) of checkpointing. The question of whether checkpointing is beneficial is addressed and the optimal checkpointing rate, α*, which minimizes the expected completion time, is computed. The completion time analysis is used as a building block in a queueing model with a Poisson input stream of jobs. This system can be viewed as an M/G/1 queue, in which the first customer starting a busy period is having a different distribution for its service requirement. We derive this distribution and then determine several measures of interest. The checkpointing rate, α**, which minimizes the mean response time is computed. It is numerically shown that α** is very close to α* for a wide range of system parameter values. Finally, sensitivity analysis is performed to study the effect of using non-optimal checkpointing rates on the mean completion time and the mean response time. © 1990, Taylor & Francis Group, LLC. All rights reserved.

Paper