Publication
WPADD 1991
Conference paper

Restoring consistent global states of distributed computations

Abstract

We present a mechanism for restoring any consistent global state of a distributed computation. This capability can form the basis of support for rollback and replay of computations, an activity we view as essential in a comprehensive environment for debugging distributed programs. Our mechanism records occasional state checkpoints and logs all messages communicated between processes. Our mechanism offers flexibility in the following ways: any consistent global state of the computation can be restored; execution can be replayed either exactly as it occurred initially or with user-controlled variations; there is no need to know a priori what states might be of interest. In addition, if checkpoints and logs are written to stable storage, our mechanism can be used to restore states of computations that cause the system to crash.
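The combination of checkpoints and message logs described above can be illustrated with a small sketch. This is not the paper's mechanism; it is a toy model under stated assumptions: each process is deterministic, its state is just a running total of the values it receives, and the names `LoggedProcess`, `checkpoint_every`, `restore`, and `is_consistent` are hypothetical. The consistency check uses the standard definition of a consistent cut: no message is received inside the cut unless it was also sent inside the cut.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedProcess:
    """Toy deterministic process whose state is the sum of received values."""
    checkpoint_every: int = 3                         # hypothetical interval
    state: int = 0
    log: list = field(default_factory=list)           # every received message, in order
    checkpoints: dict = field(default_factory=dict)   # event count -> saved state

    def __post_init__(self):
        # The initial state always serves as a checkpoint.
        self.checkpoints[0] = self.state

    def step(self, state, msg):
        # Deterministic transition; replay is only valid because of this.
        return state + msg

    def receive(self, msg):
        # Log the message, apply it, and checkpoint occasionally.
        self.log.append(msg)
        self.state = self.step(self.state, msg)
        if len(self.log) % self.checkpoint_every == 0:
            self.checkpoints[len(self.log)] = self.state

    def restore(self, n_events):
        """Return the state just after the first n_events receipts."""
        if not 0 <= n_events <= len(self.log):
            raise ValueError("no such point in the recorded execution")
        # Start from the latest checkpoint at or before the target point,
        # then replay the logged messages up to that point.
        base = max(k for k in self.checkpoints if k <= n_events)
        state = self.checkpoints[base]
        for msg in self.log[base:n_events]:
            state = self.step(state, msg)
        return state


def is_consistent(cut, messages):
    """cut maps process name -> number of local events included.

    messages is a list of (sender, send_index, receiver, recv_index),
    with 1-based local event indices. The cut is consistent if every
    message received inside the cut was also sent inside the cut.
    """
    return all(
        send_idx <= cut[sender]
        for sender, send_idx, receiver, recv_idx in messages
        if recv_idx <= cut[receiver]
    )


if __name__ == "__main__":
    p = LoggedProcess()
    for value in [5, 1, 4, 2, 7]:
        p.receive(value)
    print(p.restore(4))   # state after the first four receipts

    msgs = [("A", 2, "B", 1)]  # A's 2nd event sends; B's 1st event receives
    print(is_consistent({"A": 2, "B": 1}, msgs))
    print(is_consistent({"A": 1, "B": 1}, msgs))
```

Restoring to an arbitrary point costs at most one checkpoint interval of replay, which is why occasional checkpoints suffice as long as every message is logged. Replaying with user-controlled variations, as the abstract mentions, would correspond to altering the logged messages before replay; that is not shown here.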
