Publication
WPADD 1991
Conference paper
Restoring consistent global states of distributed computations
Abstract
We present a mechanism for restoring any consistent global state of a distributed computation. This capability can form the basis of support for rollback and replay of computations, an activity we view as essential in a comprehensive environment for debugging distributed programs. Our mechanism records occasional state checkpoints and logs all messages communicated between processes. Our mechanism offers flexibility in the following ways: any consistent global state of the computation can be restored; execution can be replayed either exactly as it occurred initially or with user-controlled variations; there is no need to know a priori what states might be of interest. In addition, if checkpoints and logs are written to stable storage, our mechanism can be used to restore states of computations that cause the system to crash.
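The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes (occasional checkpoints plus a complete message log), not the paper's actual mechanism. It assumes each process is deterministic given the order of messages it receives. The names `LoggedProcess`, `step`, `restore`, and `is_consistent`, the integer-valued state, and the tuple format for messages are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LoggedProcess:
    """Hypothetical process: integer state, updated once per delivered message."""
    checkpoint_every: int = 3                      # assumed checkpoint interval
    state: int = 0
    log: List[int] = field(default_factory=list)   # every delivered message, in order
    # Maps "number of messages delivered so far" to the state at that point.
    checkpoints: Dict[int, int] = field(default_factory=lambda: {0: 0})

    @staticmethod
    def step(state: int, msg: int) -> int:
        # Assumed deterministic transition; any pure function would do.
        return state * 2 + msg

    def deliver(self, msg: int) -> None:
        # Log the message, apply it, and checkpoint only occasionally.
        self.log.append(msg)
        self.state = self.step(self.state, msg)
        delivered = len(self.log)
        if delivered % self.checkpoint_every == 0:
            self.checkpoints[delivered] = self.state

    def restore(self, n: int) -> int:
        """Return the local state after the first n deliveries."""
        if not 0 <= n <= len(self.log):
            raise ValueError("n is outside the recorded history")
        # Start from the latest checkpoint at or before n, then replay the log.
        base = max(k for k in self.checkpoints if k <= n)
        state = self.checkpoints[base]
        for msg in self.log[base:n]:
            state = self.step(state, msg)
        return state


# A message is (sender, send_event_index, receiver, receive_event_index).
Message = Tuple[str, int, str, int]


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """A cut maps each process to the number of its events included.

    The cut is consistent if every message received inside the cut
    was also sent inside the cut.
    """
    return all(
        send_idx <= cut[sender]
        for sender, send_idx, receiver, recv_idx in messages
        if recv_idx <= cut[receiver]
    )
```

In this sketch, restoring a global state would amount to choosing a cut, checking it with `is_consistent`, and calling `restore` on each process with its entry in the cut. Replaying "with user-controlled variations" could be approximated by substituting a modified log during replay, although the abstract does not say how the paper handles this.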