Publication
SRDS 1991
Conference paper
A timestamp-based checkpointing protocol for long-lived distributed computations
Abstract
The authors present a timestamp-based protocol for checkpointing the global state of a long-lived distributed computation in an environment in which processor clocks are approximately synchronized. The protocol is based on periodic checkpointing of local process states and logging of incoming messages during a short bounded interval. It tolerates process crash and performance failures as well as network omission and performance failures. The proposed approach has the advantage of optimistic logging protocols in that it does not require synchronous logging of each message on stable storage. The approach also has the advantage of pessimistic logging protocols in that it avoids the domino effect by recovering to the most recent successful local checkpoint.
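The mechanism named in the abstract, periodic local checkpoints plus message logging during a short bounded interval, can be illustrated with a minimal sketch. This is not the authors' protocol: the abstract does not say how the logging interval is derived, how stable storage is organized, or how logged messages are used during recovery. The class `Process`, the parameters `period` and `window`, the choice to start the logging interval at the checkpoint time, and the integer "state" are all hypothetical, chosen only to make the idea concrete. Clock times are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Toy process: checkpoints on a timer, logs messages only near a checkpoint.

    All names and parameters here are illustrative assumptions, not taken
    from the paper.
    """
    period: float   # assumed: local-clock time between checkpoints
    window: float   # assumed: length of the bounded logging interval
    state: int = 0  # stand-in for the local process state
    checkpoints: list = field(default_factory=list)  # (checkpoint_time, saved_state)
    message_log: list = field(default_factory=list)  # (receive_time, message)
    next_checkpoint: float = field(init=False)

    def __post_init__(self):
        self.next_checkpoint = self.period

    def tick(self, now: float) -> None:
        """Take every checkpoint that has come due by local clock time `now`."""
        while now >= self.next_checkpoint:
            self.checkpoints.append((self.next_checkpoint, self.state))
            self.next_checkpoint += self.period

    def receive(self, now: float, message: int) -> None:
        """Apply a message; log it only if it arrives inside the logging window."""
        self.tick(now)
        if self.checkpoints and now - self.checkpoints[-1][0] < self.window:
            self.message_log.append((now, message))
        self.state += message

    def recover(self):
        """Return the most recent checkpoint and the messages logged after it.

        How the real protocol uses these during recovery is not described in
        the abstract, so this sketch only returns them.
        """
        ckpt_time, saved_state = self.checkpoints[-1]
        logged = [(t, m) for (t, m) in self.message_log if t >= ckpt_time]
        return ckpt_time, saved_state, logged


p = Process(period=10.0, window=2.0)
p.receive(3.0, 5)    # before any checkpoint: applied, not logged
p.receive(10.5, 7)   # inside the window after the checkpoint at t=10: logged
p.receive(15.0, 1)   # outside the window: applied, not logged
p.receive(21.0, 2)   # inside the window after the checkpoint at t=20: logged
result = p.recover()
```

In this toy run only two of the four messages are logged, which reflects the abstract's point that the approach does not require logging every message synchronously. Recovery falls back to the single most recent local checkpoint, without cascading to earlier ones.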