About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
IEEE Transactions on Software Engineering
Paper
A Rigorous Approach to Fault-Tolerant Programming
Abstract
The design of programs that are tolerant of hardware fault occurrences and processor crashes is investigated. Using a stable storage management system as a running example, a new approach is suggested for specifying, understanding, and verifying the correctness of faulttolerant software. The approach extends previously developed axiomatic reasoning methods to the design of fault-tolerant systems by modeling faults as being operations that are performed at random time intervals on any computing system by the system's adverse environment. A clear separation is made between the notions of software correctness and system reliability in the face of hardware malfunction. Fault hypotheses are specified axiomatically and design correctness is proven by using a programming logic extended with fault axioms and rules. Stochastic modeling is employed to verify reliability/availability system properties. The combined correctness and reliability verifications establish that, under given fault and reliability hypotheses, a system behaves according to its functional specifications with a probability greater than that required by its reliability specifications. Copyright © 1985 by The Institute of Electrical and Electronics Engineers, Inc.