A Rigorous Approach to Fault-Tolerant Programming

Flaviu Cristian

doi:10.1109/TSE.1985.231534

IEEE Transactions on Software Engineering

Paper

01 Jan 1985

A Rigorous Approach to Fault-Tolerant Programming

View publication

Abstract

The design of programs that are tolerant of hardware fault occurrences and processor crashes is investigated. Using a stable storage management system as a running example, a new approach is suggested for specifying, understanding, and verifying the correctness of faulttolerant software. The approach extends previously developed axiomatic reasoning methods to the design of fault-tolerant systems by modeling faults as being operations that are performed at random time intervals on any computing system by the system's adverse environment. A clear separation is made between the notions of software correctness and system reliability in the face of hardware malfunction. Fault hypotheses are specified axiomatically and design correctness is proven by using a programming logic extended with fault axioms and rules. Stochastic modeling is employed to verify reliability/availability system properties. The combined correctness and reliability verifications establish that, under given fault and reliability hypotheses, a system behaves according to its functional specifications with a probability greater than that required by its reliability specifications. Copyright © 1985 by The Institute of Electrical and Electronics Engineers, Inc.

Paper