About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ACM Annual Conference 1969
Conference paper
Reliability modeling techniques for self-repairing computer systems
Abstract
This paper develops techniques for generating and using mathematical models applicable to architectural evaluation of the tradeoffs involved in designing self-repairing highly reliable computers for long missions. These systems must use standby sparing and their reliability is shown to be extremely sensitive to small variations in a new design parameter, the coverage, c,defined as the probability of system recovery given the existence of a failure. Interactive terminal calculations show c to be the single most important parameter in high-reliability system design. Changing the coverage from 1 to .98 can result in orders of magnitude change in system mission time with a specified reliability. Most techniques for increasing system reliability (e.g. adding more spares) are shown to be futile in the face of an inadequate .99 coverage. Adding checking, diagnostics, etc. to improve failure coverage is shown to be the most advantageous technique by examples of system tradeoff evaluation. This mandates extensive application of modeling techniques throughout all computer system design phases.