Conference paper
Estimating system availability and reliability
Abstract
Methods for constructing and solving large Markov chain models of computer system availability and reliability are addressed. A set of powerful high-level modeling constructs is discussed that can be used to represent the failure and repair behavior of the components that constitute a system, including important component interactions. If time-independent failure and repair rates are assumed, then a time-homogeneous continuous-time Markov chain can be constructed automatically from the modeling constructs used to describe the system. Since the size of a Markov chain grows exponentially with the number of components modeled, simulation appears to be a practical way to solve models of large systems. However, standard simulation requires very long runs to estimate availability and reliability measures, because system failure is a rare event. Therefore, variance reduction techniques that can aid in computing rare-event probabilities quickly are of interest. The importance sampling technique has been found to be most useful. The modeling language and the simulation methods discussed have been implemented in a program package called the System Availability Estimator (SAVE).
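To make the rare-event argument concrete, the following is a minimal sketch of importance sampling on a toy model. It is not taken from the paper and is not SAVE's implementation. The model is an assumption chosen for illustration: `n` identical components, each failing at rate `lam`, one repair facility working at rate `mu`, and system failure when all `n` components are down. The quantity estimated is the probability that, once one component has failed, the system reaches total failure before it returns to the all-up state. The sampling scheme forces failure transitions with a fixed probability `bias` (a choice often called failure biasing) and corrects each path with a likelihood-ratio weight. The function names `exact_gamma` and `is_estimate` are hypothetical.

```python
import random


def exact_gamma(n, lam, mu):
    """Closed-form P(all n down before all up | one component down).

    Uses the standard hitting-probability formula for a birth-death chain:
    1 / sum_{j=0}^{n-1} prod_{i=1}^{j} (q_i / p_i), where q_i / p_i is the
    ratio of repair rate to failure rate in state i (i components down).
    """
    total, prod = 0.0, 1.0
    for j in range(n):
        total += prod
        k = j + 1
        if k < n:
            prod *= mu / ((n - k) * lam)
    return 1.0 / total


def is_estimate(n, lam, mu, bias, cycles, rng):
    """Importance-sampling estimate of the same probability.

    In every state with 0 < k < n failed components, a failure is sampled
    with probability `bias` instead of its true (tiny) probability, and the
    path weight is multiplied by the ratio true / sampling probability.
    """
    total = 0.0
    for _ in range(cycles):
        k, weight = 1, 1.0  # each cycle starts with one failed component
        while 0 < k < n:
            rate_fail = (n - k) * lam
            p_fail = rate_fail / (rate_fail + mu)  # true embedded-chain probability
            if rng.random() < bias:
                weight *= p_fail / bias
                k += 1
            else:
                weight *= (1.0 - p_fail) / (1.0 - bias)
                k -= 1
        if k == n:  # system failure reached before returning to all-up
            total += weight
    return total / cycles


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, assumed only for repeatability
    n, lam, mu = 3, 1e-3, 1.0   # illustrative parameter values
    print("exact   :", exact_gamma(n, lam, mu))
    print("IS est. :", is_estimate(n, lam, mu, 0.5, 20000, rng))
```

With these assumed rates the target probability is about 2 in a million, so standard simulation would observe a system failure in only a handful of every million cycles. Under the biased sampling roughly one cycle in three ends in system failure, and the weights restore an unbiased estimate. Setting `bias` equal to the true failure probability in each state would recover standard simulation.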