Publication
IEEE Design and Test of Computers
Paper
Reliability challenges and system performance at the architecture level
Abstract
Reliability challenges and system performance at the architecture level are discussed. In modern computer systems, power and energy are the primary design constraints, which has increased the use of inline concurrent error detection (CED) techniques, in both hardware and software, to achieve reliability comparable to that of modular redundancy. Modern processors use a variety of CED techniques and parity checking to detect data path errors and to protect some register files. The introduction of hybrid systems with accelerators, along with widely used low-power commodity off-the-shelf (COTS) components, enables software-level techniques to provide efficient reliability. In memory subsystems, error detection can be performed through parity checking or error-correcting codes (ECC). Memory scrubbing corrects single-bit errors in the background while memory is idle, preventing them from accumulating into multibit errors. Architectural techniques and mechanisms must be incorporated into the design process, for both ease of design and cost reduction in building robust systems.
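To illustrate the ECC and memory-scrubbing ideas mentioned in the abstract, the following is a minimal sketch and not the scheme used in the paper. It assumes a toy Hamming(7,4) single-error-correcting code in place of the wider codes that real memory subsystems typically use, and it models memory as a Python list. The names `encode`, `syndrome`, `correct`, `decode`, and `scrub` are hypothetical and chosen only for this example.

```python
def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (assumed toy code).

    Bit i of the returned integer holds codeword position i + 1.
    Positions 1, 2 and 4 are parity bits; 3, 5, 6 and 7 carry data.
    """
    d0, d1, d2, d3 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d0 ^ d1 ^ d3  # covers positions 1, 3, 5, 7
    p2 = d0 ^ d2 ^ d3  # covers positions 2, 3, 6, 7
    p4 = d1 ^ d2 ^ d3  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d0, p4, d1, d2, d3]  # positions 1..7
    return sum(bit << i for i, bit in enumerate(bits))


def syndrome(word):
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def correct(word):
    """Return (corrected_word, was_corrected), assuming at most one flipped bit."""
    s = syndrome(word)
    if s:
        word ^= 1 << (s - 1)  # the syndrome names the faulty position
    return word, s != 0


def decode(word):
    """Correct a possible single-bit error, then extract the 4 data bits."""
    word, _ = correct(word)
    data_positions = (3, 5, 6, 7)
    return sum(((word >> (pos - 1)) & 1) << i
               for i, pos in enumerate(data_positions))


def scrub(memory):
    """Background scrub pass: rewrite every word with its corrected value.

    Returns the number of words that needed correction.
    """
    fixed = 0
    for addr, word in enumerate(memory):
        corrected, changed = correct(word)
        if changed:
            memory[addr] = corrected
            fixed += 1
    return fixed


if __name__ == "__main__":
    memory = [encode(n) for n in range(16)]
    memory[5] ^= 1 << 3          # inject a single-bit upset
    print("words corrected:", scrub(memory))
    print("data intact:", [decode(w) for w in memory] == list(range(16)))
```

In this sketch, a scrub pass repairs a word while it still holds only one flipped bit. If a second bit flips in the same word before the next pass, this toy code miscorrects it, which is the accumulation that periodic scrubbing is meant to prevent.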