Publication
DSN 2023
Conference paper
Characterization and exploration of latch checkers for efficient RAS protection
Abstract
Reliability has been, and continues to be, a key consideration in the design of the IBM Z mainframe processors, and has resulted in industry-leading performance with little-to-no downtime. In this paper, we analyze the various hardware reliability mechanisms that make the processor resilient to transient errors, and the checker architecture that enables their detection and correction. We characterize the error-checking logic in the processor based on a detailed analysis of the actual design. Based on hardware measurements on a real Z processor, we then determine which error checkers are timing-critical when the supply voltage is scaled. We propose algorithms that optimize checker selection without affecting RAS coverage or the detection of errors induced by both SER and voltage scaling. Finally, we examine further potential optimizations of checkers based on the logic utilization in representative benchmarks.