Abstract
Operating system noise is a well-known problem that may limit application scalability on large-scale machines, significantly reducing their performance. Though the problem is well studied, much of the previous work has been qualitative. We have developed a technique that provides a quantitative descriptive analysis of each OS event that contributes to OS noise. The mechanism allows us to detail all sources of OS noise through precise kernel instrumentation and provides frequency and duration analysis for each event. Such a description gives OS developers better guidance for reducing OS noise. We integrated this data with a trace visualizer, allowing quicker and more intuitive understanding of the data. Specifically, the contributions of this paper are threefold. First, we describe a methodology whereby detailed quantitative information may be obtained for each OS noise event. Though it is not the thrust of the paper, we show how we implemented that methodology by augmenting LTTng. We validate our approach by comparing it to well-known standard techniques for analyzing OS noise. Second, we provide a case study in which we use our methodology to analyze the OS noise observed when running benchmarks from the LLNL Sequoia applications. Our experiments enrich and expand previous results with our quantitative characterization. Third, we describe how a detailed characterization makes it possible to disambiguate the noise signatures of qualitatively similar events, allowing developers to address the true cause of each noise event. © 2011 IEEE.