IPDPS 2012
Conference paper

Evaluating the impact of TLB misses on future HPC systems

View publication


TLB misses have been considered an important source of system overhead and one of the causes that limit scalability on large supercomputers. This assumption lead to HPC lightweight kernel designs that usually statically map page table entries to TLB entries and do not take TLB misses. While this approach worked for petascale clusters, programming and debugging exascale applications composed of billions of threads is not a trivial task and users have started to explore novel programming models and tools, which require a richer system software support. In this study we present a quantitative analysis of the effect of TLB misses on current and future parallel applications at scale. To provide a fair evaluation, we compare a noiseless OS (CNK) with a custom version of the same OS capable of handling TLB misses on a BG/P system (up to 4096 cores). Our methodology follows a two-step approach: we first analyze the effects of TLB misses with a low-overhead, range-checking TLB miss handler, and then simulate a more complex TLB management system through TLB noise injection. We analyze the system behavior with different page sizes and increasing number of nodes and perform a sensitivity analysis. Our results show that the overhead introduced by TLB misses on complex HPC applications from the LLNL and ANL benchmarks is below 2% if the TLB pressure is contained and/or the TLB miss handler overhead is low, even with 1MB-pages and under large TLB noise injection. These results open the possibility of implementing richer OS memory management services to satisfy the requirements of future applications and users. © 2012 IEEE.