Workload performance characterization of DARPA HPCS benchmarks
Abstract
Understanding the workload characteristics and resource usage patterns of available applications is critical to guiding the design and development of the hardware and software stacks of future machines. In this paper, we analyze the workload performance characteristics of three large-scale DARPA HPCS benchmarks, HYCOM, POP, and LBMHD, executing on IBM Power5+ processor machines. Our analysis focuses on CPU and memory performance, using a Cycles Per Instruction (CPI) model, and on multiprocess communication performance, using MPI traces. For each benchmark, we provide a high-level performance analysis followed by a hot-spot analysis of the code for selected input parameters. We then present a detailed workload performance characterization using the CPI model with data from a unique set of performance counters available on the Power5+ processor system. For communication, we describe the sources of load imbalance in the applications and identify potential impediments to their scalability at large processor counts. We identify several sources of performance problems that are potential bottlenecks and discuss methods to ameliorate them. © 2008 IEEE.