Workload performance characterization of DARPA HPCS benchmarks
Abstract
It is critical to understand the workload characteristics and resource usage patterns of available applications to guide the design and development of hardware and software stacks of future machines. In this article, we analyze the workload performance characteristics of three large-scale DARPA HPCS benchmarks: the Hybrid Coordinate Ocean Model, the Parallel Ocean Program, and the Lattice Boltzmann Magneto-Hydrodynamics Code, while executing on IBM Power5+ processor machines. Our analysis focuses on CPU/memory performance using a Cycles Per Instruction (CPI) model and on multiprocess communication performance using MPI traces. For each benchmark, we provide a high-level performance analysis followed by a hotspot analysis for selected input parameters. We then present a detailed workload performance characterization using the CPI model with data from a unique set of performance counters available on the Power5+ processor system. From the communication performance analysis, we describe the sources of load imbalance in the applications and identify potential impediments to their scalability at large processor counts. We identify several sources of performance problems that are potential bottlenecks and discuss methods to ameliorate them. We also present a comparative analysis of these benchmarks to summarize the similarities and differences in their performance characteristics. Copyright © 2009 John Wiley & Sons, Ltd.