The BlueGene/L supercomputer and quantum ChromoDynamics
Pavlos Vranas, Gyan Bhanot, et al.
ACM/IEEE SC 2006
MPI is a widely used programming model on today's parallel machines. MPI libraries sometimes use O(N) data structures, where N is the number of processes, to implement MPI functionality. The IBM Blue Gene/Q machine has 16 GB of memory per node; if each node runs 32 MPI processes, only 512 MB is available per process, so the MPI library must be space efficient. The problem will become more severe on future Exascale machines with tens of millions of cores and MPI endpoints. We explore techniques to compress the dense O(N) data structures that map a logical process ID to its global rank. Our techniques minimize the mapping state of topological communicators by replacing table lookups with a mapping function. We also explore caching schemes that reduce the overhead of the mapping functions for recent translations, and we present performance results on multiple MPI micro-benchmarks and on the 3D FFT and Algebraic Multigrid application benchmarks.
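The general idea of replacing a dense O(N) rank table with a mapping function, and putting a small cache of recent translations in front of it, can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the class name `RunMap`, the greedy strided-run encoding, and the 8-slot direct-mapped cache are all hypothetical choices. A communicator whose members follow a regular pattern collapses to a few (start, base, stride) runs, while an irregular one degrades toward one run per two entries.

```python
from bisect import bisect_right


class RunMap:
    """Hypothetical compressed local-rank -> global-rank map.

    Instead of a dense list with one entry per process, the map stores
    greedy left-to-right runs in which consecutive local ranks map to
    global ranks with a constant stride. A communicator built from a
    regular process set (e.g. a strided slice) collapses to one run.
    """

    def __init__(self, table, cache_slots=8):
        self.size = len(table)
        self.starts = []   # first local rank of each run
        self.bases = []    # global rank of that first local rank
        self.strides = []  # global-rank step within the run
        i = 0
        while i < len(table):
            # Stride is fixed by the first pair; a trailing single entry uses 1.
            stride = table[i + 1] - table[i] if i + 1 < len(table) else 1
            j = i + 1
            while j < len(table) and table[j] - table[j - 1] == stride:
                j += 1
            self.starts.append(i)
            self.bases.append(table[i])
            self.strides.append(stride)
            i = j
        # Small direct-mapped cache of recent translations: slot -> (local, global).
        self.cache = [None] * cache_slots

    def _lookup(self, local):
        # Mapping function: binary-search the run containing `local`, O(log R).
        r = bisect_right(self.starts, local) - 1
        return self.bases[r] + self.strides[r] * (local - self.starts[r])

    def global_rank(self, local):
        if not 0 <= local < self.size:
            raise IndexError(local)
        slot = local % len(self.cache)
        entry = self.cache[slot]
        if entry is not None and entry[0] == local:
            return entry[1]            # cache hit: skip the search
        g = self._lookup(local)
        self.cache[slot] = (local, g)  # overwrite whatever occupied this slot
        return g

    def num_runs(self):
        return len(self.starts)
```

For example, a communicator holding every 8th rank of a 4096-process job, starting at rank 2, needs 512 entries as a dense table but a single run here, and `global_rank(10)` returns 82. A real library would likely keep the dense table whenever the run encoding does not actually save space, since greedy runs can cover as few as two ranks each.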