Hierarchical redesign of classic MPI reduction algorithms

Khalid Hasanov; Alexey Lastovetsky

doi:10.1007/s11227-016-1779-7

Journal of Supercomputing

Paper

18 Jun 2016



Abstract

Optimization of MPI collective communication operations has been an active research topic since the advent of MPI in the 1990s. Many general and architecture-specific collective algorithms have been proposed and implemented in state-of-the-art MPI implementations. Hierarchical, topology-oblivious transformation of existing communication algorithms was recently proposed as a promising new approach to optimizing MPI collective communication algorithms and MPI-based applications. This approach has been successfully applied to the most popular parallel matrix multiplication algorithm, SUMMA, and to state-of-the-art MPI broadcast algorithms, demonstrating significant, multifold performance gains, especially on large-scale HPC systems. In this paper, we apply this approach to the optimization of the MPI Reduce and Allreduce operations. We present a theoretical analysis and experimental results obtained on a cluster of the Grid’5000 platform.
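As an illustration only, the following minimal Python simulation sketches one common two-level reading of a hierarchical transformation of a reduction algorithm. It assumes that the p processes are split into G equal groups, that the original algorithm runs inside each group to a group leader, and that the same algorithm then runs among the G leaders. The function names (`linear_reduce`, `binomial_reduce`, `hierarchical_reduce`) and the step-count model (one sequential receive or round per step, groups running concurrently, no message-size or network effects) are hypothetical choices for this sketch. They are not the paper's implementation or cost model.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[int]
Op = Callable[[Vector, Vector], Vector]
# Hypothetical interface: a reduce algorithm takes per-rank buffers and returns
# (result at the root, number of sequential communication steps).
ReduceAlgo = Callable[[Sequence[Vector], Op], Tuple[Vector, int]]


def vec_sum(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for MPI_SUM."""
    return [x + y for x, y in zip(a, b)]


def linear_reduce(data: Sequence[Vector], op: Op) -> Tuple[Vector, int]:
    """Flat-tree reduce: the root combines the other ranks' buffers one at a time."""
    acc = list(data[0])
    for v in data[1:]:
        acc = op(acc, v)
    return acc, len(data) - 1


def binomial_reduce(data: Sequence[Vector], op: Op) -> Tuple[Vector, int]:
    """Binomial-tree reduce: the number of active ranks halves every round."""
    buf = [list(v) for v in data]
    p, step, rounds = len(buf), 1, 0
    while step < p:
        for rank in range(0, p, 2 * step):
            if rank + step < p:
                # The lower rank stays on the left, so rank order is preserved.
                buf[rank] = op(buf[rank], buf[rank + step])
        step *= 2
        rounds += 1
    return buf[0], rounds


def hierarchical_reduce(data: Sequence[Vector], groups: int,
                        algo: ReduceAlgo, op: Op) -> Tuple[Vector, int]:
    """Two-level reduce: `algo` inside each group, then `algo` among group leaders."""
    p = len(data)
    if groups < 1 or p % groups != 0:
        raise ValueError("this sketch assumes the group count divides p")
    size = p // groups
    leader_bufs: List[Vector] = []
    inner_steps = 0
    for g in range(groups):
        partial, steps = algo(data[g * size:(g + 1) * size], op)
        leader_bufs.append(partial)
        inner_steps = max(inner_steps, steps)  # groups are assumed to run concurrently
    result, outer_steps = algo(leader_bufs, op)
    return result, inner_steps + outer_steps


if __name__ == "__main__":
    p = 16
    data = [[rank, 2 * rank] for rank in range(p)]
    for name, algo in [("linear", linear_reduce), ("binomial", binomial_reduce)]:
        print(name, "flat:", algo(data, vec_sum),
              "hierarchical (G=4):", hierarchical_reduce(data, 4, algo, vec_sum))
```

With p = 16 and G = 4, both versions deliver the same reduced vector. Under this crude model, the hierarchical flat tree needs (p/G - 1) + (G - 1) = 6 steps instead of p - 1 = 15. The binomial tree needs 4 rounds either way. Any difference for tree-based algorithms would therefore have to come from effects this sketch deliberately leaves out, such as message size and network contention.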

Conference paper