GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism
This paper presents new advances in GPU-driven Fourier pseudo-spectral numerical algorithms, which allow the simulation of turbulent fluid flow at problem sizes beyond the current state of the art. In contrast to several massively parallel petascale systems, the dense nodes of Summit, Sierra, and expected exascale machines can be exploited with coarser MPI decompositions which result in improved MPI all-to-all scaling. An asynchronous batching strategy, combined with the fast hardware connection between the large CPU memory and the fast GPUs allows effective use of the GPUs on problem sizes which are too large to reside in GPU memory. Communication performance is further improved by a hybrid MPI+OpenMP approach. Favorable performance is obtained up to a 184323 problem size on 3072 nodes of Summit, with a GPU to CPU speedup of 4.7 for a 122883 problem size (the largest problem size previously published in turbulence literature).