Landrush: Rethinking In-Situ Analysis for GPGPU Workflows

Anshuman Goswami; Yuan Tian; Karsten Schwan; Fang Zheng; Jeffrey Young; Matthew Wolf; Greg Eisenhauer; Scott Klasky

doi:10.1109/CCGrid.2016.58

CCGrid 2016

Conference paper

18 Jul 2016

Landrush: Rethinking In-Situ Analysis for GPGPU Workflows

View publication

Abstract

In-situ analysis on the output data of scientific simulations has been made necessary by ever-growing output data volumes and increasing costs of data movement as supercomputing is moving towards exascale. With hardware accelerators like GPUs becoming increasingly common in high end machines, new opportunities arise to co-locate scientific simulations and online analysis performed on the scientific data generated by the simulations. However, the asynchronous nature of GPGPU programming models and the limited context-switching capabilities on the GPU pose challenges to co-locating the scientific simulation and analysis on the same GPU. This paper dives deeper into these challenges to understand how best to co-locate analysis with scientific simulations on the GPUs in HPC clusters. Specifically, our 'Landrush' approach to GPU sharing proposes a solution that utilizes idle cycles on the GPU to provide an improved time-to-answer, that is, the total time to run the scientific simulation and analysis of the generated data. Landrush is demonstrated with experimental results obtained from leadership high-end applications on ORNL's Titan supercomputer, which show that (i) GPU-based scientific simulations have varying degrees of idle cycles to afford useful analysis task co-location, and (ii) the inability to context switch on the GPU at instruction granularity can be overcome by careful control of the analysis kernel launches and software-controlled early completion of analysis kernel executions. Results show that Landrush is superior in terms of time-to-answer compared to serially running simulations followed by analysis or by relying on the GPU driver and hardwired thread dispatcher to run analysis concurrently on a single GPU.

Conference paper