Architecting malleable MPI applications for priority-driven adaptive scheduling
Future supercomputers will need to support both traditional HPC applications and Big Data/High Performance Analysis applications seamlessly in a common environment. This motivates job scheduling systems to support malleable jobs, whose allocations can change in size dynamically, so that the amount of resources assigned matches the current needs of the different applications. It also calls for future HPC applications to adapt to this environment and provide some level of malleability, so that underutilized resources can be released to other tasks. In this paper, we present and compare two methodologies for supporting such malleable MPI applications: 1) using checkpoint/restart and the SCR library, and 2) using dynamic data redistribution and the ULFM API and runtime. We examine their effects on application execution times as well as their impact on resource management.