Publication
HOTI 2010
Conference paper

Impact of inter-application contention in current and future HPC systems

Abstract

Fat-tree networks are the most popular topology among indirect networks in today's supercomputers. Current supercomputers are generally operated in a shared environment under the control of a job scheduler, executing many parallel applications simultaneously. The competition between these applications for the same network resources degrades their performance. An application that has to wait for network resources occupied by another application's messages is said to be experiencing inter-application contention. The extent of the degradation caused by inter-application contention is known to depend on multiple factors: the network topology, the routing scheme, the task placement, etc. Note that these factors also affect intra-application contention. Our work evaluates the impact of inter-application contention for actual competing HPC workloads under different routing schemes in slimmed fat trees. In contrast with previous works, which focus mostly on individual applications' performance, we take a more system-centric view. Our work estimates the system performance loss that inter-application contention causes in current HPC systems, which we have measured to be around 10%. We also present a projection of the impact of inter-application contention on near- and mid-term future HPC systems, scaling the node computational power and network link speeds to foreseeable values. Our projection for future HPC systems shows that inter-application contention can cause a 15% throughput loss even with link speeds of 40 Gb/s for some application mixes. The difference in impact on a chosen application when running within different mixes leads to the performance variability described in previous works, but our work sets a better bound on the variability than studies performed with an injection of network noise. Finally, we found a high correlation between the communication volume of the applications in a workload and the amount of inter-application contention they experience. © 2010 IEEE.

Date

18 Aug 2010
