Publication
INA-OCMC 2012
Workshop paper

Contention-aware node allocation policy for high-performance capacity systems

Abstract

Inter-application network contention is seen as a major hurdle to achieving higher throughput in today's large-scale high-performance capacity systems. This effect is aggravated by current system schedulers that allocate jobs as soon as nodes become available, thus producing job fragmentation, i.e., the tasks of one job might be spread throughout the system instead of being allocated contiguously. This fragmentation increases the probability of sharing network resources with other applications, which produces higher inter-application network contention. In this paper, we propose the use of a contention-aware node allocation technique. This technique is based on identifying which applications are most likely to have a large impact on inter-application contention and obtaining a more contiguous allocation for these particular workloads. We demonstrate that, although a contiguous node allocation on slimmed fat-tree topologies may increase intra-application contention, the reduction in inter-application contention is more significant. Simulation experiments on a 2,048-node system running multiple applications showed that this technique reduces contention time by up to 35%. © 2012 ACM.
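The allocation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the node numbering, the leaf-switch size (NODES_PER_SWITCH), the contention_prone flag, and the fallback to a fragmented allocation when no contiguous run exists are all assumptions made for illustration. The abstract does not say how contention-prone applications are identified, so the flag is simply taken as an input.

```python
from typing import List, Optional, Set

NODES_PER_SWITCH = 4  # hypothetical leaf-switch size, not taken from the paper


def first_fit(free: Set[int], size: int) -> Optional[List[int]]:
    """Baseline: take the lowest-numbered free nodes, wherever they are."""
    if len(free) < size:
        return None
    return sorted(free)[:size]


def contiguous_fit(free: Set[int], size: int, total: int) -> Optional[List[int]]:
    """Return the first run of `size` consecutive free node ids, if one exists."""
    run = 0
    for node in range(total):
        run = run + 1 if node in free else 0
        if run == size:
            return list(range(node - size + 1, node + 1))
    return None


def allocate(free: Set[int], size: int, total: int,
             contention_prone: bool) -> Optional[List[int]]:
    """Prefer a contiguous block for jobs flagged as contention-prone.

    Assumption: if no contiguous block is free, fall back to the baseline
    instead of delaying the job.
    """
    nodes = contiguous_fit(free, size, total) if contention_prone else None
    if nodes is None:
        nodes = first_fit(free, size)
    if nodes is not None:
        free.difference_update(nodes)  # mark the chosen nodes as busy
    return nodes


def switches_spanned(nodes: List[int]) -> int:
    """Number of distinct leaf switches a job touches (a rough fragmentation proxy)."""
    return len({n // NODES_PER_SWITCH for n in nodes})


if __name__ == "__main__":
    free_nodes = {1, 2, 5, 8, 9, 10, 11, 14}  # toy 16-node system
    fragmented = first_fit(set(free_nodes), 4)
    contiguous = allocate(set(free_nodes), 4, 16, contention_prone=True)
    print(fragmented, switches_spanned(fragmented))
    print(contiguous, switches_spanned(contiguous))
```

In this toy setup the baseline spreads the four-task job over three leaf switches, while the contiguous policy places it under a single switch. A job confined to fewer switches shares fewer uplinks with other jobs, which is the intuition behind the reduction in inter-application contention that the abstract reports.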