Exploring network optimizations for large-scale graph analytics

Xinyu Que; Fabio Checconi; Fabrizio Petrini; Xing Liu; Daniele Buono

doi:10.1145/2807591.2807661

SC 2015

Conference paper

15 Nov 2015



Abstract

Graph analytics is arguably one of the most demanding workloads for high-performance systems and interconnection networks. Graph applications often display all-to-all, fine-grained, high-rate communication patterns that expose the limits of network protocol stacks. Load and communication imbalance generates hard-to-predict network hot spots and may require computational steering because of unpredictable data distributions. In this paper we present a lightweight communication library, implemented "on the metal" of BlueGene/Q and POWER7 IH, which we have used to support large-scale graph algorithms on up to 96K processing nodes and 6 million threads. With this library we have explored several optimization techniques, including overlapped communication, non-blocking collectives, message aggregation, and in-network computation for special collective communication patterns such as parallel prefix. The experimental results show significant performance improvements, ranging from 5X to 10X, compared with equally optimized MPI implementations.
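Message aggregation, one of the techniques listed above, addresses the fine-grained communication pattern by coalescing many small messages bound for the same destination into a single larger transfer. The following is a minimal sketch of the general idea, not the paper's library: the `MessageAggregator` class, the `send_batch` callback, and the `batch_size` threshold are all hypothetical names, and a Python list stands in for the real network.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Msg = Tuple[int, int]  # hypothetical payload: an (edge source, edge target) pair


class MessageAggregator:
    """Hypothetical helper that coalesces fine-grained messages per destination."""

    def __init__(self, send_batch: Callable[[int, List[Msg]], None], batch_size: int = 4):
        self.send_batch = send_batch  # stand-in for the real network transfer
        self.batch_size = batch_size  # messages buffered per destination before a send
        self.buffers: Dict[int, List[Msg]] = defaultdict(list)

    def post(self, dest: int, msg: Msg) -> None:
        # Buffer the message; send only once this destination's buffer is full.
        buf = self.buffers[dest]
        buf.append(msg)
        if len(buf) >= self.batch_size:
            self._flush_one(dest)

    def _flush_one(self, dest: int) -> None:
        batch = self.buffers.pop(dest, [])
        if batch:
            self.send_batch(dest, batch)

    def flush(self) -> None:
        # Drain partially filled buffers, e.g. at the end of a BFS level.
        for dest in list(self.buffers):
            self._flush_one(dest)


sent: List[Tuple[int, List[Msg]]] = []  # records each simulated network send
agg = MessageAggregator(lambda d, b: sent.append((d, b)), batch_size=4)
edges = [(u, v) for u in range(4) for v in range(u + 1, u + 4)]  # 12 edge updates
for u, v in edges:
    agg.post(v % 3, (u, v))  # hypothetical owner mapping: vertex v lives on rank v % 3
agg.flush()
print(len(edges), "messages in", len(sent), "sends")  # prints "12 messages in 3 sends"
```

The trade-off is latency for throughput: a larger `batch_size` amortizes per-message protocol overhead over more payload, but individual updates wait longer before leaving the node, which is why an explicit `flush` at synchronization points is needed.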
