The importance and characteristics of communication in high performance data analytics

Andreea Anghel; German Rodriguez; Bogdan Prisacari

doi:10.1109/IISWC.2014.6983044

IISWC 2014

Conference paper

11 Dec 2014

The importance and characteristics of communication in high performance data analytics

View publication

Abstract

Social networks and business analytics typically need to process vast amounts of data that are often modeled as graphs. The scale of the data that such applications have to handle requires large-scale distributed computing systems, together with scalable parallel algorithms, to efficiently process the graphs. Representative of the graph-based analytics class of applications is the Graph 500 benchmark [1], which is designed to assess the performance of supercomputing systems by solving the Breadth-First Search (BFS) graph traversal problem. In this work, we analyze the network data motion of a Graph 500 MPI version of the graph traversal problem, using a large-scale high-performance computing system, i.e., the MareNostrum III supercomputer [2]. We focus our analysis on the node-to-node communication and show that the application runtime is communication-bound, the communication making up as much as 80% of the execution time of each BFS iteration. We also show that the dominating communication pattern is an overall all-to-all exchange (every process communicates to every other process and roughly the same amount of data is exchanged between any two processes), thus providing preliminary guidance for future application or network design optimization efforts.

Conference paper