Memory centric computation (Mc2) for large-scale graph processing
Large-scale graph processing is an increasingly important workload in modern systems. Conventional systems are usually optimized for locality of memory references, using caches and parallelization techniques to cover long memory latencies. However since graphs are distributed over memory in unpredictable manner, their processing does not exhibit great locality. While graph algorithms have plenty of parallelism, they are not easily amenable for effective vectorization, as the memory references are scattered all over. What is needed is a paradigm to specify a number of parallel tasks, each of which performs a short computation near the memory and a mechanism to efficiently execute them. In this paper, we propose a novel computational model that is memory-centric: the computation is organized as a collection of functions, each of which operates on a specific piece of data and is executed close to the memory where it resides. Basic primitives are provided to orchestrate the flow, synchronization and execution of the functions at their respective data points to accomplish a global task. We propose a scalable architecture to execute this computational model. We simulate an implementation of this architecture to compare the performance of running some graph algorithms on it with observed performance when the same algorithms were run on conventional systems. Preliminary results for a few graph algorithms show our approach is very promising in improving the performance of graph algorithms.