Round compression for parallel matching algorithms
For over a decade now we have been witnessing the success of massive parallel computation frameworks, such as MapReduce, Hadoop, Dryad, or Spark. Compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises however in this context is can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. Lattanzi et al. [SPAA, ACM, New York, 2011, pp. 85-94] showed that if each machine has n1+\Omega (1) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem though to get stuck in a fundamental way at roughly O(log n) rounds once we enter the (at most) near-linear memory regime. In this paper, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (1 + \epsilon )-approximate maximum matching for any fixed constant \epsilon > 0 in O((log log n)2) rounds. To establish our result we need to deviate from the previous work in two important ways. First, we use vertex-based graph partitioning, instead of the edge-based approaches that were utilized so far. Second, we develop a technique of round compression.