Walking randomly, massively, and efficiently
We introduce a set of techniques that allow for efficiently generating many independent random walks in the Massively Parallel Computation (MPC) model with space per machine strongly sublinear in the number of vertices. In this space-per-machine regime, many natural approaches to graph problems struggle to overcome the Θ(log n) MPC round complexity barrier, where n is the number of vertices. Our techniques overcome this barrier for PageRank (one of the most important applications of random walks), even in the more challenging case of directed graphs, as well as for approximate bipartiteness and expansion testing. In the undirected case, we start our random walks from the stationary distribution, which implies that we approximately know the empirical distribution of their next steps. This allows for preparing continuations of random walks in advance and applying a doubling approach. As a result, we can generate multiple random walks of length l in Θ(log l) rounds on MPC. Moreover, we show that under the popular 1-vs.-2-Cycles conjecture, this round complexity is asymptotically tight. For directed graphs, our approach stems from our treatment of the PageRank Markov chain. We first compute the PageRank for the undirected version of the input graph and then slowly transition towards the directed case, considering convex combinations of the transition matrices in the process. For PageRank with damping factor equal to 1 − ε, we achieve the following round complexities: O(log log n + log 1/ε) rounds for undirected graphs (with Õ(m/ε²) total space), and Õ(log² log n + log² 1/ε) rounds for directed graphs (with Õ((m + n^(1+o(1))) / poly(ε)) total space). The round complexity of our result for computing PageRank has only logarithmic dependence on 1/ε. We use this to show that our PageRank algorithm can be used to construct directed length-l random walks in O(log² log n + log² l) rounds with Õ((m + n^(1+o(1))) poly(l)) total space. More specifically, by setting ε = Θ(1/l), a length-l PageRank walk with constant probability contains no random jump, and hence is a directed random walk.
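
The last claim can be checked numerically. The following is a minimal sketch, not part of the paper's algorithm: it assumes each step of a PageRank walk makes a random jump independently with probability ε (that is, the damping factor is 1 − ε), so a length-l walk contains no jump with probability (1 − ε)^l. The function name `no_jump_probability` and the constant 1/2 in ε = 1/(2l) are illustrative choices, not taken from the source.

```python
def no_jump_probability(length: int, eps: float) -> float:
    """Probability that `length` consecutive PageRank steps are all edge steps.

    Assumption: each step independently makes a random jump with
    probability eps, i.e. the damping factor is 1 - eps.
    """
    if length < 0 or not 0.0 <= eps <= 1.0:
        raise ValueError("need length >= 0 and 0 <= eps <= 1")
    return (1.0 - eps) ** length


if __name__ == "__main__":
    for length in (10, 100, 1000):
        # Illustrative constant inside Theta(1/l); any fixed c in (0, 1) works.
        eps = 1.0 / (2 * length)
        p = no_jump_probability(length, eps)
        # Bernoulli's inequality gives (1 - eps)^l >= 1 - eps * l = 1/2 here.
        print(f"l={length:5d}  eps={eps:.5f}  P[no jump]={p:.4f}")
```

With ε = 1/(2l), Bernoulli's inequality bounds the no-jump probability below by 1/2 for every l, so a constant fraction of the generated PageRank walks are plain directed random walks.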