In this paper we present and evaluate a parallel community detection algorithm derived from the state-of-the-artLouvain modularity maximization method. Our algorithm adoptsa novel graph mapping and data representation, and relies onan efficient communication runtime, specifically designed forfine-grained applications executed on large-scale supercomputers. We have been able to parallelize graphs with up to 138 billion edges on 8, 192 Blue Gene/Q nodes and 1, 024 P7-IH nodes. Leveraging the convergence properties of our algorithm and the efficient implementation, we can analyze communities of large scalegraphs in just a few seconds. To the best of our knowledge, this is the first parallel implementation of the Louvain algorithm that scales to these large data and processor configurations.