Algebraic Multigrid (AMG) solvers are widely used in scientific simulation codes. Their ideal computational complexity makes them especially attractive for solving large problems on parallel machines. However, they also involve a substantial amount of data movement, which poses challenges to performance and scalability. In this paper, we present an algorithm that provides a systematic means of reducing data movement in AMG. The algorithm gathers and redistributes the problem data to reduce the need to move it on the communication-intensive coarse-grid portion of AMG. The data is gathered in a way that preserves data locality by confining data movement to specific regions of the machine. Each decision to gather data is made systematically by means of a performance model. This approach results in substantial speedups on a multicore cluster when using AMG to solve a variety of test problems. © 2013 IEEE.