Graph500 is a new benchmark to rank supercomputers with a large-scale graph search problem. We found that the provided reference implementations are not scalable in a large distributed environment. In this paper we implement an optimized method based on 2D based partitioning. Our implementation can solve BFS (Breadth First Search) of large-scale graph with 68.7 billion vertices and 1.1 trillion edges for 24.12 seconds with 512 nodes and 12288 CPU cores. This record corresponds to 22.8 GE/s. We found 2D based partitioning method is scalable for large-scale distributed systems. We also demonstrate thorough study of performance character is-tics of our optimized implementation and reference implementations in a large-scale distributed memory super-computer with a Fat-Tree based Infiniband network. © 2012 IEEE.