Enforcing the orthogonality of approximate wavefunctions becomes one of the dominant computational kernels in planewave based Density Functional Theory electronic structure calculations that involve thousands of atoms. In this context, algorithms that enjoy both excellent scalability and single processor performance properties are much needed. In this paper we present block versions of the Gram-Schmidt method and we show that they are excellent candidates for our purposes. We compare the new approach with the state of the art practice in planewave based calculations and find that it has much to offer, especially when applied on massively parallel supercomputers such as the IBM Blue Gene/P Supercomputer. The new method achieves excellent sustained performance that surpasses 73 TFLOPS (67% of peak) on 8 Blue Gene/P1 racks (32 768 compute cores), while it enables more than a two fold decrease in run time when compared with the best competing methodology. © 2010 Elsevier B.V. All rights reserved.