Guojing Cong, David A. Bader
Journal of Parallel and Distributed Computing
UPC is designed to improve user productivity when programming distributed-memory machines. Yet its shared-memory abstraction also makes performance analysis hard, because it introduces extra overhead on local accesses and implicit communication on remote ones. As far as we know, there are no mature software utilities for systematic analysis and tuning of shared-memory access performance in UPC programs. We develop a mechanism to track shared-memory accesses and correlate them with UPC source lines, functions, and data structures. We then apply tool-assisted analysis to a set of UPC programs. For the NAS UPC benchmark we achieve dramatic performance improvement over the unoptimized implementation, as well as speedups of up to two times over the fully hand-tuned implementation. We expect our approach to be effective in tuning a wide range of UPC programs. © 2012 IEEE.
Guojing Cong, David A. Bader
Journal of Parallel and Distributed Computing
Guojing Cong, Konstantin Makarychev
IPDPS 2011
Fan Zhou, Guojing Cong
IJCAI 2018
Guojing Cong, Hanhong Xue
IPDPS 2008