Optimization of message passing services on POWER8 InfiniBand clusters
We present scalability and performance enhancements to MPI libraries on POWER8 InfiniBand clusters. We explore optimizations in the Parallel Active Messaging Interface (PAMI) libraries. We bypass the InfiniBand Verbs (IB Verbs) layer via low-level inline calls, which yields low latency and high message rates. MPI is enabled on POWER8 by extending both MPICH and Open MPI to call the PAMI libraries. The IBM POWER8 nodes include GPU accelerators that raise the floating-point throughput of each node. We explore optimized algorithms for GPU-to-GPU communication with minimal processor involvement. We achieve a peak MPI message rate of 186 million messages per second. We also demonstrate scalable performance in the QBOX and AMG applications.