About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
SBAC-PAD 2018
Conference paper
Towards a Single-Host Many-GPU System
Abstract
As computation-intensive tasks such as deep learning and big data analysis take advantage of GPU based accelerators, the interconnection links may become a bottleneck. In this paper, we investigate the upcoming performance bottleneck of multi-accelerator systems, as the number of accelerators equipped with single host grows. We instrumented the host PCIe fabric to measure the data transfer and compared it with the measurements from the software tool. It shows how the data transfer (P2P) helps to avoid the bottleneck on the interconnection links, but multi-GPU performance does not scale up as expected due to the control messages. We quantify the impact of host control messages with suggestions to remedy scalability bottlenecks. We also implement the proposed strategy on Lulesh to validate the concept. The result shows our strategy can save 59.86% time cost of the kernel and 13.32% PCIe H2D payload.