Asymmetric communication models for resource-constrained hierarchical Ethernet networks
Abstract
Communication time prediction is critical for parallel application performance tuning, especially in the rapidly growing field of data-intensive applications. However, making such predictions accurately is non-trivial when contention exists on different components of a hierarchical network. In this article, we derive an 'asymmetric network property' at the transmission control protocol (TCP) layer for concurrent bidirectional communications in a commercial off-the-shelf (COTS) cluster, and we develop a communication model as the first effort to characterize communication times on hierarchical Ethernet networks with contention at both the network interface card and backbone cable levels. We develop a micro-benchmark for a set of simultaneous point-to-point message-passing interface (MPI) operations on a parametrized network topology, use it to validate our model extensively, and show that the model effectively predicts communication times for simultaneous MPI operations (both point-to-point and collective communications) on resource-constrained networks. We show that if the asymmetric network property is excluded from the model, the communication time predictions are significantly less accurate. In addition, we validate the model on a cluster of the Grid5000 infrastructure, which is a more loosely coupled platform. As such, we advocate integrating this model into performance analysis for data-intensive parallel applications. Our observation of the performance degradation caused by the asymmetric network property suggests that some part of the software stack below the TCP layer in COTS clusters needs targeted tuning, an issue that has not yet attracted attention in the literature.