Optimal communication channel utilization for matrix transposition and related permutations on binary cubes
Abstract
We present optimal schedules for permutations in which each node sends one or several unique messages to every other node. With concurrent communication on all channels of every node in binary cube networks, the number of element transfers in sequence for K elements per node is K 2, irrespective of the number of nodes over which the data set is distributed. For a succession of s permutations within disjoint subcubes of d dimensions each, our schedules yield min( K 2+(s-1)d,(s+3)d, K 2+2d) exchanges in sequence. The algorithms can be organized to avoid indirect addressing in the internode data exchanges, a property that increases the performance on some architectures. For message-passing communication libraries, we present a blocking procedure that minimizes the number of block transfers while preserving the utilization of the communication channels. For schedules with optimal channel utilization, the number of block transfers for a binary d-cube is d. The maximum block size for K elements per node is ⌈ K (2d)⌉. © 1994.