Pol G. Recasens, Yue Zhu, et al.
EuroSys 2024
We give an optimal algorithm for broadcasting on SIMD hypercubes without the hardware support for indirect addressing; i.e., an indirect memory reference takes as many cycles as the number of different memory locations (offsets) accessed by all active processors. With communication restricted to one dimension at a time, our algorithm broadcasts m elements in m + n - 1 communication steps and 2m direct memory references in an n-cube. The well-known algorithm based on the spanning binomial tree needs mn communication steps and 2m direct memory references. A previous algorithm based on the n edge-disjoint spanning trees takes m + n - 1 communication steps but needs at least 2 mn direct memory references. We then give a reduction algorithm on SIMD hypercubes, which is optimal in terms of the communication times and memory references and nearly optimal (about a factor of n (n - 1) when m ≫ n) with respect to arithmetic time. We also generalize our algorithms to the case in which communication is restricted to k dimensions at a time, for any integer k in the range of 1 {slanted equal to or less-than} k ≤ n, with their optimality preserved. © 1991.
Pol G. Recasens, Yue Zhu, et al.
EuroSys 2024
Cynthia Dwork, Moni Naor, et al.
Journal of the ACM
Gaku Yamamoto, Hideki Tai, et al.
AAMAS 2008
Bing Zhang, Mikio Takeuchi, et al.
ICAIF 2024