Publication
IPDPS 2004
Conference paper

Highly efficient synchronization based on active memory operations

Abstract

Synchronization is a crucial operation in many parallel applications. As network latency approaches thousands of processor cycles for large-scale multiprocessors, conventional synchronization techniques are failing to keep up with the increasing demand for scalable and efficient synchronization operations. In this paper, we present a mechanism that allows atomic synchronization operations to be executed on the home memory controller of the synchronization variable. By performing atomic operations near where the data resides, our proposed mechanism can significantly reduce the number of network messages required by synchronization operations. Our proposed design also enhances performance by using fine-grained updates to selectively "push" the results of offloaded synchronization operations back to processors when they complete (e.g., when a barrier count reaches the desired value). We use the proposed mechanism to optimize two of the most widely used synchronization operations, barriers and spin locks. Our simulation results show that the proposed mechanism outperforms conventional implementations based on load-linked/store-conditional, processor-centric atomic instructions, conventional memory-side atomic instructions, or active messages. It speeds up conventional barriers by a factor of up to 2.1 (4 processors) to 61.9 (256 processors), and spin locks by a factor of up to 2.0 (4 processors) to 10.4 (256 processors).
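
The barrier example in the abstract can be illustrated with a toy, single-threaded Python model. Everything in it (the `HomeNodeBarrier` class, its `arrive` method, and the message counter) is hypothetical and is not the paper's design. It only shows the idea of applying each increment at the variable's home node and pushing the result to every participant once the count reaches its target. The model ignores cache-coherence traffic, contention, and network latency, and it covers a single barrier episode.

```python
from dataclasses import dataclass, field


@dataclass
class HomeNodeBarrier:
    """Hypothetical toy model: a barrier counter whose increments are
    applied at the variable's home memory controller."""
    nprocs: int
    count: int = 0
    messages: int = 0
    released: set = field(default_factory=set)

    def arrive(self, pid: int) -> None:
        # One request message travels from processor `pid` to the home node,
        # where the increment is applied in place (no cache-line migration).
        self.messages += 1
        self.count += 1
        if self.count == self.nprocs:
            # Last arrival: the controller pushes a fine-grained update to
            # every participant instead of letting them spin remotely.
            for p in range(self.nprocs):
                self.released.add(p)
                self.messages += 1
            self.count = 0  # reset the counter for a later episode


barrier = HomeNodeBarrier(nprocs=4)
for pid in range(4):
    barrier.arrive(pid)
print(barrier.messages, sorted(barrier.released))  # prints: 8 [0, 1, 2, 3]
```

In this idealized model, one barrier episode with N processors costs 2N messages (N requests plus N pushed updates). The count is a property of the toy model only and says nothing about the speedups reported in the paper's simulations.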
