Publication
Transactions on Architecture and Code Optimization
Paper
Efficient Address Remapping in Distributed Shared-Memory Systems
Abstract
As processor performance continues to improve at a rate much higher than DRAM and network performance, we are approaching a time when large-scale distributed shared-memory systems will have remote memory latencies measured in tens of thousands of processor cycles. The Impulse memory system architecture adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to control how data is accessed and cached, thereby improving cache and bus utilization and reducing the number of memory accesses required. Previous Impulse work focuses on uniprocessor systems and relies on software to flush processor caches when necessary to ensure data coherence. In this paper, we investigate an extension of Impulse to multiprocessor systems that extends the coherence protocol to maintain data coherence without requiring software-directed cache flushing. Specifically, the multiprocessor Impulse controller can gather/scatter data across the network while its coherence protocol guarantees that each gather request gets coherent data and each scatter request updates every coherent replica in the system. Our simulation results demonstrate that the proposed system can significantly outperform conventional systems, achieving an average speedup of 9X on four memory-bound benchmarks on a 32-processor system. © 2006, ACM. All rights reserved.
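The gather/scatter idea in the abstract can be illustrated with a small software model. The sketch below is not the Impulse controller or its coherence protocol; it only shows, under simplifying assumptions, how an indirection vector lets sparse elements be read as one dense sequence and written back to their original locations. The class name `RemappedView`, its methods, and the 4x4 matrix example are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


class RemappedView:
    """Toy model of a remapped (alias) region backed by ordinary memory.

    Assumption: `memory` stands in for physical memory and `indices` for the
    indirection vector a remapping controller would consult. Caches, the
    network, and coherence are not modeled.
    """

    def __init__(self, memory: List[float], indices: Sequence[int]) -> None:
        self.memory = memory
        self.indices = list(indices)

    def gather(self) -> List[float]:
        # Collect the scattered elements into one dense sequence.
        return [self.memory[i] for i in self.indices]

    def scatter(self, values: Sequence[float]) -> None:
        # Write dense values back to their original scattered locations.
        if len(values) != len(self.indices):
            raise ValueError("values must match the indirection vector length")
        for i, v in zip(self.indices, values):
            self.memory[i] = v


# A 4x4 row-major matrix stored as a flat list; its diagonal sits at stride 5.
memory = [float(x) for x in range(16)]
diagonal = RemappedView(memory, indices=[0, 5, 10, 15])

dense = diagonal.gather()                    # dense copy of the diagonal
diagonal.scatter([v * 2.0 for v in dense])   # double the diagonal in place
```

In this model the caller touches four contiguous values instead of four widely spaced ones, which is the access-pattern benefit the abstract attributes to remapping. Keeping every cached copy consistent during such operations is the part the paper's extended coherence protocol addresses, and it is deliberately left out here.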