Proceedings - International Symposium on High-Performance Computer Architecture 2000
Conference paper

High-throughput coherence controllers


Recent research shows that coherence controller occupancy is a major performance bottleneck in distributed cache-coherent shared-memory multiprocessors. In this paper we study three approaches to alleviating this problem in hardwired coherence controllers: multiple protocol engines, pipelined protocol engines, and split request-response streams. The split request-response streams approach is a novel contribution of this paper. The performance of pipelining in the context of coherence controllers has not previously been presented in the literature, and multiple protocol engines have been studied in the context of hardwired controllers only to a limited extent, in an earlier study of ours. Using both commercial and scientific benchmarks on detailed simulation models, we present experimental results showing that, for applications with high communication bandwidth requirements, each mechanism is highly effective, reducing controller occupancy by as much as 66% and execution time by as much as 51%. A combination of mechanisms further reduces controller occupancy and execution time, by as much as 78% and 61%, respectively. Our results also show that applying any of these parallel mechanisms in the coherence controllers allows four times as many processors to be integrated per coherence controller, thus reducing system cost, while matching or even exceeding the performance of systems with a larger number of coherence controllers.