The cache complexity of multithreaded cache oblivious algorithms
Abstract
We present a technique for analyzing the number of cache misses incurred by multithreaded cache oblivious algorithms on an idealized parallel machine in which each processor has a private cache. We specialize this technique to computations executed by the Cilk work-stealing scheduler on a machine with dag-consistent shared memory. We show that a multithreaded cache oblivious matrix multiplication incurs O(n3√Z+(Pn)1/3n 2) cache misses when executed by the Cilk scheduler on a machine with P processors, each with a cache of size Z, with high probability. This bound is tighter than previously published bounds. We also present a new multithreaded cache oblivious algorithm for 1D stencil computations incurring O(n 2/Z+n+√ Pn3+ε) cache misses with high probability, one for Gaussian elimination and back substitution, and one for the length computation part of the longest common subsequence problem incurring O(n2/Z+√ Pn3.58) cache misses with high probability. © 2007 Springer Science+Business Media, LLC.