About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICS 2014
Conference paper
A NUCA substrate for flexible CMP cache sharing
Abstract
We propose an organization for the on-chip memory sys- tem of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks. The L2 cache is or- ganized as a non-uniform cache architecture (NUCA) array with a switched network embedded in it for high perfor- mance. We show that this organization can support the spectrum of degrees of sharing: unshared, in which each processor has a private portion of the cache, thus reduc- ing hit latency, completely shared, in which every processor shares the entire cache, thus minimizing misses, and every point in between. We find the optimal degree of sharing for a number of cache bank mapping policies, and also evaluate a per-application cache partitioning strategy. We conclude that a static NUCA organization with sharing degrees of two or four work best across a suite of commercial and scientific parallel workloads. We also demonstrate that migratory, dy- namic NUCA approaches improve performance significantly for a subset of the workloads at the cost of increased power consumption and complexity, especially as per-application cache partitioning strategies are applied. Copyright