Custom data layout for memory parallelism

Byoungro So; Mary W. Hall; Heidi E. Ziegler

CGO 2004

Conference paper

12 Jul 2004


Abstract

In this paper, we describe a generalized approach to deriving a custom data layout in multiple memory banks for array-based computations, to facilitate high-bandwidth parallel memory accesses in modern architectures where multiple memory banks can simultaneously feed one or more functional units. We do not use a fixed data layout, but rather select application-specific layouts according to access patterns in the code. A unique feature of this approach is its flexibility in the presence of code reordering transformations, such as the loop nest transformations commonly applied to array-based computations. We have implemented this algorithm in the DEFACTO system, a design environment for automatically mapping C programs to hardware implementations for FPGA-based systems. We present experimental results for five multimedia kernels that demonstrate the benefits of this approach. Our results show that custom data layout yields results as good as, or better than, naive or fixed cyclic layouts, and is significantly better for certain access patterns and in the presence of code reordering transformations. When used in conjunction with unrolling loops in a nest to expose instruction-level parallelism, we observe a reduction of more than 75% in the number of memory access cycles and speedups ranging from 3.96 to 46.7 for 8 memories, as compared to using a single memory with no unrolling.
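To illustrate why a fixed cyclic layout can lose to an access-pattern-specific one, the following is a minimal sketch and not the algorithm implemented in DEFACTO. It assumes a simplified cost model in which each memory bank serves one access per cycle and distinct banks operate in parallel. The loop shape (a stride-2 read unrolled by 4), the bank count, and all names (`access_cycles`, `total_cycles`, `naive`, `cyclic`, `custom`) are hypothetical and chosen only for this illustration.

```python
from collections import Counter

NUM_BANKS = 4   # assumed number of memory banks
UNROLL = 4      # assumed unroll factor: 4 accesses issued together
STRIDE = 2      # assumed access pattern: A[2*i]


def access_cycles(indices, bank_of):
    """Cycles to serve one batch of accesses under the assumed model:
    one access per bank per cycle, banks working in parallel."""
    per_bank = Counter(bank_of(i) for i in indices)
    return max(per_bank.values())


def naive(i):
    # Single memory: every element lives in bank 0.
    return 0


def cyclic(i):
    # Fixed cyclic layout: element i goes to bank i mod NUM_BANKS.
    return i % NUM_BANKS


def custom(i):
    # Layout chosen for the stride-2 pattern: consecutive *accessed*
    # elements land in consecutive banks.
    return (i // STRIDE) % NUM_BANKS


def total_cycles(n_iters, bank_of):
    """Total memory access cycles for n_iters iterations of A[2*i],
    processed UNROLL iterations at a time."""
    total = 0
    for base in range(0, n_iters, UNROLL):
        batch = [STRIDE * (base + u) for u in range(UNROLL)]
        total += access_cycles(batch, bank_of)
    return total


if __name__ == "__main__":
    for name, layout in [("naive", naive), ("cyclic", cyclic), ("custom", custom)]:
        print(name, total_cycles(16, layout))
```

Under these assumptions, the cyclic layout sends the four unrolled accesses to only two banks, so each batch takes two cycles, while the pattern-specific layout spreads them over all four banks and needs one cycle per batch. The numbers are properties of this toy model only and say nothing about the results reported in the paper.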

Related

Conference paper

Optimizing compiler for a CELL processor

Alexandre E. Eichenberger, Kathryn O'Brien, et al.

PACT 2005

Paper

Using advanced compiler technology to exploit the performance of the Cell Broadband Engine™ architecture

Alexandre E. Eichenberger, John Kevin O'Brien, et al.

IBM Systems Journal
