VLIW compilation techniques in a superscalar environment

Kemal Ebcioglu; Randy D. Groves; Ki-Chang Kim; G.M. Silberman; Isaac Ziv

PLDI 1994

Conference paper

20 Jun 1994

VLIW compilation techniques in a superscalar environment

Abstract

We describe techniques for converting the intermediate code representation of a given program, as generated by a modern compiler, to another representation which produces the same run-time results, but can run faster on a superscalar machine. The algorithms, based on novel parallelization techniques for Very Long Instruction Word (VLIW) architectures, find and place together independently executable operations that may be far apart in the original code, i.e., they may be separated by many conditional branches or belong to different iterations of a loop. As a result, the functional units in the superscalar are presented with more work that can proceed in parallel, thus achieving higher performance than the approach of using hardware instruction dispatch techniques alone. While general scheduling techniques improve performance by removing idle pipeline cycles, to further improve performance on a superscalar with only a few functional units requires a reduction in the pathlength. We have designed a set of new algorithms for reducing pathlength and removing stalls due to branches, namely speculative load-store motion out of loops, unspeculation, limited combining, basic block expansion, and prolog tailoring. These algorithms were implemented in a prototype version of the IBM RS/6000 xlc compiler and have shown significant improvement in SPEC integer benchmarks on the IBM POWER machines.

Paper