Publication
GCC and GNU Toolchain Developers' Summit 2007
Conference paper
Loop-aware SLP in GCC
Abstract
The GCC auto-vectorizer currently exploits data parallelism only across iterations of innermost loops. Two other important sources of data parallelism are across iterations of outer loops and in straight-line code. We recently embarked upon extending the scope of auto-vectorization opportunities beyond inner-loop inter-iteration parallelism, in these two directions. This paper describes the latter effort, which will allow GCC to vectorize unrolled loops, structure accesses in loops, and important computations like FFT and IDCT (as well as several test cases in missed-optimization PRs). Industry compilers like icc and xlC already support SLP-like vectorization, each in a different way. We introduce a new approach to SLP vectorization in loops that leverages our analysis of adjacent memory references, originally developed for vectorizing strided accesses. We extend the current loop-based vectorization framework to also look for parallelism within a single iteration, yielding a hybrid vectorization framework. This work also opens additional interesting opportunities for enhancing the vectorizer, including partial vectorization (currently it is an "all or nothing" approach), permutations, and MIMD (Multiple Instruction Multiple Data, as in the subadd vector operations of SSE3 and BlueGene). We describe how SLP-like vectorization can be incorporated into the current vectorization framework.