A fast, hybrid, power-efficient high-precision solver for large linear systems based on low-precision hardware


In recent years, the amount of data produced has been exploding at a rate far greater than the increase in computing power of even large supercomputers. As a result, modern computer systems are unable to analyze all the available data, a situation that will become even worse in the foreseeable future. We follow an approach to data analytics where the computational complexity is fundamentally reduced by performing the majority of the computation in an approximate or even stochastic framework, while the high-precision solution is guaranteed by an iterative refinement process. This paper presents a parallel heterogeneous system implementing a mixed-precision iterative refinement solver for large linear systems, which is a building block for many other complex algorithms. In our solver, the backward step is implemented as a novel variant of the conjugate gradient (CG) method running on an FPGA using fixed-point data types. The low precision of the backward step is compensated for by the forward step running in high precision on a GPU, which iteratively updates the current solution until a given working precision has been reached. We have implemented our CG solver using Altera's OpenCL SDK for FPGAs and use NVIDIA's CUBLAS library for the forward step on the GPU. By combining GPU and FPGA, we achieve a speedup of 3.7× for large dense 24,064 × 24,064 matrices and require 3.5× less energy per solved right-hand side compared to a tuned multi-threaded CPU solver based on the ATLAS linear algebra library.
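The interplay of the two steps can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes a symmetric positive definite matrix, uses `float32` as a stand-in for the FPGA's fixed-point arithmetic, and uses a textbook CG loop in place of the novel CG variant. The names `cg_low_precision`, `refine`, `inner_tol`, and `max_outer` are hypothetical and chosen for illustration only.

```python
import numpy as np

def cg_low_precision(A32, r32, iters=50, inner_tol=1e-3):
    """Textbook conjugate gradient in float32.

    Assumption: float32 stands in for the fixed-point backward step;
    only a rough solution of A d = r is needed here.
    """
    x = np.zeros_like(r32)
    res = r32.copy()
    p = res.copy()
    rs_old = np.dot(res, res)
    for _ in range(iters):
        if rs_old < inner_tol ** 2:       # rough solve is good enough
            break
        Ap = A32 @ p
        alpha = rs_old / np.dot(p, Ap)    # step length along direction p
        x += alpha * p
        res -= alpha * Ap
        rs_new = np.dot(res, res)
        p = res + (rs_new / rs_old) * p   # next A-conjugate direction
        rs_old = rs_new
    return x

def refine(A, b, tol=1e-10, max_outer=50):
    """Mixed-precision iterative refinement for an SPD system A x = b."""
    A32 = A.astype(np.float32)            # low-precision copy of the matrix
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for k in range(max_outer):
        r = b - A @ x                     # forward step: residual in float64
        r_norm = np.linalg.norm(r)
        if r_norm / b_norm < tol:
            return x, k
        # Normalize the residual so it stays well inside the float32 range.
        d = cg_low_precision(A32, (r / r_norm).astype(np.float32))
        x = x + r_norm * d.astype(np.float64)   # high-precision update
    return x, max_outer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((200, 200))
    A = M @ M.T / 200 + np.eye(200)       # well-conditioned SPD test matrix
    b = rng.standard_normal(200)
    x, outer = refine(A, b)
    print(outer, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

Each outer iteration solves for a correction only roughly, so the low-precision step never needs to be accurate on its own. The residual recomputed in double precision is what drives the solution to the requested working precision.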