Programming Reconfigurable Heterogeneous Computing Clusters Using MPI With Transpilation
With the slowdown of Moore's law and the end of Dennard scaling, the energy efficiency of compute hardware translates into compute power. High-Performance Computing (HPC) systems therefore rely more and more on accelerators such as Field-Programmable Gate Arrays (FPGAs) to serve highly demanding workloads, like Big Data applications or Deep Neural Networks. These FPGAs are reconfigurable and, in some systems, no longer bus-attached to a CPU but directly connected to the data center network fabric as standalone nodes. This mix of CPUs and FPGAs leads to the creation of Reconfigurable Heterogeneous HPC (ReH$_2$PC) clusters, for which no established programming model exists, despite many proposals in the past. In contrast, the Message Passing Interface (MPI) has evolved into the de facto standard for programming classical HPC clusters, because it offers high reusability and enables fast application development. This paper revisits the programming model of ReH$_2$PC clusters and argues that MPI is suitable for programming heterogeneous clusters of FPGAs and CPUs. We demonstrate a one-click solution for compiling and deploying a standard MPI application on ReH$_2$PC clusters. Our framework implements a High-Level Synthesis (HLS) library, a specific run-time environment for FPGAs and CPUs, and a transpiler that closes the semantic gap between the MPI API and FPGA designs. Our experiments with 31 FPGAs show an average speedup of 4 and a 90% reduction of power consumption compared to a cluster of CPUs.