Composability of Cloud Accelerators in Virtual World Simulations
- 2023
- CLOUD 2023
Field-programmable gate arrays (FPGAs) are making their way into data centers (DCs). They serve to offload and accelerate service-oriented tasks such as web-page ranking, memory caching, deep learning, network encryption, video conversion and high-frequency trading.
However, FPGAs are not yet available at scale to general cloud users who want to accelerate their own workload processing. This puts the cloud deployment of compute-intensive workloads at a disadvantage compared with on-site infrastructure installations, where the performance and energy efficiency of FPGAs are increasingly being exploited.
cloudFPGA solves this issue by offering FPGAs as an IaaS, PaaS or FaaS resource to cloud users. Using the cloudFPGA system, users can deploy FPGAs — similarly to VMs in the cloud — thus paving the way for large-scale utilization of FPGAs in DCs.
The cloudFPGA system is built on four pillars:
The concept of stand-alone network-attached FPGA builds on two main initiatives:
The network attachment sets the FPGA free from the traditional CPU–FPGA attachment by connecting the FPGA directly to the DC network. As a result, the number of distributed FPGAs becomes independent of the number of servers.
To enable cloud users to rent, use and release large numbers of FPGAs on the cloud, the FPGA resource must become plentiful in DCs.
The cloudFPGA infrastructure is the key enabler of such a large-scale deployment of FPGAs in DCs. It was designed from the ground up to provide the world’s highest-density and most energy-efficient rack unit of FPGAs.
The infrastructure combines a passive and an active water-cooling approach to pack 64 FPGAs into one 19"×2U chassis. Such a chassis is made up of two Sleds, each with 32 FPGAs and one 64-port 10 GbE switch providing 640 Gb/s of bisection bandwidth.
In all, 16 such chassis fit into a 42U rack for a total of 1024 FPGAs and 16 TB of DRAM.
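The density figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text. The per-FPGA DRAM value is a derived estimate that assumes the 16 TB is split evenly across all FPGAs and that 1 TB = 1024 GB; the text does not state either. The switch figure is simply ports times port speed, which happens to match the quoted 640 Gb/s.

```python
# Figures quoted in the text.
FPGAS_PER_SLED = 32
SLEDS_PER_CHASSIS = 2
CHASSIS_PER_RACK = 16
CHASSIS_HEIGHT_U = 2
RACK_HEIGHT_U = 42
SWITCH_PORTS = 64
PORT_SPEED_GBPS = 10
RACK_DRAM_TB = 16

# Derived quantities.
fpgas_per_chassis = FPGAS_PER_SLED * SLEDS_PER_CHASSIS
fpgas_per_rack = fpgas_per_chassis * CHASSIS_PER_RACK
rack_units_used = CHASSIS_PER_RACK * CHASSIS_HEIGHT_U
switch_port_rate_gbps = SWITCH_PORTS * PORT_SPEED_GBPS

# Assumption (not stated in the text): DRAM is divided evenly
# across FPGAs, and 1 TB = 1024 GB.
dram_per_fpga_gb = RACK_DRAM_TB * 1024 / fpgas_per_rack

print(f"FPGAs per chassis: {fpgas_per_chassis}")
print(f"FPGAs per rack:    {fpgas_per_rack}")
print(f"Rack units used:   {rack_units_used} of {RACK_HEIGHT_U}")
print(f"Switch port rate:  {switch_port_rate_gbps} Gb/s")
print(f"DRAM per FPGA:     {dram_per_fpga_gb:.0f} GB (assumed even split)")
```

Under these assumptions, the 16 chassis occupy 32U of the 42U rack, which leaves 10U for anything else the rack needs.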
Today, the prevailing way to incorporate an FPGA into a server is to connect it to the CPU over a high-speed, point-to-point interconnect such as the PCIe bus, and to treat that FPGA resource as a co-processor worker under the control of the server CPU.
However, because of this master–slave programming paradigm, such an FPGA is typically integrated in the cloud only as an option of the primary host compute resource to which it belongs. As a result, bus-attached FPGAs are usually made available in the cloud indirectly via Virtual Machines (VMs) or Containers.
In our deployment, in contrast, a stand-alone, network-attached FPGA can be requested independently of a host via the cloudFPGA Resource Manager (cFRM, see figure). The cFRM provides a RESTful (Representational State Transfer) API (Application Programming Interface) for integration into the DC management stack (e.g. OpenStack).
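To give a feel for what requesting an FPGA through a RESTful API could look like, here is a minimal sketch that only builds an HTTP request and does not send it. Everything specific in it is a placeholder: the host name, the `/instances` path, the `image_id` field and the bearer token are hypothetical and are not taken from the cFRM documentation.

```python
import json
import urllib.request

# Hypothetical values: the text does not give the real cFRM host,
# resource path, JSON fields, or authentication scheme.
CFRM_BASE_URL = "http://cfrm.example.invalid"
INSTANCES_PATH = "/instances"


def build_fpga_request(image_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for one FPGA."""
    payload = json.dumps({"image_id": image_id}).encode("utf-8")
    return urllib.request.Request(
        url=CFRM_BASE_URL + INSTANCES_PATH,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_fpga_request("example-image", "example-token")
print(req.get_method(), req.full_url)
```

Because the request targets a resource manager rather than a particular server, no host CPU appears anywhere in the call, which is the point the paragraph above makes about network-attached FPGAs.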
Cloud integration is the process of making a resource available in the cloud. In the case of cloudFPGA, this process is handled by three levels of management (see figure): a cloudFPGA Resource Manager (cFRM), a cloudFPGA Sled Manager (cFSM), and a cloudFPGA Manager Core (cFMC).
In the end, the components of all levels work together to provide the requested FPGA resources in a fast and secure way.
System architecture for the cloudFPGA platform. 32 FPGAs, one switch and a service processor are combined on one carrier board, called a Sled. The management tasks are split into three levels: cloudFPGA Resource Manager (cFRM), cloudFPGA Sled Manager (cFSM), and cloudFPGA Manager Core (cFMC). A Sled is half of a 2U chassis. OpenStack compute (Nova) CPU nodes are also available for creating heterogeneous clusters.
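The three management levels can be pictured as a simple hierarchy. The sketch below is a toy model, not the actual cloudFPGA software: it assumes one cFSM per Sled and one cFMC per FPGA (the text names the levels but does not spell out this mapping), and the class names, the `allocate` and `release` methods, and the first-free allocation policy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FPGAS_PER_SLED = 32    # from the text
SLEDS_PER_CHASSIS = 2  # from the text


@dataclass
class ManagerCore:
    """Stand-in for a cFMC; assumed to be one per FPGA."""
    fpga_id: int
    in_use: bool = False


@dataclass
class SledManager:
    """Stand-in for a cFSM; assumed to be one per Sled."""
    sled_id: int
    cores: List[ManagerCore] = field(default_factory=list)

    def find_free(self) -> Optional[ManagerCore]:
        # Return the first FPGA on this Sled that is not in use.
        return next((c for c in self.cores if not c.in_use), None)


@dataclass
class ResourceManager:
    """Stand-in for the cFRM, the single entry point for requests."""
    sleds: List[SledManager] = field(default_factory=list)

    def allocate(self) -> Optional[int]:
        # Hypothetical policy: take the first free FPGA found.
        for sled in self.sleds:
            core = sled.find_free()
            if core is not None:
                core.in_use = True
                return core.fpga_id
        return None  # nothing free

    def release(self, fpga_id: int) -> bool:
        for sled in self.sleds:
            for core in sled.cores:
                if core.fpga_id == fpga_id and core.in_use:
                    core.in_use = False
                    return True
        return False


def build_chassis() -> ResourceManager:
    """Model one chassis: two Sleds of 32 FPGAs each."""
    sleds = [
        SledManager(
            sled_id=s,
            cores=[ManagerCore(fpga_id=s * FPGAS_PER_SLED + i)
                   for i in range(FPGAS_PER_SLED)],
        )
        for s in range(SLEDS_PER_CHASSIS)
    ]
    return ResourceManager(sleds=sleds)


manager = build_chassis()
fpga = manager.allocate()
print("allocated FPGA", fpga)
manager.release(fpga)
```

The user only ever talks to the top level; the lower levels are reached by delegation, which mirrors the split of responsibilities shown in the figure.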
The cloudFPGA organization is a central place for sharing and hosting cloudFPGA related projects and collaborations. The organization consists of two main types of repositories:
IBM researchers in Switzerland have released the cloudFPGA development kit, named cFDK, which enables developers to deploy accelerated compute kernels as a network-attached function on field-programmable gate arrays (FPGAs) within minutes. Recently open sourced, it is the first development suite targeting standalone, network-attached FPGAs in the cloud, enabling scalable, FPGA-accelerated cloud-native applications.
We recently recorded a webinar as part of the EVEREST project and we thought that it might be of interest to some of our followers. If you are familiar with the cloudFPGA project, you may want to skip the first 20 minutes which cover generalities about FPGAs and their use in the EVEREST project. Next, we present two "Hello world" demonstrations that exemplify the cloudFPGA development flow, the interface with the resource manager, and the interaction with the deployed FPGAs.
The cloudFPGA research platform has been selected by the EVEREST consortium to be one of its main demonstrator systems. EVEREST is a European project funded by the Horizon 2020 Programme for research and innovation. EVEREST stands for dEsign enVironmEnt foR Extreme-Scale big data analyTics on heterogeneous platforms. Its goal is to develop a design environment that simplifies the implementation of Big Data applications on FPGA-based platforms.
The cloudFPGA platform is open source and can be accessed on GitHub.