The Cost of Flexibility and Security in Cloud-Based HPC - A Case Study Running EDA Workloads With Confidential Computing Technology
Abstract
Design of modern very-large-scale integration (VLSI) circuits using electronic design automation (EDA) is an increasingly compute-intensive and complex endeavor. Because of the typical product cycles in chip design, EDA is an excellent candidate for offloading bursts of computation to cloud-based resources when close to design deadlines: the cloud reduces infrastructure cost and improves flexibility by offering virtually unlimited computational power on demand. However, running EDA workloads in the cloud poses significant security risks, because these workloads handle the designers' intellectual property (IP) and high-value foundry process design kits (PDKs). The cost of a leaked proprietary design is measured in millions of dollars, loss of competitiveness, and brand damage. To guarantee the security of these highly valuable assets, all data and computations in EDA workloads must be secured. Traditionally, encryption has been an effective solution for protecting data at rest and in motion; data in use, however, has so far relied on less secure solutions based mostly on virtualization. Emerging confidential computing techniques improve on this by providing truly isolated and encrypted environments for computation. However, as of today, there is no comprehensive study of the challenges of running HPC workloads in confidential enclaves, or of how to deploy confidential computing in the public cloud. This talk focuses on EDA workloads as a proxy for generic HPC workloads that need thousands of cores, high-bandwidth network communication, and shared storage. We present our experience running cloud-native EDA workloads in confidential VMs through the use of Confidential Containers, which allow a zero-effort conversion of cloud-native workloads. We will briefly discuss existing and novel mechanisms to integrate the data-in-use protection of Confidential Containers with secure private/shared storage and network. Then, we will focus on measuring and characterizing the performance overhead of protecting data at every stage of the computation.