S2TAR-Cloud: Shared Secure Trusted Accelerators with Reconfiguration for Machine Learning in the Cloud
Abstract
The demand for hardware accelerators such as Tensor Processing Units (TPUs) and GPUs has been growing rapidly with the rise of Machine Learning (ML) workloads in cloud environments. As with other computing resources shared in the cloud, there is a growing demand to partition or dynamically adjust accelerator services while ensuring data privacy and confidentiality. We propose a secure and reconfigurable TPU design with confidential computing support, achieved through a Trusted Execution Environment (TEE) framework tailored for TPU-like accelerators in multi-tenant cloud settings. Our contributions include an evaluation of TEEs for TPU-like accelerators in shared environments, together with a novel TPU design based on switchbox-enabled systolic arrays that supports rapid dynamic partitioning. Our remote attestation protocol extends to sub-device partitions to provide trustworthiness at a fine-grained level, and it decouples host and accelerator TEEs into separate attestation reports without degrading security guarantees. Our work provides a practical, secure TPU design and studies multi-tenant TEEs with realistic ML workloads.