Co-design in High-Performance Systems
Hardware/software co-design has been pursued in the High-Performance Computing (HPC) community as a methodology for the application-science, software, and hardware communities to work together in designing future computing environments. In this paper, we first summarize an industry experience: the journey of designing and building two supercomputers, Summit and Sierra, which were delivered to Department of Energy (DOE) laboratories in 2018 and developed through collaboration among teams from IBM, NVIDIA, and the DOE laboratories. We then turn to the Cloud environment, as HPC users are increasingly considering Cloud infrastructure for its scale and flexibility. We describe an internal EDA case in which users have extended their on-premises cluster to the Cloud and operate in a hybrid-cloud environment, bursting workloads into the Cloud when on-premises data center capacity is strained, an approach enabled by advanced system software management solutions.