WARDEN: Warranting Robustness Against Deception in Next-Generation Systems
Power-management system design: Designing an effective power-management system involves engineering numerous components and hardware-software interfaces across the computing stack, and this complexity creates an attack surface for adversaries (Tang et al., 2018; Vega et al., 2017). The components in today’s computing systems are highly optimized for power efficiency. They may include, for example, heterogeneous architectures, per-core frequency/voltage islands, and very fine-grained software control of frequency and voltage settings that are deliberately sensitive in order to maximize performance. Unfortunately, malicious actors can abuse the dynamic voltage and frequency scaling (DVFS) interfaces to power-management hardware to induce hardware/software faults, infer confidential data, and rewrite those data.

Power oversubscription in data centers: In addition to the complexity of power-management system design, power oversubscription has become a trend in data centers. Operators place more servers on the existing power infrastructure than it could support if all of them operated at maximum power consumption at the same time (Xu et al., 2014). Oversubscription is possible because the power consumption of the servers in a data center rarely reaches its maximum. Nevertheless, the possibility of such a peak remains, and with it a risk of power outages, because maximum power consumption overloads electrical circuits and trips circuit breakers. This constitutes an attack vector: malicious actors can induce a power outage by driving all servers within a rack to their maximum power consumption simultaneously.

Power contention: Furthermore, power needs to be treated as a valuable shared resource, especially with the emergence of power-capping mechanisms and techniques. For example, executing a program with relatively high power consumption on a power-capped system can cause “power contention” (Sasaki et al., 2016). Power contention forces the power-management system to throttle the system, which degrades its performance. This is a serious concern from both the performance and the power perspective; for instance, a victim server whose power cap is reasonable for typical applications can suffer unexpected performance degradation and wasted power consumption when such intentional power hogs are executed.

Attack-pattern recognition by ML and ECCs: Malicious users of a data center can reverse engineer power-management functions to exploit several power-management design issues. Despite hardware-enforced isolation, all three key security properties, namely confidentiality, integrity, and availability, can be violated. Designing effective defenses for a robust and secure system thus first requires engineering strong attacks. We propose an attack-pattern recognition system, powered by machine learning (ML) and using error-correcting codes (ECCs), that detects malicious workloads, thereby conferring robustness and security on power-management system design.
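
The oversubscription risk described above can be made concrete with a small arithmetic sketch. The example below is illustrative only and is not part of the proposed system: the `Server` class, the linear power model, the wattages, and the breaker rating are all hypothetical values chosen for the example, and the model ignores the fact that real breakers trip according to how long an overload lasts.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical server with an assumed linear power model."""
    idle_watts: float
    peak_watts: float

    def draw(self, utilization: float) -> float:
        # Assumption: power grows linearly from idle to peak with utilization in [0, 1].
        return self.idle_watts + utilization * (self.peak_watts - self.idle_watts)


def rack_draw(servers, utilizations):
    """Total instantaneous power drawn by the rack, in watts."""
    return sum(s.draw(u) for s, u in zip(servers, utilizations))


def oversubscription_ratio(servers, breaker_watts):
    """Sum of per-server peak power divided by the breaker rating (> 1 means oversubscribed)."""
    return sum(s.peak_watts for s in servers) / breaker_watts


def breaker_trips(servers, utilizations, breaker_watts):
    """Simplified check: the breaker trips as soon as the rack draw exceeds its rating."""
    return rack_draw(servers, utilizations) > breaker_watts


# Hypothetical rack: 12 servers (100 W idle, 300 W peak) behind a 3000 W breaker.
rack = [Server(idle_watts=100.0, peak_watts=300.0) for _ in range(12)]
BREAKER_WATTS = 3000.0

typical = [0.5] * len(rack)  # benign load: 200 W per server, 2400 W in total
attack = [1.0] * len(rack)   # every server driven to peak: 3600 W in total

if __name__ == "__main__":
    print("oversubscription ratio:", oversubscription_ratio(rack, BREAKER_WATTS))
    print("trips under typical load:", breaker_trips(rack, typical, BREAKER_WATTS))
    print("trips under attack load:", breaker_trips(rack, attack, BREAKER_WATTS))
```

Under these assumed numbers the rack is safe at typical utilization but exceeds the breaker rating once all servers peak together, which is the condition an attacker would try to create.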