Protection against data-oriented attacks through selective data integrity
The financial sector lost billions of dollars in 2022, much of it attributed to cyber-attacks targeting major banks and corporations. One of the key strategies attackers use to carry out such high-end attacks is the control-oriented attack: the attacker corrupts control data (e.g., a function pointer or return address) to divert the program's control flow to code of their choosing and thereby compromise the system.
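As a purely illustrative sketch (not taken from the paper), the C snippet below shows the classic control-data corruption pattern: an unchecked copy into a buffer can overwrite an adjacent function pointer, redirecting control flow when it is later invoked. The `struct request` layout and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: a textbook control-data corruption pattern.
 * An out-of-bounds write into `buf` can overwrite the adjacent
 * function pointer `handler`, so the later indirect call follows
 * an attacker-chosen address. */
struct request {
    char buf[16];
    void (*handler)(void);      /* control data targeted by the attacker */
};

void benign_handler(void) { puts("handling request"); }

void process(struct request *r, const char *input) {
    strcpy(r->buf, input);      /* no bounds check: overflow can reach `handler` */
    r->handler();               /* hijacked if `handler` was overwritten */
}

int main(void) {
    struct request r = { .handler = benign_handler };
    process(&r, "short input"); /* safe here; attacker-controlled input would not be */
    return 0;
}
```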
However, with the advances in control-oriented defenses over the last few years, particularly Control-Flow Integrity (CFI), control-oriented attacks have become unreliable. As a result, attackers have shifted their focus to another powerful class of attacks: data-oriented attacks. Data-oriented attacks have become popular because they offer a level of expressiveness (in terms of achieving malicious goals) similar to that of control-oriented attacks. Unlike control-oriented attacks, however, they do not corrupt any control data, so CFI cannot stop them, which makes them very powerful. In a data-oriented attack, the attacker manipulates non-control data (i.e., the variables, objects, and pointers a program uses) to construct a malicious execution of their choice that never violates the program's normal control flow. Like control-oriented attacks, these attacks can lead to serious security breaches such as code execution or data leakage.
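To make the contrast concrete, here is an illustrative (hypothetical) example of a data-oriented attack: overflowing a name buffer flips an adjacent privilege flag, so the attacker escalates privilege while every branch and call the program takes remains perfectly legal under CFI.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: a data-oriented attack corrupts non-control data.
 * Overflowing `name` can set `is_admin` to a non-zero value without
 * touching any return address or function pointer, so CFI-style
 * defenses observe a completely normal control flow. */
struct session {
    char name[16];
    int  is_admin;              /* non-control data that decides a security check */
};

void login(struct session *s, const char *input) {
    s->is_admin = 0;
    strcpy(s->name, input);     /* overflow can reach `is_admin` */
    if (s->is_admin)
        puts("admin shell granted");
    else
        puts("regular user session");
}

int main(void) {
    struct session s;
    login(&s, "alice");         /* safe here; a long attacker string would not be */
    return 0;
}
```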
Preventing data-oriented attacks requires protecting the integrity of non-control data, just as control-oriented attacks are prevented by protecting the integrity of control data. Integrity protection of non-control data, however, poses two key challenges. First, a program contains far more non-control data than control data (roughly 100x). Second, protecting non-control data requires inter-procedural data-flow analysis together with instrumentation, encryption, masking, and similar techniques, all of which are computationally expensive (runtime overheads of up to 116% for software-based and up to 26% for hardware-based protections). This high overhead makes existing countermeasures against data-oriented attacks impractical.
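As a rough, hypothetical illustration of where that overhead comes from, the sketch below guards a single protected variable with a redundant shadow copy: every write updates both copies and every read re-validates them. Real defenses use stronger mechanisms such as encryption or masking, but the per-access cost pattern is similar, and applying it to all non-control data in a program adds up quickly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: naive integrity protection via a shadow copy.
 * Every write updates two locations; every read performs an extra check.
 * Doing this for every non-control object is what drives the overhead. */
static long protected_len;
static long shadow_len;              /* redundant copy kept in sync */

static void set_len(long v) {
    protected_len = v;
    shadow_len = v;                  /* extra write on every update */
}

static long get_len(void) {
    if (protected_len != shadow_len) {   /* extra check on every read */
        fprintf(stderr, "integrity violation detected\n");
        abort();
    }
    return protected_len;
}

int main(void) {
    set_len(42);
    printf("len = %ld\n", get_len());
    return 0;
}
```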
In our 2023 USENIX Security paper, we explored how to make integrity protection of non-control data practical. The key observation behind our work is that not all non-control data are equally sensitive or critical for constructing data-oriented exploits. Based on this observation, we introduced the idea of prioritizing non-control data, achieving practical and scalable protection against data-oriented attacks by protecting only the prioritized data. Among the various kinds of non-control data, data objects and their pointers are especially attractive targets because they let attackers leak information, stitch exploit components together, and perform stack- or heap-based exploitation. We therefore propose a framework called Data and Pointer Prioritization (DPP) that prioritizes data and pointers based on their importance and vulnerability to attacks. This prioritization makes it possible to enforce data integrity where it matters most, rather than everywhere.
The prioritization process starts by identifying attack gateways, such as system or library I/O functions that accept external inputs, and then propagates those inputs throughout the program to identify the other data objects and pointers that consume them. This propagation locates the data objects, throughout the program, that depend on potentially attacker-controlled inputs. To filter out objects (or their pointers) that depend on external input but are nevertheless safe, DPP applies a set of rule-based heuristics, derived from extensive data on known exploits and CVEs, to automatically identify the sensitive data objects that could lead to vulnerabilities. DPP then feeds the prioritized data objects and their pointers to the protection mechanism. Figure 1 shows a high-level diagram of these DPP operations.
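To make the idea concrete, here is a small, hypothetical example of the kind of distinction this analysis draws. The function and variable names are illustrative, not DPP's actual output: `read()` acts as an attack gateway, the tainted length and the buffer it writes into would be prioritized, while purely internal data would not.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical illustration of DPP-style prioritization.
 * read() is an attack gateway: its output is attacker-controlled.
 * Objects that consume that input and can influence a memory write
 * are prioritized; purely internal data is deprioritized. */
void handle(int fd) {
    char net_buf[64];
    char record[32];
    long copy_len = 0;
    int  retries  = 3;                         /* never touches external input: deprioritized */

    ssize_t n = read(fd, net_buf, sizeof net_buf);   /* gateway: external input enters here */
    if (n <= 0)
        return;

    copy_len = (unsigned char)net_buf[0];      /* tainted: derived from external input */
    memcpy(record, net_buf + 1, copy_len);     /* tainted length controls a memory write:
                                                  `copy_len` and `record` are prioritized */
    printf("stored %ld bytes, %d retries left\n", copy_len, retries);
}

int main(void) {
    handle(STDIN_FILENO);                      /* e.g., attacker-controlled bytes via stdin */
    return 0;
}
```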
We implemented a prototype of DPP in LLVM and evaluated it on several datasets (e.g., the Linux Flaw Project and the Juliet Test Suite) as well as the SPEC CPU2017 benchmark. Our evaluation suggests that DPP provides the same level of security as protecting all non-control data while protecting only the prioritized data objects and pointers. We found that as much as 95% of the non-control data in real-world programs may not need protection. DPP also improves throughput by about 1.6x and reduces runtime overhead by roughly 70%.
Our prioritization scheme departs from the conventional protection paradigm in that DPP enables a trade-off between accuracy and performance: it can be tuned along the security (false negatives), usability (false positives), and performance dimensions. The rules DPP uses are simple yet capable of anticipating future attacks, because they are derived by abstracting known exploits into common vulnerability patterns. DPP still needs further work and a broader benchmark to fully assess its effectiveness.