Invited talk

TEMpesT: Testing Empirically for Memory Transistency

Abstract

Ensuring correct execution of programs running on today’s parallel systems becomes difficult when memory is shared across several processing units. Memory consistency models (MCMs) were defined to provide a contract between different levels of the hardware-software stack to specify shared memory access orderings for correct implementations. However, instruction set architecture (ISA) MCMs traditionally only reason about the program-visible impacts of shared memory accesses for user-facing program instructions. In virtual memory systems though, there are additional hardware and operating system (OS) level shared memory accesses that can occur to facilitate address translation and may impact the program execution. Memory transistency models (MTMs) were thus coined to define a superset of MCMs that additionally account for underlying virtual memory operations. However, MTM implementations are complex as they are managed across the hardware and OS, making them difficult to specify. In such cases, empirical testing is the usual approach for effective validation of a specification. However, empirical MTM testing is challenging due to the complex ordering relationships between hardware, OS, and user-level operations, as well as the inability to explicitly run and control virtual memory operations with user-level programs. While many MCM testing tools exist and have been used to uncover system bugs, there is no existing work on developing systematic MTM testing techniques, nor has there been any analysis of how effective such techniques are.

In this work, we introduce TEMpesT, the first full-system framework that uses ISA-agnostic techniques for performing fuzz testing for MTMs across both hardware and OS levels. TEMpesT uses enhanced litmus tests (ELTs) with novel targeted user-level MTM fuzzing techniques to induce virtual memory operations and coax out corner case behaviors. We used TEMpesT to validate the formal MTM specification x86t_elt on an Intel x86 system running Linux. Our results show TEMpesT’s techniques are able to induce high outcome variety, resulting in 94% of all possible outcomes being observed with just 10,000 iterations of each ELT synthesized for x86t_elt, outperforming prior methodology by 5.8×. This article also introduces MTM mutation tests that we used to evaluate TEMpesT’s fuzzing techniques and demonstrate effective MTM bug detection.