Publication
SYSTOR 2012
Conference paper
Optimizing indirect branches in a system-level dynamic binary translator
Abstract
A dynamic binary translator (DBT) is a runtime system that translates binary code on the fly, for example to emulate the execution of the binary code on a processor with a different instruction set. One of the major sources of overhead is the resolution of branch target addresses for indirect branch instructions. Previous work has addressed this problem for a single virtual address space, but none has addressed it for multiple virtual address spaces in the context of a system-level DBT. This is challenging for compiler optimizations because the compiler cannot compute the virtual addresses of the targets of indirect branches at compile time, since they are affected by the runtime state of the emulated TLB. In this paper, we propose a new compiler optimization technique to address the problem for a system-level DBT. Our key idea is to use an offset from the virtual address of each page that contains a branch instruction, since this offset is not affected by the emulated TLB. We found that the compiler can often compute the offset using compile-time constants and that this approach significantly simplifies the guard code necessary for an indirect branch. We implemented this technique in a compiler of a system-level DBT for the z/Architecture. Our experimental results showed that our technique can reduce the execution times of the CBW2 benchmarks, part of the standard LSPR benchmark, by up to 5.9% and by 2.5% on average. Our analysis indicated that our technique was able to optimize 3.8% of the total dynamic instructions in the original binary code, while completely removing the guard code for 98.9% of these indirect branches. © 2012 ACM.
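The key idea in the abstract, that an offset from the page containing the branch does not depend on the emulated TLB, can be illustrated with a small sketch. This is not the paper's implementation. The 4 KiB page size, the helper names, and the two example mappings are all assumptions chosen for illustration. The sketch only shows that when the same page of code is mapped at two different virtual addresses, the target's offset from the branch's page base stays the same.

```python
# Illustrative sketch only; PAGE_SIZE and all names below are assumptions,
# not taken from the paper.
PAGE_SIZE = 4096  # assumed page size (4 KiB)


def page_base(vaddr: int) -> int:
    """Return the virtual address of the start of the page containing vaddr."""
    return vaddr & ~(PAGE_SIZE - 1)


def offset_from_branch_page(branch_vaddr: int, target_vaddr: int) -> int:
    """Return the target's offset from the base of the page holding the branch."""
    return target_vaddr - page_base(branch_vaddr)


# Hypothetical scenario: one page of guest code is mapped at two different
# virtual page bases (for example, in two different address spaces).
mapping_a = 0x0040_0000
mapping_b = 0x7F00_3000

# Positions of the branch and its target inside that page.
branch_off = 0x120
target_off = 0x480

off_a = offset_from_branch_page(mapping_a + branch_off, mapping_a + target_off)
off_b = offset_from_branch_page(mapping_b + branch_off, mapping_b + target_off)

# The absolute virtual addresses differ between the two mappings,
# but the page-relative offset of the target is identical.
print(hex(mapping_a + target_off), hex(mapping_b + target_off))
print(hex(off_a), hex(off_b))
```

Under these assumptions, a check written against the page-relative offset compares against a value that is constant across mappings, whereas a check written against the absolute virtual address would have to account for whichever mapping is currently active.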