Decompiling boolean expressions from Java™ bytecode
Abstract
Java bytecode obfuscates the original structure of a Java expression in the source code. So a simple expression such as (c1 || c2) or (c1 & c2) may be captured in the bytecode in 4 different ways (as shown in the paper). And correspondingly, when we reconvert the bytecode back into Java source code, there are four different ways this may happen. Further, although gotos are not permitted in the Java source code, the bytecode is full of gotos. If you were to blindly convert the bytecode into Java source code, then you would replace a goto by a labeled break. A labeled break has the advantage that it only allows you to break out of a block structure and (unlike a setjump) does not permit you to jump arbitrarily into a block structure. So while the data structures used in the regenerated Java source code are still relatively "clean" arbitrary usage of labeled breaks makes for unreadable code (as we show in the paper). And this can be a point of concern, since decompilation is generally related to debugging code. Instead of dumping arbitrary labeled breaks, we try to reconstruct the original expression, in terms of & and || clauses as well as ternary operators "?:" (c0 ? c1:c2); Thus our goal is quite simply to regenerate, without using goto or labeled breaks, the expressions as close to the original as possible (it is not possible to guarantee an exact match). In this paper we explain what is the state of the art in Java decompilers for decoding complex expressions. Then we will present our solution. We have implemented the algorithms described here in this paper and give you our experience with it.