Finding low-utility data structures

Guoqing Xu; Nick Mitchell; Matthew Arnold; Atanas Rountev; Edith Schonberg; Gary Sevitsky

doi:10.1145/1806596.1806617

PLDI 2010

Conference paper

23 Jul 2010

Finding low-utility data structures

View publication

Abstract

Many opportunities for easy, big-win, program optimizations are missed by compilers. This is especially true in highly layered Java applications. Often at the heart of these missed optimization opportunities lie computations that, with great expense, produce data values that have little impact on the program's final output. Constructing a new date formatter to format every date, or populating a large set full of expensively constructed structures only to check its size: these involve costs that are out of line with the benefits gained. This disparity between the formation costs and accrued benefits of data structures is at the heart of much runtime bloat. We introduce a run-time analysis to discover these low-utility data structures. The analysis employs dynamic thin slicing, which naturally associates costs with value flows rather than raw data flows. It constructs a model of the incremental, hop-to-hop, costs and benefits of each data structure. The analysis then identifies suspicious structures based on imbalances of its incremental costs and benefits. To decrease the memory requirements of slicing, we introduce abstract dynamic thin slicing, which performs thin slicing over bounded abstract domains. We have modified the IBM J9 commercial JVM to implement this approach. We demonstrate two client analyses: one that finds objects that are expensive to construct but are not necessary for the forward execution, and second that pinpoints ultimately-dead values. We have successfully applied them to large-scale and long-running Java applications. We show that these analyses are effective at detecting operations that have unbalanced costs and benefits. © 2010 ACM.

Conference paper