Finding rare numerical stability errors in concurrent computations

Hana Chockler; Karine Even; Eran Yahav

doi:10.1145/2483760.2483791

ISSTA 2013

Conference paper

14 Aug 2013

Finding rare numerical stability errors in concurrent computations

View publication

Abstract

A numerical algorithm is called stable if an error, in all possible executions of the algorithm, does not exceed a predefined bound. Introduction of concurrency to numerical algorithms results in a significant increase in the number of possible computations of the same result, due to different possible interleavings of concurrent threads. This can lead to instability of previously stable algorithms, since rounding can result in a larger error than expected for some interleavings. Such errors can be very rare, since the particular combination of rounding can occur in only a small fraction of interleavings. In this paper, we apply the cross-entropy method - a generic approach to rare event simulation and combinatorial optimization - to detect rare numerical instability in concurrent programs. The cross-entropy method iteratively samples a small number of executions and adjusts the probability distribution of possible scheduling decisions to increase the probability of encountering an error in a subsequent iteration. We demonstrate the effectiveness of our approach on implementations of several numerical algorithms with concurrency and rounding by truncation of intermediate computations. We describe several abstraction algorithms on top of the implementation of the cross-entropy method and show that with abstraction, our algorithms successfully find rare errors in programs with hundreds of threads. In fact, some of our abstractions lead to a state space whose size does not depend on the number of threads at all. We compare our approach to several existing testing algorithms and argue that its performance is superior to other techniques. © 2013 ACM.

Paper