David W. Jacobs, Daphna Weinshall, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence
This article describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of symmetric banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, shows that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix more than 16 times faster than a single node would factor it using the best sequential algorithm, and more than 20 times faster than a single node would using LAPACK's DPBTRF. The algorithm uses novel ideas in the area of distributed dense-matrix computations that include the use of a dynamic schedule for a blocked systolic-like algorithm and the separation of the input and output data layouts from the layout the algorithm uses internally. The algorithm also uses known techniques such as blocking to improve its communication-to-computation ratio and its data-cache behavior.
David W. Jacobs, Daphna Weinshall, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence
Jonathan Ashley, Brian Marcus, et al.
Ergodic Theory and Dynamical Systems
William Hinsberg, Joy Cheng, et al.
SPIE Advanced Lithography 2010
Leo Liberti, James Ostrowski
Journal of Global Optimization