Parallelism-centric optimization and performance study of a finance aggregation engine on modern NUMA systems
Mark-to-future aggregation is a key component of counterparty credit risk analysis in the IBM Algorithmics software. Its computation exhibits complex memory access and control flow patterns and is hard to accelerate. A prior effort to improve performance took a "pre-compiled" approach that aims to reduce overhead and inefficiencies primarily through compiler techniques. Although that approach, combined with other optimizations, improves performance by 3 to 5 times, it dynamically generates many extra lines of code, which makes maintenance and testing a challenge. In our study we take a parallelism-centric approach guided by hardware-counter-based profiling. With minimal modifications to the code, we achieve a speedup of about 10 times. We also study the behavior of mark-to-future aggregation on a NUMA platform and evaluate the impact of architectural choices on performance. Our study sheds some light on accelerating mark-to-future aggregation on current and emerging architectures.