A fast parallel particle filter for shared memory systems
Particle Filters (PFs) are Sequential Monte Carlo methods which are widely used to solve filtering problems of dynamic models under Non-Linear Non-Gaussian noise. Modern PF applications have demanding accuracy and run-time constraints that can be addressed through parallel computing. However, an efficient parallelization of PFs can only be achieved by effectively parallelizing the bottleneck: Resampling and its constituent redistribution step. A pre-existing implementation of redistribute on Shared Memory Architectures (SMAs) achieves O(N T log2N) time complexity over T parallel cores. This redistribute implementation is, however, highly computationally intensive and cannot be effectively parallelized due to the inherently limited number of cores of SMAs. In this paper, we propose a novel parallel redistribute on OpenMP 4.5 which takes O(N T + log2N) steps and fully exploits the computational power of SMAs. The proposed approach is up to six times faster than the O(N T log2N) one and its implementation on GPU provides a further three-time speed-up vs its equivalent on a 32-core CPU.We also show on an exemplary PF that our redistribution is no longer the bottleneck.