Dilated Convolution for Time Series Learning
Wang Zhang, Subhro Das, et al.
ICASSP 2025
This paper proposes a new blocked sparse matrix-vector product parallel algorithm, based on the Code Saturne native matrix format, to improve OpenMP scalability. New sparse matrix storage options based on the native matrix format, and the corresponding algorithms, are implemented in Code Saturne. In addition, trace-guided optimisations for reduced synchronisation and better load balance are proposed, and their efficiency is investigated on different processor architectures. Results are presented for a range of systems, including architectures of PRACE Tier-0 machines, IBM Blue Gene/Q and iDataPlex (Sandy Bridge, Ivy Bridge), and Cray XC30 (Ivy Bridge). Initial results indicate that the new algorithm has significantly better parallel performance across the tested hardware than the native OpenMP sparse matrix-vector product algorithm.
Arnold L. Rosenberg
Journal of the ACM
Els van Herreweghen, Uta Wille
USENIX Workshop on Smartcard Technology 1999
Arthur Nádas
IEEE Transactions on Neural Networks