Soft x-ray diffraction of striated muscle
S.F. Fan, W.B. Yun, et al.
Proceedings of SPIE 1989
Sparse-matrix vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-level parallelism and improve performance. The techniques include reordering to reduce cache misses (originally due to Das et al.), blocking to reduce load instructions, and prefetching to prevent multiple load-store units from stalling simultaneously. The techniques improve performance from about 40 MFLOPS (on a well-ordered matrix) to more than 100 MFLOPS on a 266-MFLOPS machine. The techniques are applicable to other superscalar RISC processors as well, and have improved performance on a Sun UltraSPARC™ I workstation, for example.
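As a rough illustration of the blocking idea mentioned in the abstract, the sketch below compares a plain compressed-sparse-row (CSR) product with a variant that stores the matrix as dense 2x2 blocks. It is not the authors' implementation: the function names, the 2x2 block size, and the storage layout are assumptions chosen for brevity, and pure Python does not show real load instructions. The point is only structural: the blocked loop reads one column index per block instead of one per nonzero and reuses each loaded entry of x for two rows.

```python
def spmv_csr(indptr, indices, data, x):
    """y = A @ x with A in CSR form (one column-index read per nonzero)."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def spmv_bsr2(block_ptr, block_col, blocks, x):
    """y = A @ x with A stored as dense 2x2 blocks (hypothetical layout).

    blocks[k] holds (a00, a01, a10, a11); block_col[k] is the block column.
    One column index is read per block, and x0/x1 are reused for both rows.
    """
    n_block_rows = len(block_ptr) - 1
    y = [0.0] * (2 * n_block_rows)
    for bi in range(n_block_rows):
        y0 = y1 = 0.0
        for k in range(block_ptr[bi], block_ptr[bi + 1]):
            j = 2 * block_col[k]
            x0, x1 = x[j], x[j + 1]
            a00, a01, a10, a11 = blocks[k]
            y0 += a00 * x0 + a01 * x1
            y1 += a10 * x0 + a11 * x1
        y[2 * bi] = y0
        y[2 * bi + 1] = y1
    return y


# Example 4x4 matrix (made up for illustration):
# [[1, 2, 0, 0],
#  [3, 4, 0, 0],
#  [0, 0, 5, 6],
#  [7, 0, 0, 8]]
x = [1.0, 2.0, 3.0, 4.0]

y_csr = spmv_csr(
    indptr=[0, 2, 4, 6, 8],
    indices=[0, 1, 0, 1, 2, 3, 0, 3],
    data=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    x=x,
)

y_bsr = spmv_bsr2(
    block_ptr=[0, 1, 3],
    block_col=[0, 0, 1],
    blocks=[(1.0, 2.0, 3.0, 4.0), (0.0, 0.0, 7.0, 0.0), (5.0, 6.0, 0.0, 8.0)],
    x=x,
)
```

Both calls compute the same product. The blocked form stores some explicit zeros (here in the lower-left block), which is the usual trade-off: a few wasted multiplications in exchange for fewer index reads and more independent arithmetic per loop iteration.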
Anupam Gupta, Viswanath Nagarajan, et al.
Operations Research
B.K. Boguraev, Mary S. Neff
HICSS 2000
Kento Tsubouchi, Yosuke Mitsuhashi, et al.
npj Quantum Information