Peng Wu, Maged M. Michael, et al.
CCPE
Value-based replay eliminates the need for content-addressable memories in the load queue, removing one barrier to scalable out-of-order instruction windows. Instead, correct memory ordering is maintained by simply re-executing certain load instructions in program order. A set of novel filtering heuristics reduces the average additional cache bandwidth demanded by value-based replay to less than 3.5 percent.
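The commit-time check described above can be illustrated with a minimal sketch. It is not the authors' hardware design. It assumes that each load records the value it obtained when it first executed out of order, and that a replay simply re-reads memory in program order and compares. The names `Load`, `needs_replay`, and `commit_loads` are hypothetical, and the `needs_replay` flag stands in for whatever the filtering heuristics decide.

```python
from dataclasses import dataclass


@dataclass
class Load:
    addr: int                  # address the load reads
    spec_value: int            # value seen at out-of-order execution
    needs_replay: bool = True  # hypothetical filter outcome (assumption)


def commit_loads(loads, memory):
    """Walk loads in program order and replay the unfiltered ones.

    Returns (index of the first mis-speculated load or None, replay count).
    A mismatch between the replayed value and the speculative value means
    the load observed a stale value and must be squashed.
    """
    replays = 0
    for i, ld in enumerate(loads):
        if not ld.needs_replay:
            continue  # filtered: no extra cache access is spent on this load
        replays += 1
        if memory[ld.addr] != ld.spec_value:
            return i, replays
    return None, replays


memory = {0x10: 1, 0x20: 2}
violating = [Load(0x10, 1), Load(0x20, 7)]
filtered = [Load(0x10, 1, needs_replay=False), Load(0x20, 2)]
print(commit_loads(violating, memory))
print(commit_loads(filtered, memory))
```

Because the comparison uses values rather than addresses, no associative search over in-flight loads is needed in this sketch; the replay count shows where filtering would save cache bandwidth.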
David Daly, Harold W. Cain
HPCA 2012
Harold W. Cain, Mikko H. Lipasti, et al.
Journal of Instruction-Level Parallelism
Calin Cascaval, Colin Blundell, et al.
Communications of the ACM