With the rise of custom silicon chips for AI acceleration, fair and comprehensive benchmarking of hardware innovations has become increasingly important. Yet being both fair AND comprehensive is not at all easy (hence the present workshop). Interesting innovations may be introduced and demonstrated at one level (circuit or architecture level), but the resulting benefits really ought to be assessed at a much higher level (system or application level), which may not be immediately practical or feasible. Costs common to many accelerator approaches, such as the energy needed to load the next set of model weights into scratchpad memory, are frequently ignored for simplicity. Yet doing so greatly complicates the fair assessment of alternative approaches that avoid these costs entirely. After an overview of benchmarking strategies at different abstraction levels, I discuss the best practices and pitfalls to be avoided that I’ve learned from my time on the ISSCC/ML subcommittee and as a researcher working on nonvolatile-memory-based AI accelerators.