Corey Lammie, Yuxuan Wang, et al.
IEEE TETC
Analog Non-Volatile Memory-based accelerators offer high-throughput, energy-efficient Multiply-Accumulate operations for the large Fully-Connected layers that dominate Transformer-based Large Language Models. We describe architectural, wafer-scale testing, chip-demonstration, and hardware-aware training efforts towards such accelerators, and quantify the unique raw-throughput and latency benefits of Fully- (rather than Partially-) Weight-Stationary systems.
Vasileios Kalantzis, Anshul Gupta, et al.
HPEC 2021
Valeria Bragaglia, Donato Francesco Falcone, et al.
B-MRS 2024
Bert J. Offrein, Jacqueline Geler-Kremer, et al.
IEDM 2020