Takayuki Katsuki
AI Alliance Tokyo 2025
Analog In-Memory Computing (AIMC) offers a promising solution to the von Neumann bottleneck. However, deploying transformer models on AIMC remains challenging due to their inherent need for flexibility and adaptability across diverse tasks. To address this, we propose Analog Hardware-Aware Low-Rank Adaptation (AHWA-LoRA) training, a novel approach to efficiently adapting transformers to AIMC hardware. Unlike conventional AHWA training, which retrains the entire model, AHWA-LoRA training keeps the analog weights fixed as meta-weights and adapts them with lightweight, external LoRA modules. We validate AHWA-LoRA training on SQuAD v1.1 and the GLUE benchmark, demonstrate its scalability to larger models (e.g., BERT-Large, LLaMA), and show its effectiveness for instruction tuning and reinforcement learning. We also evaluate a practical deployment scenario that balances AIMC tile latency with digital LoRA processing using optimized pipeline strategies on RISC-V-based programmable multi-core accelerators. This hybrid architecture achieves efficient transformer inference with only a 4% per-layer overhead compared to a fully AIMC implementation.
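The sketch below illustrates the general idea described in the abstract, not the authors' implementation: a frozen "analog" linear layer (standing in for weights programmed onto an AIMC tile, with hardware non-idealities modeled here as simple weight noise) is wrapped by a small trainable LoRA adapter evaluated in digital logic. The class name, rank, scaling, and noise model are illustrative assumptions.

```python
import torch
import torch.nn as nn


class AnalogLoRALinear(nn.Module):
    """Hypothetical sketch: fixed analog meta-weights + external digital LoRA."""

    def __init__(self, in_features, out_features, rank=8, alpha=16.0, weight_noise_std=0.02):
        super().__init__()
        # Fixed meta-weights: stand-in for weights deployed on an AIMC tile.
        self.analog = nn.Linear(in_features, out_features, bias=False)
        self.analog.weight.requires_grad_(False)
        self.weight_noise_std = weight_noise_std

        # Lightweight external LoRA modules: the only trainable parameters.
        self.lora_A = nn.Linear(in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        # Hardware-aware term: inject weight noise during training so the LoRA
        # adapter learns to compensate for analog non-idealities (assumed model).
        w = self.analog.weight
        if self.training and self.weight_noise_std > 0:
            w = w + torch.randn_like(w) * self.weight_noise_std * w.abs().mean()
        analog_out = nn.functional.linear(x, w)

        # Digital low-rank correction: B(A(x)) scaled by alpha / rank.
        return analog_out + self.lora_B(self.lora_A(x)) * self.scaling


# Usage: only the LoRA parameters are handed to the optimizer; the analog
# meta-weights stay fixed.
layer = AnalogLoRALinear(768, 768)
optimizer = torch.optim.AdamW(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-4
)
```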
Geoffrey Burr, Sidney Tsai, et al.
CICC 2025
Alper Buyuktosunoglu, David Trilla Rodriguez, et al.
HPCA 2024
Jim Garrison, Caleb Johnson, et al.
QCE 2023