Daniel Karl I. Weidele, Priyanshu Rai, et al.
AAAI 2026
Analog In-Memory Computing (AIMC) offers a promising solution to the von Neumann bottleneck. However, deploying transformer models on AIMC remains challenging due to their inherent need for flexibility and adaptability across diverse tasks. To address this, we propose Analog Hardware-Aware Low-Rank Adaptation (AHWA-LoRA) training, a novel approach to efficiently adapt transformers for AIMC hardware. Unlike conventional AHWA training, which retrains the entire model, AHWA-LoRA training keeps the analog weights fixed as meta-weights and adapts them with lightweight, external LoRA modules. We validate AHWA-LoRA training on SQuAD v1.1 and the GLUE benchmark, demonstrate its scalability to larger models (e.g., BERT-Large, LLaMA), and show its effectiveness in instruction tuning and reinforcement learning. We also evaluate a practical deployment scenario that balances AIMC tile latency against digital LoRA processing using optimized pipeline strategies on RISC-V-based programmable multi-core accelerators. This hybrid architecture achieves efficient transformer inference with only a 4% per-layer overhead compared to a fully AIMC implementation.
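The core idea described in the abstract, a fixed analog weight matrix supplemented by a trainable low-rank digital correction, can be sketched as follows. This is a minimal illustration using standard LoRA conventions, not the paper's implementation; all dimensions, the scaling factor, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 16, 8, 2  # hypothetical dimensions; rank r << min(d_in, d_out)
alpha = 4.0                # hypothetical LoRA scaling factor

# Fixed "meta-weights", standing in for the analog AIMC tile (never updated).
W_meta = rng.standard_normal((d_out, d_in))

# Lightweight external LoRA factors, the only trainable parameters.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))   # B starts at zero, so the initial correction is zero

def forward(x):
    # Analog path (frozen) plus digital low-rank correction.
    return W_meta @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = forward(x)
```

Because only `A` and `B` (of size `r * (d_in + d_out)`) are trained, the analog tile is programmed once and never rewritten, which is what makes the approach attractive for adapting a deployed AIMC model across tasks.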
Italo Buleje, Vince Siu, et al.
ICDH 2023
Marcelo Amaral, Tatsuhiro Chiba
KubeCon + CloudNativeCon NA 2023
Andy Anderson
KubeCon EU 2025