Yunhua Fang, Rui Xie, et al. "Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System." IEEE Computer Architecture Letters, 2025.
Yayue Hou, Hsinyu Tsai, et al. "NORA: Noise-Optimized Rescaling of LLMs on Analog Compute-in-Memory Accelerators." DATE 2025.
Rui Xie, Asad Ul Haq, et al. "Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure." IEEE Computer Architecture Letters, 2025.