Flash: Fast Model Adaptation in ML-Centric Cloud Platforms. Haoran Qiu, Weichao Mao, et al. 2024. MLSys 2024. Conference paper.
GPU Optimizations for Efficient and Cost-Effective Access to Diverse Large Language Models in Research Cluster. Chen Wang, Yue Zhu, et al. 2024. MLSys 2024. Poster.