Build, Operate, and Use a multi-tenant AI cluster based entirely on open source. Claudia Misale, Olivier Tardieu, et al. (2025). KubeCon EU 2025.
Incremental GPU Slicing in Action. Abhishek Malvankar, Olivier Tardieu (2024). CNCF-hosted Co-located Events North America 2024.
Caspian: A Carbon-aware Workload Scheduler in Multi-Cluster Kubernetes Environments. Tayebeh Bahreini, Asser Tantawi, et al. (2024). MASCOTS 2024.
GPU OPTIMIZATIONS FOR EFFICIENT AND COST-EFFECTIVE ACCESS TO DIVERSE LARGE LANGUAGE MODELS IN RESEARCH CLUSTER. Chen Wang, Yue Zhu, et al. (2024). MLSys 2024.
Towards Pareto Optimal Throughput in Small Language Model Serving. Pol G. Recasens, Yue Zhu, et al. (2024). EuroSys 2024.
Towards Pareto Optimal Throughput in Small Language Model Serving. Pol G. Recasens, Yue Zhu, et al. (2024). EuroMLSys 2024.
Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing. Abhishek Malvankar, Olivier Tardieu (2024). KubeCon EU 2024.
Training Foundation Model Workloads on Kubernetes at Scale With MCAD. Olivier Tardieu, Abhishek Malvankar (2023). K8SAIHPCDAY 2023.