Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing. Abhishek Malvankar, Olivier Tardieu. 2024. KubeCon EU 2024.
Incremental GPU Slicing in Action. Abhishek Malvankar, Olivier Tardieu. 2024. CNCF-hosted Co-located Events North America 2024.
Achieving Platform Portability for vLLM by using Triton Autotuning and Remembering it. Burkhard Ringlein, Thomas Parnell. 2024. Ray Summit 2024.
Resource As You Wish: Collaborative Reservation and Allocation by Scheduler Plugin and Device Plugin. Takuya Mishina, Tatsuhiro Chiba. 2024. KubeDay Japan 2024.
Using automation to mitigate risk and enforce policy compliance. Matthew Jones, Yuji Watanabe, et al. 2024. Red Hat Summit 2024.
Trimaran: Load-Aware Scheduling for Power Efficiency and Performance Stability. Asser Tantawi, Chen Wang. 2024. KubeCon EU 2024.
CASPIAN: A Carbon-Optimized Multi-Cluster Job Scheduler. Tayebeh Bahreini, Asser Tantawi. 2024. KubeCon EU 2024.
Training Foundation Model Workloads on Kubernetes at Scale With MCAD. Olivier Tardieu, Abhishek Malvankar. 2023. K8SAIHPCDAY 2023.