Towards Pareto Optimal Throughput in Small Language Model Serving. Pol G. Recasens, Yue Zhu, et al. 2024. EuroSys 2024.
Towards Pareto Optimal Throughput in Small Language Model Serving. Pol G. Recasens, Yue Zhu, et al. 2024. EuroMLSys 2024.
Unleashing the Power of DRA (Dynamic Resource Allocation) for Just-in-Time GPU Slicing. Abhishek Malvankar, Olivier Tardieu. 2024. KubeCon EU 2024.
Training Foundation Model Workloads on Kubernetes at Scale With MCAD. Olivier Tardieu, Abhishek Malvankar. 2023. K8SAIHPCDAY 2023.
A reactive language for analyzing cloud logs. Guillaume Baudart, Olivier Tardieu, et al. 2018. SPLASH/REBLS 2018.
The serverless trilemma: Function composition for serverless computing. Ioana Baldini, Perry Cheng, et al. 2017. Onward! 2017.
CloudLens, a scripting language to analyze semi-structured textual data. Guillaume Baudart, Louis Mandel, et al. 2017. JFLA 2017.