Towards Pareto Optimal Throughput in Small Language Model ServingPol G. RecasensYue Zhuet al.2024EuroMLSys 2024