vllm-triton-backend: How to get state-of-the-art performance on NVIDIA and AMD with just triton. Burkhard Ringlein, Thomas Parnell, et al. 2025. PyTorch Conference 2025.
The Anatomy of a Triton Attention Backend. Burkhard Ringlein, Jan van Lunteren, et al. 2025. Triton Developer Conference 2025.
LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services. Gosia Lazuka, Andreea Simona Anghel, et al. 2024. SC 2024.
Achieving Platform Portability for vLLM by using Triton Autotuning and Remembering it. Burkhard Ringlein, Thomas Parnell. 2024. Ray Summit 2024.
xCloudServing: Automated and Optimized ML Serving across Clouds. Gosia Lazuka, Andreea Simona Anghel, et al. 2023. CLOUD 2023.
Breadth-first, Depth-next Training of Random Forests. Andreea Simona Anghel, Nikolas Ioannou, et al. 2019. NeurIPS 2019.
Benchmarking and Optimization of Gradient Boosting Decision Tree Algorithms. Andreea Simona Anghel, Nikolaos Papandreou, et al. 2018. NeurIPS 2018.
Tera-scale coordinate descent on GPUs. Thomas Parnell, Celestine Dünner, et al. 2018. Future Generation Computer Systems.
Is there a “rowhammer” for MLC NAND Flash SSDs? An analysis of filesystem attack vectors. Anil Kurmus, Nikolas Ioannou, et al. 2017. WOOT/USENIX Security 2017.
Enhancing the Reliability of MLC NAND Flash Memory Systems by Read Channel Optimization. Nikolaos Papandreou, Thomas Parnell, et al. 2015. ACM TODAES.