FlowTracer: A Tool for Uncovering Network Path Usage Imbalance in AI Training Clusters. Hasibul Jamil, Abdul Alim, et al. IEEE ICC 2025.
Vela: A Virtualized LLM Training System with GPU Direct and RoCE. Apoorve Mohan, Robert Walkup, et al. ASPLOS 2025.