Large switches or blocking multi-stage networks? An evaluation of routing strategies for datacenter fabrics
Cloud computing clusters require efficient interconnects to handle the increasing volume of inter-server (east-west) traffic. To cope with these traffic patterns, datacenter networks are abandoning the oversubscribed topologies of the past and adopting fat-tree fabrics with high bisection bandwidth. However, these fabrics typically employ either single-path or coarse-grained (flow-level) multipath routing. In this paper, we characterize the bandwidth wasted by these routing inefficiencies. Our analysis, confirmed by computer simulations, demonstrates that under a randomly selected permutation the expected throughputs of d-mod-k routing and of Equal-Cost Multi-Path (ECMP) routing, i.e., flow-level multipath routing (Thaler and Hopps, 2000), are close to 63% and 47%, respectively. Furthermore, nearly 30% of the flows are expected to undergo an unnecessary 3-fold slowdown. In contrast, packet-level multipath routing consistently delivers full throughput to all flows, and thus better serves the growing demands of east-west traffic. Using unmodified TCP stacks, we also demonstrate that under typical traffic conditions and system configurations flow-level multipath routing can abruptly increase the completion time of latency-critical flows by more than one order of magnitude. In contrast, packet-level multipath routing proactively avoids in-fabric backlogs and minimizes the flow completion time across the full range of configurations that we examine. Finally, we present the design of a cost-efficient switch node that performs adaptive packet-level spraying.
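
As a rough illustration of how path collisions waste bandwidth, the following minimal sketch evaluates a simplified single-stage model. It is an assumption made here for illustration, not the analysis or simulator used in the paper: n flows, each able to fill one link, are assigned independently and uniformly at random to n unit-capacity links, and flows that share a link split its capacity. Aggregate throughput then equals the number of links that carry at least one flow. The function names and parameters are hypothetical.

```python
import random


def closed_form_throughput(n_links: int) -> float:
    """Expected fraction of links that carry at least one flow.

    Simplified model (an assumption of this sketch): n_links flows are
    each assigned to one of n_links links uniformly at random.
    """
    return 1.0 - (1.0 - 1.0 / n_links) ** n_links


def simulated_throughput(n_links: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, averaged over trials."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    total = 0.0
    for _ in range(trials):
        # Set of links chosen by at least one of the n_links flows.
        used = {rng.randrange(n_links) for _ in range(n_links)}
        total += len(used) / n_links
    return total / trials


if __name__ == "__main__":
    n = 64
    print("closed form:", closed_form_throughput(n))
    print("simulated:  ", simulated_throughput(n, trials=2000))
```

In this model the normalized throughput approaches 1 - 1/e, roughly 0.632, as n grows, which is in the neighborhood of the 63% quoted in the abstract. A scheme that spreads every flow evenly over all links, as packet-level spraying aims to do, would keep all n links busy in the same setting.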