Publication
Computer Networks
Paper
Large switches or blocking multi-stage networks? An evaluation of routing strategies for datacenter fabrics
Abstract
Cloud computing clusters require efficient interconnects to deal with the increasing volume of inter-server (east-west) traffic. To cope with these new traffic patterns, datacenter networks are abandoning the oversubscribed topologies of the past and adopting fat-tree fabrics with high bisection bandwidth. However, these fabrics typically employ either single-path or coarse-grained (flow-level) multipath routing. In this paper, we characterize the bandwidth wasted by these routing inefficiencies. Our analysis, confirmed by computer simulations, demonstrates that under a randomly selected permutation the expected throughputs of d-mod-k routing and of Equal-Cost-Multi-Pathing (ECMP), i.e., flow-level multipath routing (Thaler and Hopps, 2000) [1], are close to 63% and 47%, respectively. Furthermore, nearly 30% of the flows are expected to undergo an unnecessary 3-fold slowdown. In contrast, packet-level multipath routing consistently delivers full throughput to all flows, and thus better serves the growing demands of inter-server (east-west) traffic. Using unmodified TCP stacks, we also demonstrate that, under typical traffic conditions and system configurations, flow-level multipath routing can abruptly increase the completion time of latency-critical flows by more than one order of magnitude. Packet-level multipath routing, by contrast, proactively avoids in-fabric backlogs and minimizes the flow completion time across the full range of configurations that we examine. Finally, we present the design of a cost-efficient switch node performing adaptive packet-level spraying.
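A rough intuition for why a figure near 63% is plausible can be sketched with a balls-into-bins model. This is an illustrative simplification and not the paper's analysis: it assumes that the k flows entering an edge switch each land on one of k uplinks independently and uniformly at random, that flows sharing an uplink split its capacity fairly, and that no other link is a bottleneck. Under those assumptions the throughput equals the fraction of uplinks carrying at least one flow, which is 1 - (1 - 1/k)^k and approaches 1 - 1/e ≈ 0.632 for large k. The function names below are hypothetical, and the sketch does not model ECMP.

```python
import random


def closed_form_utilization(k: int) -> float:
    """Expected fraction of k uplinks that carry at least one flow when
    k flows each pick an uplink independently and uniformly at random
    (assumed model, not taken from the paper)."""
    return 1.0 - (1.0 - 1.0 / k) ** k


def simulated_utilization(k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    used_links = 0
    for _ in range(trials):
        # Each of the k flows picks one uplink; count the distinct uplinks used.
        used_links += len({rng.randrange(k) for _ in range(k)})
    return used_links / (trials * k)


if __name__ == "__main__":
    for k in (4, 16, 64):
        print(k, round(closed_form_utilization(k), 3), round(simulated_utilization(k), 3))
```

Under a true permutation the destinations are distinct, so the uplink choices are not exactly independent; the model above is only an approximation of that setting.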