Beyond a Centralized Verifier: Scaling Data Plane Checking via Distributed, On-Device Verification
Abstract
Centralized data plane verification (DPV) faces significant scalability issues in large networks (i.e., the verifier being a performance bottleneck and single point of failure and requiring a reliable management network). We tackle this scalability challenge by introducing Tulkun, a distributed, on-device DPV framework. Our key insight is that DPV can be transformed into a counting problem on a directed acyclic graph, which can be naturally decomposed into lightweight tasks executed at network devices, enabling fast data plane checking in networks of various scales and types. With this insight, Tulkun consists of (1) a declarative invariant specification language, (2) a planner that employs a novel data structure DPVNet to systematically decompose global verification into on-device counting tasks, (3) a distributed verification messaging (DVM) protocol that specifies how on-device verifiers efficiently communicate task results to jointly verify the invariants, and (4) a mechanism to verify invariant fault-tolerance with minimal involvement of the planner. Extensive experiments with real-world datasets (WAN/LAN/DC) show that Tulkun verifies a real, large DC in 41 seconds while others tools need minutes or up to tens of hours, and shows an up to 2355× speed up on 80% quantile of incremental verification with small overhead on commodity network devices.