Border Gateway Protocol (BGP), the de facto inter-domain routing protocol, allows Autonomous Systems (ASes) to apply their own local policies for selecting routes and propagating routing information. However, BGP cannot make performance-based routing decisions and often routes traffic through congested paths, resulting in poor performance. This paper presents an efficient and scalable multi-agent reinforcement learning (MARL) method for inter-domain routing. It allows ASes to achieve higher overall throughput for real-time traffic demand, with the following highlights: (1) it ensures that traffic is forwarded along policy-compliant paths; (2) it respects the partial observability and selfishness of each AS; (3) it is scalable, as it only requires ASes to share information within a limited radius; (4) it is incrementally deployable, requiring only tens of ASes in the entire network to run it before benefits start to appear. We conduct an extensive evaluation on real network topologies ranging from hundreds to tens of thousands of ASes. The results show throughput improvements of up to 17% compared to default BGP routing.