Remote Procedure Call (RPC) has been the cornerstone of distributed systems since the early 80s. Recently, new classes of large-scale distributed systems running in data centers are posing extra challenges for RPC systems in terms of scaling and latency. We find that existing RPC systems make very poor usage of resources (CPU, memory, network) and are not ready to handle these upcoming workloads. In this paper we present DaRPC, an RPC framework which uses RDMA to implement a tight integration between RPC message processing and network processing in user space. DaRPC efficiently distributes computation, network resources and RPC resources across cores and memory to achieve a high aggregate throughput (2-3M ops/sec) at a very low per-request latency (10μs with iWARP). In the evaluation we show that DaRPC can boost the RPC performance of existing distributed systems in the cloud by more than an order of magnitude for both throughput and latency.