Network virtualization is an essential technique for data center operators to provide traffic isolation, differentiated service, and security enforcement for multi-tenant services. However, traditional protocols used in local area networks may not be applicable to data center networks because of differences in network topology. Recent research suggests that layer-2-in-layer-3 tunneling protocols may address these challenges. In this article, we find via testbed experiments that directly applying these tunneling protocols to network virtualization results in poor performance due to scalability problems. Specifically, we observe that the bottlenecks actually reside inside the servers. We then propose a CPU offloading mechanism that exploits a packet steering function to balance packet processing among the available CPU threads, greatly improving network performance. Compared to a virtualized network built on VXLAN, our scheme improves bandwidth by up to nearly 300 percent on a 10 Gb/s link between a pair of tunnel endpoints.