In the software-defined networking (SDN) paradigm, where the control and data planes are separated, the scalability of the SDN controller in the control plane is critical and can significantly affect overall network performance. To improve controller scalability, efforts have been made to enhance the capability of SDN switches in the data plane, making them more autonomous in providing routine services without consulting the controller. In this regard, we investigate the service placement problem on SDN switches, with the aim of minimizing the average accumulated service cost for end users. To solve this problem, we propose Q-placement, a novel reinforcement-learning-based algorithm with guarantees on performance and convergence rate. Compared to traditional optimization techniques, Q-placement exhibits many appealing features, such as performance-tuneable optimization and off-the-shelf implementation. Extensive evaluations show that Q-placement consistently outperforms benchmarks and other state-of-the-art algorithms in both synthetic and real networks. Moreover, these evaluations reveal how network topological properties (e.g., density), servicing capacities, and the controller's roles affect the accumulated service costs, insights that are useful in service planning tasks.