There is an emerging trend of deploying services in cloud environments, owing to their flexibility in providing virtual capacity and their pay-as-you-go billing. Cost-aware services request computation capacity, such as virtual machines (VMs), from a cloud operator according to the workload (i.e., service invocations) and pay for the capacity used under billing contracts. However, as recent empirical studies show, performance variability, i.e., non-uniform VM performance, is inherently higher in clouds than in private hosting platforms, since cloud platforms provide VMs running on top of typically heterogeneous hardware shared by multiple clients. Consequently, provisioning service capacity in a cloud needs to account for both workload variability and varying VM performance. We propose an opportunistic service replication policy that leverages the variability in VM performance as well as the on-demand billing features of the cloud. Our objective is to minimize service provisioning costs by keeping a smaller number of faster VMs while maintaining the target system utilization. Our evaluation on traces collected from in-production systems shows that the proposed policy achieves significant cost savings and low response times. © 2012 IEEE.