Service-oriented systems, which consist of atomic services and their compositions hosted in service composition execution engines (CEEs), are commonly deployed to deliver web applications. Because application workloads fluctuate over time, it is economical to adjust system capacity, i.e., the number of replicas of atomic services and CEEs, autonomously and dynamically. In this paper, we propose Resos, a novel replica provisioning policy that periodically adjusts the number of CEE and service replicas based on predicted workloads so that every replica operates at its target utilization. In particular, Resos models the workload balance and dependency between CEE and service replicas by estimating the probability that threads of CEE replicas are not blocked by I/O. Moreover, we derive analytical bounds on the effective utilization of CEE replicas and identify the cause of the low nominal utilization observed at CEE replicas. We evaluate Resos on a simulated service-oriented system that hosts CEE and service replicas on multi-threaded servers, using a workload derived from utilization traces collected from production systems. Through simulation, we demonstrate that Resos reduces the number of required replicas while maintaining target utilization and lowering request response times. © 2012 IEEE.
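The abstract does not state Resos's equations, so the following is only a rough sketch of target-utilization provisioning under assumptions of our own, not the authors' model. We assume each composite request holds one CEE thread for its CEE CPU time plus the time it waits on service calls, that service replicas are sized purely by CPU demand, and that a CEE replica must satisfy both a thread budget and a CPU budget. The names `Workload`, `p_not_blocked`, `replicas_for`, and `provision`, and all parameters, are hypothetical. The sketch shows why a CEE tier sized by thread occupancy can report low nominal (CPU) utilization when its threads spend most of their time blocked on I/O.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-period workload forecast (not Resos's actual inputs)."""
    arrival_rate: float       # predicted composite requests per second
    cee_cpu_time: float       # CPU seconds a request consumes on a CEE replica
    service_cpu_time: float   # CPU seconds a request consumes on service replicas
    service_wait_time: float  # seconds a CEE thread stays blocked on service calls


def p_not_blocked(w: Workload) -> float:
    """Share of a CEE thread's holding time spent computing, not blocked on I/O."""
    return w.cee_cpu_time / (w.cee_cpu_time + w.service_wait_time)


def replicas_for(demand: float, capacity: float, target_util: float) -> int:
    """Smallest replica count n with demand / (n * capacity) <= target_util."""
    return max(1, math.ceil(demand / (capacity * target_util)))


def provision(w: Workload, threads_per_cee: int, cores_per_cee: int,
              cores_per_service: int, target_util: float) -> tuple[int, int, float]:
    # Service replicas: sized by CPU demand only (assumption).
    n_service = replicas_for(w.arrival_rate * w.service_cpu_time,
                             cores_per_service, target_util)

    # Little's law: threads held = arrival rate * per-request thread holding time.
    busy_threads = w.arrival_rate * (w.cee_cpu_time + w.service_wait_time)
    n_cee_threads = replicas_for(busy_threads, threads_per_cee, target_util)
    n_cee_cpu = replicas_for(w.arrival_rate * w.cee_cpu_time,
                             cores_per_cee, target_util)
    n_cee = max(n_cee_threads, n_cee_cpu)

    # Nominal CPU utilization of the CEE tier; low when threads mostly block on I/O.
    cee_cpu_util = w.arrival_rate * w.cee_cpu_time / (n_cee * cores_per_cee)
    return n_cee, n_service, cee_cpu_util


if __name__ == "__main__":
    w = Workload(arrival_rate=100, cee_cpu_time=0.002,
                 service_cpu_time=0.02, service_wait_time=0.05)
    n_cee, n_svc, util = provision(w, threads_per_cee=4, cores_per_cee=1,
                                   cores_per_service=1, target_util=0.7)
    print(n_cee, n_svc, round(util, 2))  # 2 3 0.1
```

In this example a CEE thread is unblocked only about 4% of the time (`p_not_blocked(w)` is roughly 0.038), so the thread budget, not the CPU budget, determines the CEE replica count, and the CEE tier's CPU utilization stays near 10% even though its threads are occupied at the 70% target.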