Virtual machine (VM) provisioning is one of the fundamental components of virtualization-based cloud offerings. Modeling and analytically understanding the provisioning process is critical for the deployment and management of large-scale clouds. Based on extensive experiments on an example cloud system, we propose a queueing model that captures the features of the provisioning process most relevant to scalability. Specifically, we characterize how the number of VMs that can be hosted in the system and the number of physical host servers should scale with the volume of arriving VM requests. VM provisioning incurs heavy I/O activity on the targeted hosts, each of which has limited I/O resources. The logical stages of provisioning, which may execute on one or more physical nodes, are modeled by a semi-open Jackson network. The model provides insight into how performance bottlenecks can hinder cloud scalability. Using this model, we address the system sizing problem by performing heavy-traffic analysis in the classic Halfin-Whitt regime, also known as the Quality and Efficiency Driven (QED) regime, which accommodates moderate to large cloud environments. © 2012 ITC.
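As a rough illustration of what sizing in the Halfin-Whitt (QED) regime means, the sketch below applies the standard square-root staffing rule, n ≈ R + β√R, to a single M/M/n station and compares the exact Erlang C delay probability with the Halfin-Whitt limit. This is a simplified stand-in under stated assumptions, not the semi-open Jackson network analyzed in the paper; the function names and the values of the offered load and β are hypothetical.

```python
import math


def erlang_c(n: int, offered_load: float) -> float:
    """Probability that an arrival must wait in an M/M/n queue (Erlang C)."""
    if offered_load >= n:
        return 1.0  # unstable station: every arrival waits
    # Erlang B blocking probability via the numerically stable recursion.
    b = 1.0
    for k in range(1, n + 1):
        b = offered_load * b / (k + offered_load * b)
    rho = offered_load / n
    # Standard conversion from Erlang B to Erlang C.
    return b / (1.0 - rho * (1.0 - b))


def square_root_staffing(offered_load: float, beta: float) -> int:
    """Servers suggested by the rule n = R + beta * sqrt(R), rounded up."""
    return math.ceil(offered_load + beta * math.sqrt(offered_load))


def halfin_whitt_delay_prob(beta: float) -> float:
    """Limiting delay probability 1 / (1 + beta * Phi(beta) / phi(beta))."""
    pdf = math.exp(-beta * beta / 2.0) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(beta / math.sqrt(2.0)))
    return 1.0 / (1.0 + beta * cdf / pdf)


if __name__ == "__main__":
    offered_load = 400.0  # hypothetical: arrival rate times mean service time
    beta = 1.0            # hypothetical quality-of-service parameter
    n = square_root_staffing(offered_load, beta)
    print("servers:", n)
    print("exact delay probability:", erlang_c(n, offered_load))
    print("Halfin-Whitt approximation:", halfin_whitt_delay_prob(beta))
```

In this regime the safety capacity above the offered load grows only with its square root, so the delay probability stays strictly between 0 and 1 as the system grows, which is why the regime suits moderate to large systems.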