The growth in the adoption of cloud computing is driven by distinct and clear benefits for both cloud customers and cloud providers. However, the increase in the number of cloud providers, as well as in the variety of offerings from each provider, has made it harder for customers to choose among them. At the same time, the number of options for building a cloud infrastructure, from cloud management platforms to different interconnection and storage technologies, also poses a challenge for cloud providers. In this context, cloud experiments are as necessary as they are labor intensive. CloudBench is an open-source framework that automates cloud-scale evaluation and benchmarking by running controlled experiments in which complex applications are automatically deployed. Experiments are described through experiment plans, whose directives are expressive enough to keep experiment descriptions brief while allowing customizable multi-parameter variation. Experiments can be executed on multiple clouds through a single interface, and CloudBench can manage experiments that span multiple regions and long periods of time. Its modular design allows external users to extend it easily with new cloud infrastructure APIs and benchmark applications. A built-in data collection system gathers, aggregates, and stores metrics for cloud management activities (such as VM provisioning and VM image capture) as well as application runtime information. Experiments can be conducted in a highly controllable fashion to assess the stability, scalability, and reliability of multiple cloud configurations. We demonstrate CloudBench's main characteristics through the evaluation of an OpenStack installation, including experiments with approximately 1200 simultaneous VMs at an arrival rate of up to 400 VMs/hour. © 2013 IEEE.
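
To illustrate the idea of multi-parameter variation in an experiment plan, the following is a minimal, hypothetical sketch in Python. The parameter names, value lists, and the `expand` helper are all assumptions chosen for illustration; they are not CloudBench's actual directive syntax. The sketch simply expands a plan that varies several parameters into the full set of concrete experiment configurations (the Cartesian product of the settings):

```python
from itertools import product

# Hypothetical experiment plan: each key is a parameter to vary,
# each value the list of settings to sweep. Names are illustrative,
# not CloudBench's actual directives.
plan = {
    "arrival_rate_vms_per_hour": [100, 200, 400],
    "vm_size": ["small", "large"],
    "region": ["region-a", "region-b"],
}

def expand(plan):
    """Expand a plan into the list of concrete experiment
    configurations: the Cartesian product of all settings."""
    keys = sorted(plan)
    return [dict(zip(keys, values))
            for values in product(*(plan[k] for k in keys))]

experiments = expand(plan)
print(len(experiments))  # 3 * 2 * 2 = 12 configurations
```

A brief plan description like this expands into many concrete runs, which is what lets experiment descriptions stay short while still covering a large configuration space.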