An empirical analysis of similarity in virtual machine images

Jayaram Kallapalayam Radhakrishnan; Chunyi Peng; Zhe Zhang; Minkyong Kim; Han Chen; Hui Lei

doi:10.1145/2090181.2090187

Middleware 2011

Conference paper

01 Dec 2011

Abstract

To efficiently design deduplication, caching, and other management mechanisms for virtual machine (VM) images in Infrastructure as a Service (IaaS) clouds, it is essential to understand the level and pattern of similarity among VM images in real-world IaaS environments. This paper empirically analyzes the similarity within and between 525 VM images from a production IaaS cloud. Besides presenting the overall level of content similarity, we also report insights into multiple factors affecting the similarity pattern, including the image creation time and the location in the image's address space. Moreover, we found that similarities between pairs of images exhibit high variance, and an image is very likely to be more similar to a small subset of images than to all other images in the repository. In addition, groups of data chunks often appear together in the same image. These image and chunk "clusters" can help predict future data accesses, and therefore provide important hints for cache placement, eviction, and prefetching. © 2011 ACM.
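
As an illustration of what chunk-level similarity between two images can mean, the following is a minimal sketch only. The abstract does not state the paper's chunking scheme or similarity metric, so everything here is an assumption: fixed-size 4 KiB chunks, SHA-256 fingerprints, and Jaccard similarity over the fingerprint sets. The function names chunk_fingerprints and jaccard_similarity are hypothetical.

```python
import hashlib
from typing import Set


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> Set[str]:
    """Split data into fixed-size chunks and return the set of SHA-256 digests.

    Assumption: fixed-size chunking; the paper may use a different scheme.
    """
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Return |a & b| / |a | b|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two toy "images" that share two of their three 4 KiB chunks.
image_a = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
image_b = b"A" * 4096 + b"B" * 4096 + b"D" * 4096

similarity = jaccard_similarity(
    chunk_fingerprints(image_a), chunk_fingerprints(image_b)
)
print(similarity)
```

In this toy case the two images share two distinct chunks out of four overall, so the script prints 0.5. Computing such a score for every pair of images in a repository is one way to surface the kind of pairwise variance and image "clusters" the abstract describes.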
