Graph-Based Data Relevance Estimation for Large Storage Systems
In storage systems, the relevance of files to users can inform storage control policies that reduce cost while retaining high reliability and performance. The relevance of a file can be estimated by applying supervised learning with the file metadata as features. However, supervised learning requires many training samples to achieve acceptable estimation accuracy. In this paper, we propose a novel graph-based learning system that estimates the relevance of files from a small training set. First, files are grouped into file-sets based on the available metadata. Then, a parameterized similarity metric among files is introduced for each file-set using knowledge of the metadata. Finally, message passing over a bipartite graph is applied for relevance estimation. The proposed system is tested on various datasets and compared with logistic regression.
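The three steps above can be illustrated with a minimal sketch. It is not the system proposed in this paper: the update rule is plain weighted averaging (a form of label propagation) standing in for the actual message-passing rule, and a fixed per-field weight stands in for the parameterized similarity metric. The names build_file_sets, propagate, weights and prior, as well as the example metadata fields owner and ext, are assumptions made for illustration only.

```python
from collections import defaultdict


def build_file_sets(files, fields):
    """Group file ids into file-sets, one per (metadata field, value) pair."""
    file_sets = defaultdict(set)
    for fid, meta in files.items():
        for field in fields:
            if field in meta:
                file_sets[(field, meta[field])].add(fid)
    return dict(file_sets)


def propagate(files, file_sets, labels, weights, rounds=10, prior=0.5):
    """Averaging message passing on the file / file-set bipartite graph.

    labels  : relevance in [0, 1] for the few labeled files (kept fixed)
    weights : hypothetical per-field weight, a stand-in for a learned
              similarity parameter
    """
    score = {fid: labels.get(fid, prior) for fid in files}
    for _ in range(rounds):
        # File -> file-set messages: each file-set averages its members.
        set_score = {
            key: sum(score[f] for f in members) / len(members)
            for key, members in file_sets.items()
        }
        # File-set -> file messages: weighted mean over the sets of a file.
        for fid in files:
            if fid in labels:
                continue  # labeled files are clamped to their label
            num = den = 0.0
            for (field, value), members in file_sets.items():
                if fid in members:
                    w = weights.get(field, 1.0)
                    num += w * set_score[(field, value)]
                    den += w
            if den > 0:
                score[fid] = num / den
    return score


# Toy example: four files, two of them labeled.
files = {
    "a": {"owner": "alice", "ext": "csv"},
    "b": {"owner": "alice", "ext": "log"},
    "c": {"owner": "bob", "ext": "log"},
    "d": {"owner": "bob", "ext": "tmp"},
}
labels = {"a": 1.0, "d": 0.0}          # relevant / not relevant
weights = {"owner": 2.0, "ext": 1.0}   # assumed, not learned here

file_sets = build_file_sets(files, fields=("owner", "ext"))
scores = propagate(files, file_sets, labels, weights)
print(scores)
```

In this toy example, the unlabeled file b shares an owner with the relevant file a and therefore receives a higher score than c, which shares an owner with the irrelevant file d. In a real system the weights would be fitted on the training set rather than fixed by hand.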