Publication
MM 2009
Workshop paper

Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce


Abstract

With the rapid growth of multimedia data, it has become increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm, which augments random subspace bagging with forward model selection. Compared with traditional modeling approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also enables a convenient transformation into MapReduce, a simple parallel framework. To further improve scalability, we also develop a task scheduling algorithm that optimizes task placement for heterogeneous tasks. On a collection of more than 250,000 images and several standard TRECVID benchmark datasets, RB-SBag achieved more than a 10-fold speedup over baseline SVMs with comparable or even better classification performance. We also deployed the MapReduce implementation on a 16-node Hadoop cluster, where the proposed task scheduler demonstrated significantly better scalability than the baseline scheduler in the presence of task heterogeneity. Copyright 2009 ACM.
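To make the ensemble idea concrete, here is a minimal sketch of random subspace bagging followed by greedy forward model selection. It is not the authors' implementation. It assumes a hypothetical nearest-centroid base learner (the abstract compares against SVMs but does not specify the base learner here), balanced bootstrap bags drawn separately from positives and negatives, and forward selection that greedily adds whichever base model most improves validation accuracy. The names `train_base_model`, `map_task`, `accuracy`, and `forward_select` are all hypothetical. Each `map_task` call is independent of the others, which is the property that lets such an ensemble be split into parallel map tasks, while the selection step plays the role of a reduce. This is plain Python, not Hadoop code.

```python
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def train_base_model(X, y, dims) -> Model:
    """Hypothetical base learner: nearest centroid restricted to feature subset `dims`.
    Returns a scoring function; score > 0 means 'concept present'."""
    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in dims]

    c_pos = centroid([x for x, t in zip(X, y) if t == 1])
    c_neg = centroid([x for x, t in zip(X, y) if t == 0])

    def score(x):
        d_pos = sum((x[d] - c) ** 2 for d, c in zip(dims, c_pos))
        d_neg = sum((x[d] - c) ** 2 for d, c in zip(dims, c_neg))
        return d_neg - d_pos  # positive when x is closer to the positive centroid

    return score


def map_task(X, y, n_dims, bag_size, seed) -> Model:
    """One independent unit of work (assumed 'map' step): draw a balanced bootstrap
    bag and a random feature subspace, then train one base model on them."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    idx = [rng.choice(pos) for _ in range(bag_size)] + [rng.choice(neg) for _ in range(bag_size)]
    dims = rng.sample(range(len(X[0])), n_dims)  # random subspace of the features
    return train_base_model([X[i] for i in idx], [y[i] for i in idx], dims)


def accuracy(models: List[Model], X, y) -> float:
    """Accuracy of the averaged ensemble score; an empty ensemble scores 0."""
    if not models:
        return 0.0
    correct = 0
    for x, t in zip(X, y):
        avg = sum(m(x) for m in models) / len(models)
        correct += int((avg > 0) == (t == 1))
    return correct / len(y)


def forward_select(candidates: List[Model], X_val, y_val) -> Tuple[List[Model], float]:
    """Assumed 'reduce' step: greedily add the candidate that most improves
    validation accuracy; stop as soon as no remaining candidate helps."""
    chosen: List[Model] = []
    best = 0.0
    remaining = list(candidates)
    while remaining:
        acc, i = max((accuracy(chosen + [m], X_val, y_val), i) for i, m in enumerate(remaining))
        if acc <= best:
            break
        best = acc
        chosen.append(remaining.pop(i))
    return chosen, best


if __name__ == "__main__":
    rng = random.Random(0)

    def sample(label):
        # Toy data: five features, each shifted by +1 or -1 depending on the label.
        return [rng.gauss(1.0 if label else -1.0, 1.0) for _ in range(5)]

    y_tr = [i % 2 for i in range(200)]
    X_tr = [sample(t) for t in y_tr]
    y_val = [i % 2 for i in range(100)]
    X_val = [sample(t) for t in y_val]

    pool = [map_task(X_tr, y_tr, n_dims=2, bag_size=20, seed=s) for s in range(10)]
    ensemble, acc = forward_select(pool, X_val, y_val)
    print(f"selected {len(ensemble)} of {len(pool)} models, validation accuracy {acc:.2f}")
```

Because the first model forward selection adds is always the best single model, the selected ensemble is never worse on the validation set than any individual bag. Stopping when no candidate improves accuracy is one simple way to keep the ensemble small and limit overfitting to the validation data.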
