IBM Research and Columbia University TRECVID-2011 multimedia event detection (MED) system

Liangliang Cao; Shih-Fu Chang; Noel Codella; Courtenay Cotton; Dan Ellis; Leiguang Gong; Matthew Hill; Gang Hua; John Kender; Michele Merler; Yadong Mu; Apostol Natsev; John R. Smith

TRECVID 2011

Conference paper

05 Dec 2011

Abstract

The IBM Research/Columbia team investigated a novel range of low-level and high-level features and their combination for the TRECVID Multimedia Event Detection (MED) task. We submitted four runs exploring various methods for extracting, modeling, and fusing low-level features and hundreds of high-level semantic concepts. Run 1 developed event detection models using Support Vector Machines (SVMs) trained on a large number of low-level features, and established the baseline performance for visual features from static video frames. Run 2 trained SVMs on classification scores generated by 780 visual, 113 action, and 56 audio high-level semantic classifiers and explored various temporal aggregation techniques; it assessed performance based on different kinds of high-level semantic information. Run 3 fused the low- and high-level feature information, providing insight into the complementarity of this information for detecting events. Run 4 fused all of these methods and explored a novel Scene Alignment Model (SAM) algorithm that utilized temporal information discretized by scene changes in the video.
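As a rough illustration of two ideas named in the abstract, the sketch below pools per-frame semantic classifier scores into one video-level vector and then combines detection scores from several models. The abstract does not say which aggregation or fusion rules were used, so mean/max pooling and weighted score averaging are assumptions, and the function names `aggregate_scores` and `late_fusion` are hypothetical.

```python
import numpy as np


def aggregate_scores(frame_scores, method="mean"):
    """Pool a (num_frames, num_concepts) score matrix into one video-level vector.

    Assumption: mean and max pooling stand in for the unspecified
    "temporal aggregation techniques" mentioned in the abstract.
    """
    frame_scores = np.asarray(frame_scores, dtype=float)
    if frame_scores.ndim != 2 or frame_scores.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of frame scores")
    if method == "mean":
        return frame_scores.mean(axis=0)
    if method == "max":
        return frame_scores.max(axis=0)
    raise ValueError(f"unknown aggregation method: {method!r}")


def late_fusion(model_scores, weights=None):
    """Combine per-model detection scores for the same videos.

    model_scores has shape (num_models, num_videos). Assumption: a
    weighted average is used here as one simple fusion rule; it is not
    necessarily the rule used in the submitted runs.
    """
    model_scores = np.asarray(model_scores, dtype=float)
    if model_scores.ndim != 2:
        raise ValueError("expected a 2-D array of model scores")
    if weights is None:
        weights = np.ones(model_scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ model_scores


# Two frames scored by two hypothetical concept classifiers.
frames = [[0.2, 0.8],
          [0.6, 0.4]]
video_mean = aggregate_scores(frames, "mean")
video_max = aggregate_scores(frames, "max")

# Scores from a low-level model and a high-level model for two videos.
fused = late_fusion([[0.2, 0.9],
                     [0.6, 0.5]])
```

In a setup like Run 2, the pooled vector would have one entry per semantic classifier (780 + 113 + 56 = 949 entries) and would serve as the input feature vector for an event-level SVM.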