IBM research TRECVID-2008 video retrieval system

Apostol Natsev; Wei Jiang; Michele Merler; John R. Smith; Jelena Tešić; Lexing Xie; Rong Yan

TRECVID 2008

Conference paper

17 Nov 2008


Abstract

In this paper, we describe the IBM Research system for indexing, analysis, and retrieval of video as applied to the TREC-2008 video retrieval benchmark. This year, the focus of system improvement was on large-scale learning, cross-domain detection, and interactive search.

A. High-level concept detection:
1. A_ibm.Baseline_5: Baseline runs with random-subspace bagging;
2. A_ibm.BaseSSL_4: Fusion of baseline runs and principal component semi-supervised support vector machines (PCS3VM);
3. A_ibm.BaseSSLText_3: Fusion of A_ibm.BaseSSL_4 and text search results;
4. C_ibm.CrossDomain_6: Learning on data from the web domain;
5. C_ibm.BNet_2: Multi-concept learning with baseline, PCS3VM, and web concepts;
6. C_ibm.BOR_1: Best overall run, compiled from the best models for each concept based on held-out performance.

Overall, almost all of the individual components improve the mean average precision (MAP) when fused with the baseline results. To summarize, we have the following observations from our evaluation results: 1) The baseline run using random-subspace bagging offers reasonable starting performance with a more efficient learning process than standard SVMs; 2) By learning on both the feature space and unlabeled data, PCS3VM improves the MAP by 12% when combined with the baseline runs; 3) The additional development data collected from the web domain is informative for a number of the concepts, although its average performance is not yet comparable with the baseline.

B. Interactive search:
1. I_A_2_IBM.SearchTypeA_2: Type-A interactive run with the 20 semantic concepts targeted in the 2008 HLF task;
2. I_C_2_IBM.SearchTypeC_1: Type-C interactive run with 96 semantic concepts, trained on additional web data.

System analytics such as clustering and visual near-duplicate detection helped notably, especially in increasing recall. There was no significant difference between the two interactive runs, which used the same system setup except for the number of semantic concepts available.
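
The abstract reports gains in mean average precision (MAP) when component runs are fused with the baseline. As a rough illustration only, the sketch below computes plain non-interpolated average precision and fuses two detectors by averaging their scores. The fusion rule, the function names, and the toy shot scores are all assumptions made for this example. The abstract does not say how the runs were actually fused, and the official benchmark scoring may use a different (for example, sampled) estimate of average precision.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked shot list."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant shot
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Mean of the per-concept average precision values."""
    aps = [average_precision(runs[concept], truth[concept]) for concept in truth]
    return sum(aps) / len(aps)


def fuse_by_average(a: Dict[str, float], b: Dict[str, float]) -> List[str]:
    """Hypothetical late fusion: average two score maps, rank by fused score."""
    shots = set(a) | set(b)
    fused = {s: (a.get(s, 0.0) + b.get(s, 0.0)) / 2 for s in shots}
    # Sort by descending score; break ties by shot id for determinism.
    return sorted(fused, key=lambda s: (-fused[s], s))


# Toy scores for one concept (made-up values, not from the paper).
baseline = {"s1": 0.9, "s2": 0.8, "s3": 0.1, "s4": 0.2}
second = {"s1": 0.6, "s2": 0.1, "s3": 0.7, "s4": 0.2}
relevant = {"s1", "s3"}

baseline_rank = sorted(baseline, key=lambda s: (-baseline[s], s))
fused_rank = fuse_by_average(baseline, second)

print(average_precision(baseline_rank, relevant))
print(average_precision(fused_rank, relevant))
```

In this toy case the fused ranking moves the relevant shot `s3` ahead of the irrelevant `s4`, so its average precision is higher than that of the baseline ranking alone. Real gains depend on how complementary the fused detectors are.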
