W. Hsu, L. Kennedy, et al.
ICASSP 2004
This paper describes progress towards a general framework for incorporating multimodal cues into a trainable system for automatically annotating user-defined semantic concepts in broadcast video. Models of arbitrary concepts are constructed by building classifiers in a score space defined by a pre-deployed set of multimodal models. Results show that annotation for user-defined concepts, both in and outside the pre-deployed set, is competitive with our best video-only models on the TREC Video 2002 corpus. An interesting side result is that speech-only models perform comparably to our best video-only models in detecting visual concepts such as "outdoors", "face", and "cityscape".
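The score-space idea can be sketched briefly: each shot is represented by the vector of confidence scores produced by the pre-deployed detectors, and a classifier for a new user-defined concept is trained on those vectors. The sketch below is a minimal illustration under assumptions not stated in the abstract: the pre-deployed detectors are modeled as callables returning a real-valued score, and an SVM is used as the score-space classifier.

```python
# Minimal sketch of score-space concept annotation.
# Assumptions (not from the paper): `base_detectors` is a list of callables,
# one per pre-deployed concept model (e.g. "outdoors", "face", "cityscape"),
# each mapping a shot's features to a confidence score; an RBF SVM is used
# as the classifier for the new user-defined concept.
import numpy as np
from sklearn.svm import SVC

def score_vector(shot_features, base_detectors):
    # Project one shot into the score space spanned by the pre-deployed models.
    return np.array([detector(shot_features) for detector in base_detectors])

def train_user_concept(shot_feature_list, labels, base_detectors):
    # Stack score vectors for the training shots and fit a classifier for
    # the new concept in that score space.
    X = np.vstack([score_vector(f, base_detectors) for f in shot_feature_list])
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(X, labels)
    return clf
```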
G. Iyengar, P. Duygulu, et al.
MM 2005
G. Iyengar, A.B. Lippman
ICME 2000