Mutual relevance feedback for multimodal query formulation in video retrieval
Abstract
Video indexing and retrieval systems allow users to find relevant video segments for a given information need. A multimodal video index may include speech indices, a text-from-screen (OCR) index, semantic visual concepts, content-based image features, audio features, and more. Formulating an effective multimodal query for a given information need is much less intuitive and more challenging for the user than composing a text query in document search. This paper describes a video retrieval system that uses mutual relevance feedback for multimodal query formulation. Through an iterative search-and-browse session, the user provides relevance feedback on the system's output, and the system provides the user with mutual feedback, which leads to a better query and better retrieval results. Official evaluation at the NIST TRECVID 2004 Search Task is reported for both Manual and Interactive search. It is shown that in the Manual task, the queries resulting from the mutual feedback on the training data significantly improve retrieval performance. A further improvement over the manual search is achieved in the Interactive task by using both browsing and mutual feedback on the test set.