Multimedia surrogates for video gisting: Toward combining spoken words and imagery

Gary Marchionini; Yaxiao Song; Robert Farrell

doi:10.1016/j.ipm.2009.05.007

Inf. Process. Manage.

Paper

01 Nov 2009

Abstract

Good surrogates that allow people to quickly derive the gist of a video without taking the time to view it in full are crucial to video retrieval and browsing systems. Although video retrieval systems use many kinds of textual and visual surrogates, few audio surrogates are used in practice. To evaluate the effectiveness of audio surrogates alone and in combination with one kind of visual surrogate, fast forwards, a user study with 48 participants was conducted. The study investigated the effects of manually and automatically generated spoken keywords and spoken descriptions, rendered with a text-to-speech synthesizer, on six specific video gisting tasks. Results demonstrate that manually generated spoken descriptions are better than both manually generated spoken keywords and fast forwards for video gisting. Spoken keywords, whether manually or automatically generated, and fast forwards are both better than automatically extracted descriptions. High-quality spoken summaries were found to be very effective for video gisting. Combining fast forwards with either type of spoken text was not significantly better than any of the individual spoken surrogates; however, the visual elements added subjective value to the user experience. Adding spoken descriptions or keywords as surrogates to video retrieval and browsing systems is recommended. © 2009 Elsevier Ltd. All rights reserved.
