A Robust Video Scene Extraction Approach to Movie Content Abstraction
Abstract
This research addresses the problem of automatically extracting semantic video scenes from feature films based on multi-modal information. A three-stage scene detection scheme is proposed. First, we use purely visual information to extract a coarse-level scene structure based on generated shot sinks. Second, we integrate audio cues to refine the scene detection results by considering various kinds of audiovisual scenarios. Finally, we bring users into the process by allowing them to interactively tune the final results to their own satisfaction. The generated scene structure forms a compact yet meaningful abstraction of the video data, which facilitates content access. Preliminary experiments on integrating multiple media cues for movie scene extraction have yielded encouraging results. © 2004 Wiley Periodicals, Inc.
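As a rough illustration of the three-stage flow summarized above, the following Python sketch shows one way the stages could be chained. It is not the authors' implementation: all function names, data types, and the placeholder logic are hypothetical stand-ins for the visual shot-sink clustering, audio-based refinement, and interactive tuning described in the paper.

```python
# Hypothetical skeleton of the three-stage scene extraction pipeline.
# The stage bodies are placeholders; only the overall structure mirrors
# the approach summarized in the abstract.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    start_frame: int
    end_frame: int


@dataclass
class Scene:
    shots: List[Shot] = field(default_factory=list)


def visual_coarse_scenes(shots: List[Shot]) -> List[Scene]:
    """Stage 1 (hypothetical): group visually similar shots into shot
    sinks and derive a coarse scene structure from them."""
    # Placeholder: treat every run of three consecutive shots as one scene.
    return [Scene(shots=shots[i:i + 3]) for i in range(0, len(shots), 3)]


def refine_with_audio(scenes: List[Scene]) -> List[Scene]:
    """Stage 2 (hypothetical): merge or split coarse scenes using audio
    cues, e.g. continuity of background sound across a boundary."""
    return scenes  # no-op placeholder


def interactive_tuning(scenes: List[Scene]) -> List[Scene]:
    """Stage 3 (hypothetical): let the user adjust scene boundaries."""
    return scenes  # no-op placeholder


if __name__ == "__main__":
    shots = [Shot(i * 100, (i + 1) * 100 - 1) for i in range(9)]
    scenes = interactive_tuning(refine_with_audio(visual_coarse_scenes(shots)))
    print(f"{len(shots)} shots grouped into {len(scenes)} scenes")
```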