Publication
ICME 2002
Conference paper

Semantic indexing of multimedia using audio, text and visual cues

Abstract

We describe methods for automatic labeling of high-level semantic concepts in documentary-style videos. The emphasis of this paper is on audio processing and on fusing information from multiple modalities. This is initial work towards a trainable system that acquires a collection of generic "intermediate" semantic concepts across modalities (such as audio, video, and text) and combines information from these modalities for automatic labeling of a "high-level" concept. Initial results suggest that multi-modal fusion achieves a 12.5% relative improvement over the best unimodal model.
