Publication
CBAIVL 2001
Conference paper
Audio-visual event detection using duration dependent input output Markov models
Abstract
Analyzing audio-visual data and detecting semantic events with spatio-temporal support is a challenging multimedia understanding problem. The difficulty lies in the gap between low-level media features and high-level semantic concepts. We introduce a duration dependent input output Markov model (DDIOMM) to detect events based on multiple modalities. The DDIOMM combines the ability to model non-exponential duration densities with the mapping of input sequences to output sequences. We test the DDIOMM by modeling the audio-visual event "explosion". We compare the detection performance of the DDIOMM with that of the input output Markov model (IOMM) and the hidden Markov model (HMM). Experiments reveal that modeling duration improves detection performance.
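To illustrate why non-exponential duration densities matter, the sketch below contrasts the state-duration distribution implied by a fixed self-transition probability (as in a standard HMM) with an explicitly specified duration density. It is not the DDIOMM from the paper: the self-loop probability of 0.9, the 30-frame horizon, and the triangular density peaked at 10 frames are all hypothetical values chosen only for illustration.

```python
def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by a fixed self-transition probability.

    Staying in a state for exactly d steps means d - 1 self-loops
    followed by one exit, so P(d) = a**(d - 1) * (1 - a).
    The list is truncated at max_duration, so it sums to slightly less than 1.
    """
    a = self_loop_prob
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def mode(pmf):
    """Most likely duration, counting durations from 1."""
    return max(range(len(pmf)), key=lambda i: pmf[i]) + 1


# Implicit duration model: hypothetical self-loop probability of 0.9.
implicit = geometric_duration_pmf(0.9, 30)

# Explicit duration model: hypothetical triangular density peaked at 10 frames.
explicit = normalize([max(0.0, 1.0 - abs(d - 10) / 6.0) for d in range(1, 31)])

print("most likely duration, implicit model:", mode(implicit))
print("most likely duration, explicit model:", mode(explicit))
```

Under the implicit model the shortest duration is always the most probable, whatever the self-loop probability. An explicit duration density can instead place its peak at a typical event length, which is the kind of flexibility the abstract attributes to duration modeling.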