Andrew Geng, Pin-Yu Chen
IEEE SaTML 2024
Video summarization has recently been proposed as a way to support video exploration. However, traditional video summarization models generate only a single fixed summary, which is usually independent of user-specific needs and therefore limits the effectiveness of video exploration. Multi-modal video summarization is one approach to addressing this issue: it takes both a video input and a text-based query input. Effective modeling of the interaction between the video and the text-based query is therefore essential to the task. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and the query for multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder. In an evaluation on the existing multi-modal video summarization dataset, the proposed approach improves accuracy by 5.4% and F1-score by 4.92% over the state-of-the-art method.
Zhiyuan He, Yijun Yang, et al.
ICML 2024
Gaoyuan Zhang, Songtao Lu, et al.
UAI 2022
Heshan Fernando, Lisha Chen, et al.
ICASSP 2024