A representative-based framework for parsing and summarizing events in surveillance videos
This paper presents a novel representative-based framework for parsing and summarizing events in long surveillance videos. The proposed framework first extracts object blob sequences and uses them to represent the events in a surveillance video. A sequence filtering strategy then detects and eliminates noisy blob sequences based on their spatial and temporal characteristics. After clustering the blob sequences into different event types, we introduce a representative-based model that integrates location, size, and appearance cues to select a representative blob sequence from each cluster and creates a snapshot image for each representative blob sequence. Based on the blob-sequence clustering and representative-sequence selection results, two schemes are proposed to summarize the contents of the input surveillance video: (1) a type-based scheme, which shows the snapshot images to users and creates a summary video for a specific event cluster according to the user-selected snapshot image; and (2) a representative-based scheme, which creates a summary video using only the extracted representative blob sequences. Experimental results show that our approach creates more effective and better-organized summaries than state-of-the-art methods.
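To make the filtering and representative-selection steps concrete, the following is a minimal Python sketch. It is not the model proposed in the paper, whose details the abstract does not give. It assumes each blob sequence has already been summarized by a mean position, a mean area, an appearance descriptor, and a cluster label. It substitutes a simple threshold filter for the noise-filtering strategy and a weighted medoid rule for the representative-based model. The names BlobSequence, is_noisy, distance, and select_representatives, and all thresholds and weights, are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class BlobSequence:
    # Hypothetical per-sequence summary features (assumed, not from the paper).
    seq_id: int
    cluster: int        # event-type label from a prior clustering step
    center: tuple       # mean (x, y) blob position in pixels
    size: float         # mean blob area in pixels
    appearance: tuple   # e.g. a normalized color histogram
    n_frames: int       # temporal length of the sequence


def is_noisy(seq, min_frames=10, min_size=50.0):
    """Assumed stand-in filter: drop very short or very small sequences."""
    return seq.n_frames < min_frames or seq.size < min_size


def distance(a, b, w_loc=1.0, w_size=1.0, w_app=1.0):
    """Weighted sum of location, size, and appearance differences."""
    d_loc = hypot(a.center[0] - b.center[0], a.center[1] - b.center[1])
    d_size = abs(a.size - b.size) / max(a.size, b.size, 1e-9)
    d_app = sum(abs(p - q) for p, q in zip(a.appearance, b.appearance))
    return w_loc * d_loc + w_size * d_size + w_app * d_app


def select_representatives(seqs, **weights):
    """Return {cluster label: medoid sequence} over the non-noisy sequences."""
    clusters = {}
    for s in seqs:
        if not is_noisy(s):
            clusters.setdefault(s.cluster, []).append(s)
    reps = {}
    for label, members in clusters.items():
        # Medoid: the member with the smallest total distance to the others.
        reps[label] = min(
            members,
            key=lambda m: sum(distance(m, o, **weights) for o in members),
        )
    return reps
```

In this sketch the location term is measured in pixels while the size and appearance terms are roughly unit-scaled, so the weights would need tuning to balance the three cues. A cluster whose sequences are all filtered out yields no representative.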