Publication
ICSLP 2000
Conference paper

Statistical methods for topic segmentation

Abstract

Automatic topic segmentation is an important technology for multimedia archival and retrieval systems. In this paper we present an algorithm for topic segmentation that combines machine learning, statistical natural language processing, and information retrieval techniques. We measure its performance by counting misses and false alarms against a manually segmented corpus. We report results on the widely used TDT2 and TDT3 corpora provided by NIST. Most of the techniques described are independent of the source language; we demonstrate this by applying the algorithm, with only minor changes, to both the English and Mandarin TDT3 corpora.
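The abstract does not say how hypothesised boundaries are matched to the manual ones. The sketch below is one plausible way to count misses and false alarms, assuming boundaries are integer positions (for example sentence indices) and that a hypothesised boundary within a hypothetical `tolerance` window of a reference boundary counts as a hit. It uses simple greedy matching. It is not the paper's scoring procedure and not NIST's official TDT segmentation cost, which is computed differently.

```python
from typing import List, Sequence, Tuple


def boundary_misses_and_false_alarms(
    reference: Sequence[int],
    hypothesis: Sequence[int],
    tolerance: int = 0,
) -> Tuple[int, int]:
    """Count missed and falsely inserted topic boundaries (illustrative sketch).

    Assumption: boundaries are integer positions, and a reference boundary is
    found if some still-unmatched hypothesised boundary lies within `tolerance`
    positions of it. Each hypothesised boundary may match at most once.
    Matching is greedy, not globally optimal.
    """
    unmatched: List[int] = sorted(hypothesis)
    misses = 0
    for ref in sorted(reference):
        # Hypothesised boundaries close enough to this reference boundary.
        candidates = [h for h in unmatched if abs(h - ref) <= tolerance]
        if candidates:
            # Consume the closest candidate so it cannot match twice.
            best = min(candidates, key=lambda h: abs(h - ref))
            unmatched.remove(best)
        else:
            misses += 1
    # Every hypothesised boundary left unmatched is a false alarm.
    false_alarms = len(unmatched)
    return misses, false_alarms


if __name__ == "__main__":
    # Reference boundary 25 has no hypothesis within 2 positions (a miss);
    # hypotheses 30 and 50 match nothing (two false alarms).
    print(boundary_misses_and_false_alarms([10, 25, 40], [11, 30, 41, 50], tolerance=2))
```

Running the example prints `(1, 2)`. A stricter scorer would replace the greedy loop with an optimal one-to-one assignment between the two boundary sets.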

Date

16 Oct 2000