IJDAR
Paper
Sentence boundary detection in conversational speech transcripts using noisily labeled examples
Abstract
This paper presents a technique for adding sentence boundaries to text obtained by Automatic Speech Recognition (ASR) of conversational speech audio. We show that, starting from imprecise boundary information derived only from the silence information produced by an ASR system, boundary detection can be improved using Head and Tail phrases. We develop the technique and demonstrate its effectiveness on two manually transcribed corpora and one automatically transcribed corpus. The main purpose of adding sentence boundaries to ASR transcripts is to improve linguistic analysis, specifically information extraction, for text mining systems that process very large volumes of textual data and analyze trends and features of concepts. Accordingly, we also show how adding boundaries improves two basic natural language processing tasks: part-of-speech (PoS) label assignment and adjective-noun extraction. © Springer-Verlag 2007.
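The abstract does not spell out the algorithm, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' method. It assumes that ASR output arrives as `(word, start_sec, end_sec)` tuples and that a pause of at least `min_gap` seconds (an assumed threshold) marks a noisy boundary. It also assumes that Head and Tail phrases are the first and last `n` words of the noisily labeled segments, and that a boundary is added wherever a frequent Tail is followed by a frequent Head (an assumed rule). All function names and thresholds are illustrative.

```python
from collections import Counter

# Hypothetical sketch, NOT the paper's algorithm.
# Assumption: ASR output is a list of (word, start_sec, end_sec) tuples.


def silence_boundaries(words, min_gap=0.5):
    """Noisy labels: boundary after word i if the pause before word i+1 is >= min_gap seconds."""
    return {
        i
        for i in range(len(words) - 1)
        if words[i + 1][1] - words[i][2] >= min_gap
    }


def segment(tokens, cuts):
    """Split tokens into segments, closing a segment after every index in cuts."""
    segments, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        if i in cuts:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def head_tail_counts(segments, n=1):
    """Count the first n words (Head) and last n words (Tail) of each segment."""
    heads, tails = Counter(), Counter()
    for seg in segments:
        if len(seg) >= n:
            heads[tuple(seg[:n])] += 1
            tails[tuple(seg[-n:])] += 1
    return heads, tails


def refine(tokens, cuts, heads, tails, n=1, min_count=2):
    """Add a boundary after index i when the n words ending at i are a frequent
    Tail and the n words starting at i+1 are a frequent Head (assumed rule)."""
    refined = set(cuts)
    for i in range(n - 1, len(tokens) - n):
        tail = tuple(tokens[i - n + 1 : i + 1])
        head = tuple(tokens[i + 1 : i + 1 + n])
        if tails[tail] >= min_count and heads[head] >= min_count:
            refined.add(i)
    return refined


# Toy demo: a 0.8 s pause after "thanks" yields one noisy boundary.
words = [("okay", 0.0, 0.3), ("thanks", 0.35, 0.7), ("so", 1.5, 1.6), ("what", 1.65, 1.9)]
tokens = [w for w, _, _ in words]
cuts = silence_boundaries(words)
print(cuts)  # {1}
heads, tails = head_tail_counts(segment(tokens, cuts))
# Same words with no usable pause: Head/Tail statistics recover the boundary.
print(refine(tokens, set(), heads, tails, min_count=1))  # {1}
```

In a realistic setting the Head/Tail statistics would be gathered over a large noisily segmented corpus, so that the `min_count` threshold filters out rare, unreliable phrases. The one-utterance demo lowers it to 1 only to keep the example small.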