Abstract
Speech segmentation operates at several levels, each of which is useful for improving automatic speech recognition (ASR) technology. Sentence segmentation has also progressed by combining lexical information from a word recognizer with spectral and prosodic cues. Sentence segmentation is relevant to speech understanding tasks, especially parsing and information extraction (IE), and to applications such as machine translation, summarization, and question answering. Segmentation algorithms rely on audio diarization and structural segmentation. The goal of audio diarization is to segment an audio recording into acoustically homogeneous regions, given only features extracted from the audio signal. A related task is speaker diarization, which involves computing a generalized log likelihood ratio at candidate boundaries. Structural segmentation aims at detecting boundary events and at whole constituent modeling; both are applicable to speech recognition because they exploit the alignment between words.
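To make the generalized log likelihood ratio (GLR) test at candidate boundaries more concrete, the following is a minimal sketch rather than the method of any particular system. It assumes each frame is reduced to a single scalar feature and that each segment is modeled by one Gaussian with maximum-likelihood parameters; real diarization systems typically use multivariate features such as cepstral vectors. The names `glr_score` and `best_boundary` are hypothetical and chosen only for illustration.

```python
import math


def _log_var(xs, floor=1e-8):
    """Log of the maximum-likelihood (biased) variance, floored for stability."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, floor))


def glr_score(frames, boundary):
    """GLR for splitting `frames` at index `boundary`.

    Compares "one Gaussian for the whole window" against "one Gaussian per
    side". With ML estimates the log likelihood ratio reduces to
    0.5 * (n*log(var) - n1*log(var1) - n2*log(var2)).
    Larger values suggest a change point at `boundary`.
    """
    left, right = frames[:boundary], frames[boundary:]
    if len(left) < 2 or len(right) < 2:
        raise ValueError("each side of the boundary needs at least 2 frames")
    n, n1, n2 = len(frames), len(left), len(right)
    return 0.5 * (n * _log_var(frames)
                  - n1 * _log_var(left)
                  - n2 * _log_var(right))


def best_boundary(frames, min_len=2):
    """Return the candidate boundary index with the highest GLR score."""
    candidates = range(min_len, len(frames) - min_len + 1)
    return max(candidates, key=lambda b: glr_score(frames, b))


if __name__ == "__main__":
    # Toy scalar features (assumed data): two acoustically distinct regions.
    frames = [0.0, 0.1, -0.1, 0.05, -0.05, 5.0, 5.1, 4.9, 5.05, 4.95]
    b = best_boundary(frames)
    print("best boundary index:", b)
    print("GLR at that boundary:", glr_score(frames, b))
```

In practice the score would be compared against a threshold, or a penalized criterion, before a boundary is accepted. The sketch only ranks the candidates within a single window.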