Speech segmentation and spoken document processing

Mari Ostendorf; Benoit Favre; Ralph Grishman; Dilek Hakkani-Tür; Mary Harper; Dustin Hillard; Julia Hirschberg; Heng Ji; Jeremy G. Kahn; Yang Liu; Sameer Maskey; Evgeny Matusov; Hermann Ney; Andrew Rosenberg; Elizabeth Shriberg; Wen Wang; Chuck Wooters

doi:10.1109/MSP.2008.918023

IEEE Signal Processing Magazine (SPM)

Paper

01 Jan 2008

Abstract

Speech segmentation operates at many levels, and segmentation at these levels is useful for improving automatic speech recognition (ASR) technology. Progress has also been made in sentence segmentation by combining lexical information from a word recognizer with spectral and prosodic cues. Sentence segmentation is also relevant to speech understanding, especially to parsing and information extraction (IE), and to application-level tasks such as machine translation, summarization, and question answering. Two families of segmentation algorithms are considered: audio diarization and structural segmentation. The goal of audio diarization is to segment an audio recording into acoustically homogeneous regions, given only features extracted from the audio signal. One widely studied form is speaker diarization, which involves computing a generalized log-likelihood ratio (GLR) at candidate boundaries. Structural segmentation is approached either by detecting boundary events or by modeling whole constituents; both approaches are applicable to speech recognition because they exploit the alignment between words.
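To make the generalized log-likelihood ratio concrete, here is a minimal sketch of GLR-based change detection, assuming each segment is modeled by a single diagonal-covariance Gaussian fit by maximum likelihood. The function names, the `min_seg` and `threshold` parameters, and the variance floor are hypothetical illustrations, not the paper's method. Because the two-Gaussian hypothesis nests the single-Gaussian one, the ML-based GLR is essentially non-negative, so practical systems compare it with a positive threshold or add a model-complexity penalty (for example, a BIC-style term).

```python
import math
from typing import Optional, Sequence, Tuple

Frame = Sequence[float]  # one feature vector (e.g. cepstral coefficients)


def ml_gaussian_loglik(frames: Sequence[Frame], var_floor: float = 1e-6) -> float:
    """Log-likelihood of frames under their own ML diagonal-covariance Gaussian.

    With ML mean and variance, the log-likelihood reduces to
    -N/2 * sum_d (log(2*pi*var_d) + 1).
    """
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for d in range(dim):
        values = [f[d] for f in frames]
        mean = sum(values) / n
        # Floor the variance so constant segments do not yield log(0).
        var = max(sum((v - mean) ** 2 for v in values) / n, var_floor)
        total += math.log(2 * math.pi * var) + 1.0
    return -0.5 * n * total


def glr(window: Sequence[Frame], t: int) -> float:
    """GLR for a hypothesized boundary just before frame t.

    Compares two Gaussians (window[:t], window[t:]) against one Gaussian
    over the whole window; larger values favor a boundary at t.
    """
    left, right = window[:t], window[t:]
    return (ml_gaussian_loglik(left) + ml_gaussian_loglik(right)
            - ml_gaussian_loglik(window))


def detect_change(window: Sequence[Frame], min_seg: int = 5,
                  threshold: float = 0.0) -> Optional[Tuple[int, float]]:
    """Scan candidate boundaries; return (best_t, score) if score > threshold."""
    best_t, best_score = None, -math.inf
    # Keep at least min_seg frames on each side so both Gaussians are estimable.
    for t in range(min_seg, len(window) - min_seg + 1):
        score = glr(window, t)
        if score > best_score:
            best_t, best_score = t, score
    if best_t is not None and best_score > threshold:
        return best_t, best_score
    return None


if __name__ == "__main__":
    # Toy 1-D "features": 20 frames near 0, then 20 frames near 5.
    pattern = [0.0, 0.1, -0.1, 0.2, -0.2]
    window = [[x] for x in pattern * 4] + [[5.0 + x] for x in pattern * 4]
    print(detect_change(window, min_seg=5, threshold=10.0))
```

On the toy window the highest-scoring candidate is frame 20, where the feature distribution shifts. A real diarization system would slide this window over the recording and follow change detection with clustering of the resulting segments.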

Conference paper