About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
INTERSPEECH 2007
Conference paper
Detection, diarization, and transcription of far-field lecture speech
Abstract
Speech processing of lectures recorded inside smart rooms has recently attracted much interest. In particular, the topic has been central to the Rich Transcription (RT) Meeting Recognition Evaluation campaign series, sponsored by NIST, with emphasis placed on benchmarking speech activity detection (SAD), speaker diarization (SPKR), speech-to-text (STT), and speaker-attributed STT (SASTT) technologies. In this paper, we present the IBM systems developed to address these tasks in preparation for the RT 2007 evaluation, focusing on the far-field condition of lecture data collected as part of European project CHIL. For their development, the systems are benchmarked on a subset of the RT Spring 2006 (RT06s) evaluation test set, where they yield significant improvements for all SAD, SPKR, and STT tasks over RT06s results; for example, a 16% relative reduction in word error rate is reported in STT, attributed to a number of system advances discussed here. Initial results are also presented on SASTT, a task new y introduced in 2007 in place of the discontinued SAD.