Publication
ICASSP 2020
Conference paper
Context and Uncertainty Modeling for Online Speaker Change Detection
Abstract
Speaker change detection is often addressed as a key component of speaker diarization systems. In this work we focus on online speaker change detection as a standalone task, which is required for online closed captioning of broadcast television. Unlike related work, we do not operate on frame-level features such as MFCCs. Instead, we leverage state-of-the-art speaker recognition technology by modeling sequences of pretrained speaker embeddings (x-vectors) with a deep neural network. We explicitly address two types of uncertainty. The first is uncertainty in the embedding point estimate, which is due to short and varying segment durations. The second is uncertainty about which context segments are relevant for representing the speaker talking right before the hypothesized speaker change. We also show the robustness of an affinity-matrix representation for speaker change detection. Our methods provide substantial accuracy improvements over several baselines, including a recently published end-to-end system.
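To make the affinity-matrix idea concrete, the following is a minimal sketch and not the system described in the paper. It assumes the affinity between two segments is the cosine similarity of their embeddings, and it uses a hypothetical hand-written `change_score` in place of the deep neural network the abstract refers to. The toy 4-dimensional vectors stand in for real x-vectors, and all function names are illustrative.

```python
import numpy as np


def cosine_affinity_matrix(embeddings):
    """Pairwise cosine similarity between segment embeddings (one row per segment)."""
    x = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)  # guard against zero-length vectors
    return x @ x.T


def change_score(affinity, boundary):
    """Hypothetical score for a change between segments boundary-1 and boundary.

    Mean affinity within each side minus mean affinity across the two sides.
    This hand-written heuristic is an assumption made for illustration only.
    """
    left = np.arange(boundary)
    right = np.arange(boundary, affinity.shape[0])
    within_left = affinity[np.ix_(left, left)].mean()
    within_right = affinity[np.ix_(right, right)].mean()
    cross = affinity[np.ix_(left, right)].mean()
    return 0.5 * (within_left + within_right) - cross


# Toy data: three segments from one speaker followed by three from another.
rng = np.random.default_rng(0)
speaker_a = np.array([1.0, 0.0, 0.0, 0.0])
speaker_b = np.array([0.0, 1.0, 0.0, 0.0])
segments = np.stack(
    [speaker_a + 0.05 * rng.standard_normal(4) for _ in range(3)]
    + [speaker_b + 0.05 * rng.standard_normal(4) for _ in range(3)]
)

affinity = cosine_affinity_matrix(segments)
scores = {b: change_score(affinity, b) for b in range(1, len(segments))}
best_boundary = max(scores, key=scores.get)
```

On this toy input the score peaks at the boundary between the third and fourth segments, where the affinity matrix shows two high-similarity blocks on the diagonal and low similarity off it. Real x-vectors from short segments are far noisier, which is the kind of uncertainty the abstract says the proposed model addresses.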