Abstract
In this paper, we propose Time-Marked Word (TMW) lists as a replacement for the lattices and Confusion Networks (CNs) widely used as indexing vehicles for Spoken Term Detection (STD). In a TMW list, candidates are simply tagged with posterior probabilities and time information and stored as a large list of words: the additional ordering present in a lattice or CN is discarded. TMW lists compactly summarize a large ASR search space. Representing a large search space is critical for STD metrics such as ATWV that heavily penalize misses of rare keywords. Comparisons on the OpenKWS 2014 Tamil limited language pack task [1] show that the new TMW-based indexing results in better performance while being faster and having a smaller footprint.
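To make the data structure concrete, the following is a minimal sketch of what a TMW list and a single-word lookup over it might look like. It is an illustration under stated assumptions, not the authors' implementation: the names `TimeMarkedWord`, `build_index`, and `search`, the field layout, and the posterior threshold are all hypothetical, and multi-word keywords are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeMarkedWord:
    """One candidate word (hypothetical layout): time marks plus a posterior."""
    word: str
    start: float      # start time in seconds
    end: float        # end time in seconds
    posterior: float  # posterior probability in [0, 1]


def build_index(tmw_list):
    """Group TMW entries by word. No lattice or CN ordering is stored."""
    index = defaultdict(list)
    for entry in tmw_list:
        index[entry.word].append(entry)
    return index


def search(index, keyword, threshold=0.0):
    """Return candidate hits for a single-word keyword, best posterior first."""
    hits = [e for e in index.get(keyword, []) if e.posterior >= threshold]
    return sorted(hits, key=lambda e: e.posterior, reverse=True)


# Illustrative entries only; the words, times, and scores are made up.
example_list = [
    TimeMarkedWord("vanakkam", 0.50, 1.10, 0.82),
    TimeMarkedWord("nandri", 1.10, 1.60, 0.05),
    TimeMarkedWord("vanakkam", 7.20, 7.85, 0.03),
]
example_index = build_index(example_list)
example_hits = search(example_index, "vanakkam", threshold=0.01)
```

Keeping low-posterior entries in the list, as in the third example entry, is what lets an index of this kind retain a large part of the search space, which matters when a missed occurrence of a rare keyword is costly.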