Order-free spoken term detection
In this paper, we propose Time-Marked Word (TMW) lists as a replacement for the lattices and Confusion Networks (CNs) widely used as indexing vehicles for Spoken Term Detection (STD). In a TMW list, word candidates are simply tagged with posterior probabilities and time information and stored as one large list of words: the additional ordering present in a lattice or CN is discarded. TMW lists compactly summarize a large ASR search space, which is critical for STD metrics such as ATWV that heavily penalize misses of rare keywords. Comparisons on the OpenKWS 2014 Tamil limited language pack task show that TMW-based indexing yields better performance while being faster and having a smaller footprint.
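
To make the data structure concrete, the following is a minimal sketch of a TMW list and a lookup over it. The abstract does not specify the search procedure, so everything beyond "word, posterior, time" is an assumption made for illustration: the names `TimeMarkedWord`, `build_index`, and `search`, the `max_gap` parameter, the rule that consecutive words of a multi-word keyword must follow each other in time, and the product-of-posteriors score are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeMarkedWord:
    """One word candidate: only time marks and a posterior, no graph links."""
    word: str
    start: float      # start time in seconds
    end: float        # end time in seconds
    posterior: float  # posterior probability in [0, 1]


def build_index(entries):
    """Group candidates by word. No lattice or CN ordering is stored."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.word].append(entry)
    return index


def search(index, keyword, max_gap=0.5):
    """Return (start, end, score) hits for a one- or multi-word keyword.

    Assumed matching rule (not taken from the paper): each following word
    must start between 0 and max_gap seconds after the previous word ends,
    and the hit score is the product of the word posteriors.
    """
    words = keyword.split()
    if not words:
        return []
    hits = [(e.start, e.end, e.posterior) for e in index.get(words[0], [])]
    for word in words[1:]:
        extended = []
        for start, end, score in hits:
            for e in index.get(word, []):
                if 0.0 <= e.start - end <= max_gap:
                    extended.append((start, e.end, score * e.posterior))
        hits = extended
    # Highest-scoring hits first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy TMW list with made-up values; competing candidates may overlap in time.
entries = [
    TimeMarkedWord("hello", 0.0, 0.5, 0.9),
    TimeMarkedWord("yellow", 0.0, 0.5, 0.1),
    TimeMarkedWord("world", 0.5, 1.0, 0.8),
    TimeMarkedWord("hello", 2.0, 2.5, 0.5),
]
index = build_index(entries)
single_hits = search(index, "hello")
phrase_hits = search(index, "hello world")
```

In this toy example, `single_hits` holds both occurrences of "hello", and `phrase_hits` holds the single span where "world" directly follows "hello". Because adjacency is recovered from the time marks alone, the index needs no arcs or slot structure, which is the sense in which the ordering of a lattice or CN is discarded.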