Publication
ICSLP 2004
Conference paper
Use of metadata to improve recognition of spontaneous speech and named entities
Abstract
With improved recognition accuracies for LVCSR tasks, it has become possible to search large collections of spontaneous speech for a variety of information. The MALACH corpus of Holocaust testimonies is one such collection, in which we are interested in automatically transcribing and retrieving portions that are relevant to named entities such as people, places, and organizations. Since the testimonies were gathered from thousands of people in countries throughout Europe, an extremely large number of potential named entities are possible, and this causes a well-known dilemma: increasing the size of the vocabulary allows more of these words to be recognized, but also increases confusability and can harm recognition performance. However, the MALACH corpus, like many other collections, includes side information or metadata that can be exploited to provide prior information on exactly which named entities are likely to appear. This paper proposes a method that capitalizes on this prior information to reduce named-entity recognition errors by over 50% relative, and simultaneously decrease the overall word error rate by 7% relative. The metadata we use derives from a pre-interview questionnaire that includes the names of friends and relatives, places visited, membership in organizations, synonyms of place names, and similar information. By augmenting the lexicon and language model with this information on a speaker-by-speaker basis, we are able to exploit the textual information that is already available in the corpus to facilitate substantially improved speech recognition.
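The core idea of augmenting the lexicon and language model per speaker can be illustrated with a minimal sketch. The function below is purely hypothetical (the paper's abstract does not specify its interpolation scheme): it adds the named entities from a speaker's questionnaire metadata to a base vocabulary and reserves a small probability mass for them in a unigram language model, renormalizing so the distribution still sums to one.

```python
def augment_vocab_and_lm(base_vocab, base_unigrams, metadata_entities,
                         boost_mass=0.05):
    """Illustrative per-speaker augmentation (not the paper's actual method).

    base_vocab        -- iterable of words already in the recognizer's lexicon
    base_unigrams     -- dict mapping word -> unigram probability (sums to 1)
    metadata_entities -- named entities taken from this speaker's
                         pre-interview questionnaire
    boost_mass        -- probability mass reallocated to the metadata entities
    """
    entities = set(metadata_entities)
    vocab = set(base_vocab) | entities  # out-of-vocabulary entities are added

    boosted = {}
    per_entity = boost_mass / len(entities)
    for w in vocab:
        # Scale down the base distribution, then share boost_mass
        # uniformly among the speaker-specific named entities.
        p = base_unigrams.get(w, 0.0) * (1.0 - boost_mass)
        if w in entities:
            p += per_entity
        boosted[w] = p
    return vocab, boosted


vocab, lm = augment_vocab_and_lm(
    base_vocab={"the", "camp"},
    base_unigrams={"the": 0.7, "camp": 0.3},
    metadata_entities={"sosnowiec", "auschwitz"},
)
```

In a real LVCSR system the augmentation would also add pronunciations for each new entity to the lexicon and adapt n-gram (not just unigram) probabilities, but the renormalized interpolation above captures the basic mechanism of biasing recognition toward entities known in advance to be likely for a given speaker.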