Publication
INTERSPEECH 2011
Conference paper
A language independent approach to audio search
Abstract
In this paper, we propose an approach to audio search that requires no language-specific resources. The approach is most useful where no training data exists to build an automatic speech recognition (ASR) system for a language, as is the case for most regional languages and dialects. A multilayer perceptron (MLP) is trained on a language for which training data exists, e.g. English. For an audio segment, this MLP estimates a sequence of probability vectors, referred to as the posteriorgram representation of that segment. The components of each probability vector are the posterior probabilities of the English phonemes at a given frame of speech. A template matching technique is then used to compare the query posteriorgram against the content posteriorgram over the searchable audio content. We present experiments showing that, even for another language such as Hindi, the probabilities obtained from the neural network trained on English provide a characteristic representation of a word. A dynamic time warping algorithm with appropriate modifications is applied, and we report encouraging P@N performance on the audio search task of 46.24% for Hindi and 65.22% for English, using the same MLP trained on English data in both cases. Copyright © 2011 ISCA.
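The template matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the frame-level distance or the modifications made to dynamic time warping, so everything below is an assumption for illustration: the negative log inner product as the distance between two probability vectors, a plain subsequence DTW with a free start and end in the content, normalization by query length, and the hypothetical names `frame_distance` and `subsequence_dtw`. It is not the authors' algorithm.

```python
import math


def frame_distance(p, q, eps=1e-10):
    """Distance between two posterior vectors (assumed: -log of their inner product)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def subsequence_dtw(query, content):
    """Find the best match of `query` anywhere inside `content`.

    Both arguments are posteriorgrams: lists of per-frame probability vectors.
    Returns (cost, end_frame), where cost is the accumulated path distance
    divided by the number of query frames (an assumed normalization).
    """
    n, m = len(query), len(content)
    inf = float("inf")
    # Row 0: the match may start at any content frame at no extra cost.
    prev = [frame_distance(query[0], content[j]) for j in range(m)]
    for i in range(1, n):
        cur = [inf] * m
        for j in range(m):
            best = prev[j]  # query advances, content frame is reused
            if j > 0:
                # diagonal step, or content advances while query frame is reused
                best = min(best, prev[j - 1], cur[j - 1])
            cur[j] = frame_distance(query[i], content[j]) + best
        prev = cur
    # The match may end at any content frame; pick the cheapest one.
    end = min(range(m), key=lambda j: prev[j])
    return prev[end] / n, end


# Toy two-class posteriorgrams standing in for MLP outputs.
query = [[0.9, 0.1], [0.1, 0.9]]
content = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
cost, end = subsequence_dtw(query, content)
print(cost, end)
```

In a search setting, each candidate audio document would be scored this way against the query posteriorgram and the documents ranked by ascending cost; a measure such as P@N would then be computed over the top-ranked results.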