Adaptation of front end parameters in a speech recognizer
Abstract
In this paper we consider the problem of adapting the parameters of the feature extraction algorithm in a speech recognizer. Typical speech recognition systems use a sequence of modules to extract the features that are then used for recognition. We present a method for adapting the parameters of these modules under a variety of criteria, e.g., maximum likelihood and maximum mutual information (MMI). The method assumes that the functions the modules implement are differentiable with respect to their inputs and parameters. We use this framework to optimize a linear transform preceding the linear discriminant analysis (LDA) matrix and show that, with small amounts of data, it gives significantly better performance than a linear transform applied after the LDA matrix. We also show that linear transforms can be estimated by directly optimizing the likelihood or the MMI objective, without using auxiliary functions. Finally, we apply the method to optimize the Mel bins and the compression power in a system that uses power-law compression.
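The differentiability assumption is what lets an objective be propagated back through the chain of modules with the chain rule. The following is a minimal sketch of that idea, not the system described in this paper: it assumes a toy three-module chain (power-law compression, an adaptable linear transform, a fixed LDA-style projection) and a toy unit-variance Gaussian log-likelihood that omits any normalization terms. All names (`forward`, `backward`, `alpha`, `A`, `P`, `mu`) and all dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward(x, alpha, A, P, mu):
    """Run the toy module chain; return the objective and intermediates."""
    c = x ** alpha                       # power-law compression (parameter: alpha)
    z = A @ c                            # adaptable linear transform (parameter: A)
    y = P @ z                            # fixed LDA-style projection (assumed given)
    obj = -0.5 * np.sum((y - mu) ** 2)   # toy unit-variance Gaussian log-likelihood
    return obj, (c, z, y)

def backward(x, alpha, A, P, mu):
    """Chain-rule gradients of the objective w.r.t. alpha and A."""
    _, (c, z, y) = forward(x, alpha, A, P, mu)
    d_y = -(y - mu)                      # d obj / d y
    d_z = P.T @ d_y                      # back through the fixed projection
    d_A = np.outer(d_z, c)               # gradient for the linear transform
    d_c = A.T @ d_z                      # gradient handed to the compression module
    # d(x**alpha)/d(alpha) = x**alpha * log(x), valid for x > 0
    d_alpha = np.sum(d_c * c * np.log(x))
    return d_alpha, d_A

def numeric_grad_alpha(x, alpha, A, P, mu, eps=1e-6):
    """Central finite difference in alpha, used only as a sanity check."""
    hi, _ = forward(x, alpha + eps, A, P, mu)
    lo, _ = forward(x, alpha - eps, A, P, mu)
    return (hi - lo) / (2 * eps)

# Hypothetical toy data: 6 positive "Mel energies" projected down to 3 dims.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=6)
A = np.eye(6) + 0.1 * rng.standard_normal((6, 6))
P = rng.standard_normal((3, 6))
mu = rng.standard_normal(3)
alpha = 0.33

d_alpha, d_A = backward(x, alpha, A, P, mu)

# One small gradient-ascent step on the transform, optimizing the
# objective directly rather than through an auxiliary function.
A_new = A + 1e-3 * d_A
```

Because each module only has to supply derivatives with respect to its own inputs and parameters, the same backward pass yields gradients for both the compression power and the linear transform. A different criterion would change only the first line of `backward`, where the derivative of the objective with respect to the final features is computed.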