Katherine Panciera, Reid Priedhorsky, et al.
CHI 2010
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Only model-based linear transforms are considered, since, for linear transforms, they subsume the appropriate feature-space transforms. The paper compares the two possible forms of model-based transforms: (i) unconstrained, where any combination of mean and variance transform may be used, and (ii) constrained, which requires the variance transform to have the same form as the mean transform. Re-estimation formulae for all appropriate cases of transform are given. This includes a new and efficient full variance transform and the extension of the constrained model-space transform from the simple diagonal case to the full or block-diagonal case. The constrained and unconstrained transforms are evaluated in terms of computational cost, recognition time efficiency, and use for speaker adaptive training. The recognition performance of the two model-space transforms on a large vocabulary speech recognition task using incremental adaptation is investigated. In addition, initial experiments using the constrained model-space transform for speaker adaptive training are detailed. © 1998 Academic Press Limited.
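The abstract's remark that model-based linear transforms subsume the corresponding feature-space transforms can be checked numerically for the constrained case. The following is a minimal sketch, not the paper's implementation. It assumes the constrained transform is parameterised as mean `A @ mu + b` and covariance `A @ sigma @ A.T`; the paper's own notation and sign convention may differ. The names `log_gauss`, `A`, `b`, `mu`, `sigma` and `o` are illustrative only. Under that assumption, the adapted Gaussian's log-likelihood equals the original Gaussian's log-likelihood on the inversely transformed observation, minus `log|det A|`.

```python
import numpy as np

def log_gauss(o, mean, cov):
    """Log density of a multivariate Gaussian N(o; mean, cov)."""
    diff = o - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(o) * np.log(2.0 * np.pi) + logdet + maha)

rng = np.random.default_rng(0)
dim = 3

# Hypothetical Gaussian component (mean and full covariance).
mu = rng.normal(size=dim)
factor = rng.normal(size=(dim, dim))
sigma = factor @ factor.T + dim * np.eye(dim)  # symmetric positive definite

# Hypothetical full (non-diagonal) transform, assumed invertible.
A = rng.normal(size=(dim, dim)) + 3.0 * np.eye(dim)
b = rng.normal(size=dim)
o = rng.normal(size=dim)  # one observation vector

# Model-space view: the same A transforms both mean and covariance.
model_ll = log_gauss(o, A @ mu + b, A @ sigma @ A.T)

# Feature-space view: transform the observation, keep the original model,
# and subtract log|det A| as the Jacobian term.
o_transformed = np.linalg.solve(A, o - b)
_, log_abs_det_A = np.linalg.slogdet(A)
feature_ll = log_gauss(o_transformed, mu, sigma) - log_abs_det_A

print(np.isclose(model_ll, feature_ll))
```

In the feature-space form, the transform is applied once per observation instead of once per Gaussian component. This bears on the recognition-time efficiency the abstract lists among its evaluation criteria.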
Christian M. Garcia-Arellano, Sam S. Lightstone, et al.
IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews
Clare-Marie Karat, John Karat, et al.
CHI 2006
Umang Bhatt, Javier Antorán, et al.
AIES 2021