EPAComp: An Architectural Model for EPA Composition
Luís Henrique Neves Villaça, Sean Wolfgand Matsui Siqueira, et al.
SBSI 2023
Large vocabulary speaker-dependent speech recognition systems adjust to the acoustic peculiarities of each new speaker based on some enrollment data provided by this speaker. As the amount of data required increases with the sophistication of the underlying acoustic models, the enrollment may get lengthy. To streamline it, it is therefore desirable to make use of previously acquired speech data. We describe a data augmentation strategy based on a piecewise linear mapping between the feature space of a new speaker and that of a reference speaker. This speaker-normalizing mapping is used to transform the previously acquired data of the reference speaker onto the space of the new speaker. The performance of the resulting procedure, dubbed the metamorphic algorithm, is illustrated on an isolated utterance speech recognition task with a vocabulary of 20 000 words. Results show that the metamorphic algorithm can substantially reduce the word error rate when only a limited amount of enrollment data is available. Alternatively, it leads to a level of performance comparable to that obtained when a much greater amount of enrollment data is required from the new speaker. In addition, it can also be used for tracking spectral evolution over time, thus providing a possible means for robust speaker self-adaptation. © 1994 IEEE
Arnon Amir, M. Lindenbaum
Computer Vision and Image Understanding
Arthur Nádas
American Journal of Mathematical and Management Sciences
Eli Packer, Asaf Tzadok, et al.
ICDAR 2011