Heredity and environment in speech recognition: The role of a priori information vs. data
Abstract
Most significant advances in speech recognition over the last thirty years can be attributed to the easy availability of ever-increasing corpora of speech and language data and the development of simple trainable parametric statistical models that take advantage of this data. Hidden Markov Models, n-gram language models, and linear-discriminant-based feature extraction are all examples of such data-driven algorithms. However, there is a general feeling in the recognition community that a large untapped body of knowledge, encompassing a priori sources of information in speech and language, can be mined to serve as the basis for the next generation of improvements in speech recognition systems. Such sources of information include constraints imposed by articulatory models, the grammatical structure of language, and phonology. This paper reviews previous abortive attempts to utilize a priori information in speech recognition and contrasts them with data-driven approaches that seem to capture information of a similar nature more successfully. It also highlights some recent attempts to incorporate explicit sources of speech and language knowledge and speculates on possibilities for synergy between the two approaches in the future.