IBM J. Res. Dev

Trends and advances in speech recognition

View publication


One of the earliest successful applications of machine-learning techniques to pattern recognition was the application of information-theoretic principles to speech recognition. Previous approaches relied heavily on expert input through the painstaking analysis of data to relate speech signals to the word sequences that produced them. Such methodologies were completely displaced by casting the speech recognition problem in a probabilistic framework by modeling the joint probability distribution of speech signals and word sequences. At the beginning of the 21st century, the amount of data and computation to train and build models has increased exponentially, and the emergence of new machine-learning algorithms and methodologies has opened new vistas in approaching complex pattern recognition problems. This is enabled by a new set of machine-learning techniques referred to as graphical models, with computationally tractable training algorithms. Closely related are neural-network modeling techniques, and there has been a resurgence of interest in the application of neural-network concepts, such as deep networks to speech recognition. The explosion of data has caused the development of new ways to capture the key features in massive amounts of data using efficient methods deploying exemplar-based sparse representations. Lastly, all of these different approaches can be tied together in a principled fashion using another variation of graphical models: an exponential model framework. This paper describes the current state of the art in speech recognition systems and highlights the developments that are expected to produce major breakthroughs in our ability to automatically recognize speech using computers. © 2011 IBM.