Publication
ICASSP 2014
Conference paper
Kernel methods match deep neural networks on TIMIT
Abstract
Despite their theoretical appeal and grounding in tractable convex optimization techniques, kernel methods are often not the first choice for large-scale speech applications due to their significant memory requirements and computational expense. In recent years, randomized approximate feature maps have emerged as an elegant mechanism to scale up kernel methods. Still, in practice, a large number of random features is required to obtain acceptable accuracy in predictive tasks. In this paper, we develop two algorithmic schemes to address this computational bottleneck in the context of kernel ridge regression. The first scheme is a specialized distributed block coordinate descent procedure that avoids the explicit materialization of the feature space data matrix, while the second scheme gains efficiency by combining multiple weak random feature models in an ensemble learning framework. We demonstrate that these schemes enable kernel methods to match the performance of state-of-the-art deep neural networks on TIMIT for speech recognition and classification tasks. In particular, we obtain the best classification error rates reported on TIMIT using kernel methods. © 2014 IEEE.
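To make the first scheme concrete, below is a minimal single-machine sketch of block coordinate descent for ridge regression with random Fourier features approximating an RBF kernel: each feature block is regenerated from its seed when needed rather than stored, so the full n × D feature matrix is never materialized. The function names, block sizes, kernel choice, and toy data here are illustrative assumptions, not details taken from the paper, whose actual procedure is distributed.

```python
import numpy as np

def rff_block(X, d_block, D_total, gamma, seed):
    """One block of random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2).  Regenerated from `seed` on demand,
    so only one n x d_block slice of the feature matrix exists at a time.
    (Hypothetical helper; not from the paper.)"""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the Fourier transform of the RBF kernel.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], d_block))
    b = rng.uniform(0.0, 2.0 * np.pi, size=d_block)
    return np.sqrt(2.0 / D_total) * np.cos(X @ W + b)

def bcd_ridge(X, y, n_blocks, d_block, gamma, lam, epochs=5):
    """Cyclic block coordinate descent on
    ||y - Z w||^2 + lam * ||w||^2, with Z's columns split into blocks."""
    D = n_blocks * d_block
    w = np.zeros((n_blocks, d_block))
    resid = y.astype(float).copy()           # resid = y - Z @ w at all times
    for _ in range(epochs):
        for b in range(n_blocks):
            Zb = rff_block(X, d_block, D, gamma, seed=b)
            r_b = resid + Zb @ w[b]           # residual with block b's contribution removed
            # Exact minimizer over block b, other blocks held fixed.
            w_new = np.linalg.solve(Zb.T @ Zb + lam * np.eye(d_block), Zb.T @ r_b)
            resid -= Zb @ (w_new - w[b])      # keep the global residual consistent
            w[b] = w_new
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 39))   # toy stand-in for 39-dim acoustic feature frames
y = rng.normal(size=500)         # toy regression targets
w = bcd_ridge(X, y, n_blocks=10, d_block=200, gamma=0.1, lam=1e-3)
```

Because each block solve only requires a d_block × d_block system and one feature slice, memory scales with the block size rather than the total feature count, which is what makes very large numbers of random features tractable.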