ON VARIABLE SAMPLING FREQUENCIES IN SPEECH RECOGNITION

Fu-Hua Liu; Michael Picheny

ICSLP 1998

Conference paper

30 Nov 1998

Abstract

In this paper we describe a novel approach to the problem of differing sampling frequencies in speech recognition. When a recognition task requires a sampling frequency different from that of the reference system, it is customary to re-train the system at the new sampling rate. To circumvent this tedious training process, we propose a new approach, termed Sampling Rate Transformation (SRT), that applies the transformation directly to the speech recognition system. By re-scaling the mel-filter design and filtering the system in the spectral domain, SRT converts the existing system to the target spectral range. New systems are obtained without using any data from the test environment. Preliminary experiments show that SRT reduces the word error rate from 29.89% to 18.17% given 11 kHz test data and a 16 kHz speaker-independent (SI) system. The matched system for 11 kHz has an error rate of 16.17%. We also examine MLLR and MAP adaptation. The best result from MLLR is 17.92% with 4.5 hours of speech. In the speaker adaptation mode, SRT reduces the error rate from 15.48% to 9.71% given 11 kHz test data and a 16 kHz SA system, while the matched 11 kHz SA system has an error rate of 9.33%.
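
To make the notion of a "target spectral range" concrete, the following is a minimal sketch, not the authors' SRT procedure. It places triangular mel filters for a 16 kHz reference system and checks which filter centers fall inside the band available at 11 kHz. The filter count of 24, the commonly used mel formula 2595 log10(1 + f/700), the exact rate of 11000 Hz, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using a commonly used formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(sample_rate_hz, num_filters):
    """Center frequencies (Hz) of filters equally spaced on the mel scale
    between 0 Hz and the Nyquist frequency of the given sampling rate."""
    nyquist_hz = sample_rate_hz / 2.0
    step = hz_to_mel(nyquist_hz) / (num_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(num_filters)]


def filters_within_band(centers_hz, target_sample_rate_hz):
    """Keep only the centers at or below the Nyquist frequency of the target rate."""
    limit_hz = target_sample_rate_hz / 2.0
    return [c for c in centers_hz if c <= limit_hz]


# Hypothetical configuration: 24 filters for the 16 kHz reference system.
reference_centers = mel_center_frequencies(16000, 24)
usable_centers = filters_within_band(reference_centers, 11000)

print("reference filters:", len(reference_centers))
print("filters inside the 11 kHz band:", len(usable_centers))
```

The filters whose centers lie above the target Nyquist frequency carry no information in 11 kHz data, which is one way to see why a 16 kHz system is mismatched on such data and why the filter design has to be re-scaled to the narrower range.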
