AUTOMATIC SPEECHREADING OF IMPAIRED SPEECH
Gerasimos Potamianos, Chalapathy Neti
AVSP 2001
We propose the use of a hierarchical, two-stage discriminant transformation for obtaining audio-visual features that improve automatic speech recognition. Linear discriminant analysis (LDA), followed by a maximum likelihood linear transform (MLLT), is first applied to MFCC-based audio-only features, as well as to visual-only features obtained by a discrete cosine transform of the video region of interest. Subsequently, a second stage of LDA and MLLT is applied to the concatenation of the resulting single-modality features. The obtained audio-visual features are used to train a traditional HMM-based speech recognizer. Experiments on the IBM ViaVoice™ audio-visual database demonstrate that the proposed feature fusion method improves speaker-independent, large-vocabulary, continuous speech recognition in both the clean and noisy audio conditions considered. A 24% relative word error rate reduction over an audio-only system is achieved in the latter case.
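The hierarchical fusion described in the abstract can be sketched as follows. This is a minimal, self-contained illustration using plain Fisher LDA on synthetic stand-in data (the MLLT decorrelation step is omitted for brevity, and all array shapes, class counts, and variable names are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher LDA: top eigenvectors of Sw^{-1} Sb (within/between-class scatter)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve Sw^{-1} Sb; a small ridge keeps Sw invertible on toy data.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_components]]

# Hypothetical toy data: 200 frames, 3 classes (stand-ins for phonetic units).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
audio = rng.normal(size=(200, 24)) + labels[:, None]  # stand-in for MFCC features
video = rng.normal(size=(200, 30)) + labels[:, None]  # stand-in for DCT ROI features

# Stage 1: discriminant projection applied to each modality separately.
audio_feat = audio @ lda_projection(audio, labels, n_components=2)
video_feat = video @ lda_projection(video, labels, n_components=2)

# Stage 2: second discriminant projection on the concatenated features.
fused_in = np.concatenate([audio_feat, video_feat], axis=1)
av_features = fused_in @ lda_projection(fused_in, labels, n_components=2)
print(av_features.shape)
```

In the paper's pipeline each LDA stage is followed by an MLLT rotation before the features are fed to the HMM recognizer; the two-projection structure above is the part the sketch captures.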