Morpheme-based feature-rich language models using Deep Neural Networks for LVCSR of Egyptian Arabic

Amr El-Desoky Mousa; Hong-Kwang Jeff Kuo; Lidia Mangu; Hagen Soltau

doi:10.1109/ICASSP.2013.6639311

ICASSP 2013

Conference paper

18 Oct 2013

Abstract

Egyptian Arabic (EA) is a colloquial variety of Arabic. It is a low-resource, morphologically rich language, which poses problems for Large Vocabulary Continuous Speech Recognition (LVCSR). Building language models (LMs) at the morpheme level is considered a better choice for achieving higher lexical coverage and better LM probabilities. Another approach is to use information from additional features such as morphological tags. Separately, LMs based on Neural Networks (NNs) with a single hidden layer have been shown to outperform conventional n-gram LMs. More recently, Deep Neural Networks (DNNs) with multiple hidden layers have achieved better performance in various tasks. In this paper, we explore the use of feature-rich DNN-LMs, where the inputs to the network are a mixture of words and morphemes along with their features. Significant Word Error Rate (WER) reductions are achieved compared to traditional word-based LMs. © 2013 IEEE.
