Morpheme-based feature-rich language models using Deep Neural Networks for LVCSR of Egyptian Arabic
Abstract
Egyptian Arabic (EA) is a colloquial variety of Arabic. It is a low-resource, morphologically rich language, which poses challenges for Large Vocabulary Continuous Speech Recognition (LVCSR). Building language models (LMs) at the morpheme level is considered a better choice to achieve higher lexical coverage and better LM probabilities. Another approach is to utilize information from additional features such as morphological tags. Meanwhile, LMs based on Neural Networks (NNs) with a single hidden layer have shown superiority over conventional n-gram LMs. Recently, Deep Neural Networks (DNNs) with multiple hidden layers have achieved better performance in various tasks. In this paper, we explore the use of feature-rich DNN-LMs, where the inputs to the network are a mixture of words and morphemes along with their features. Significant Word Error Rate (WER) reductions are achieved compared to traditional word-based LMs.
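As a rough illustration of the feature-rich input layer described above, the sketch below concatenates embeddings of the history tokens (words or morphemes) with embeddings of their morphological tags before passing them through multiple hidden layers. This is only an assumed feed-forward setup for illustration; the vocabulary sizes, feature inventory, layer widths, and the PyTorch framework are our choices, not the paper's configuration.

```python
# Minimal sketch (not the paper's implementation): a feed-forward DNN LM whose
# input concatenates embeddings of the previous n-1 tokens (words or morphemes)
# with embeddings of their morphological-tag features. All sizes are illustrative.
import torch
import torch.nn as nn

class FeatureRichDNNLM(nn.Module):
    def __init__(self, vocab_size, tag_size, context=3,
                 tok_dim=100, tag_dim=20, hidden_dim=500, num_hidden=3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, tok_dim)   # words and morphemes in one vocabulary
        self.tag_emb = nn.Embedding(tag_size, tag_dim)     # morphological tags as extra features
        in_dim = context * (tok_dim + tag_dim)
        layers = []
        for i in range(num_hidden):                        # multiple hidden layers -> "deep" NN LM
            layers += [nn.Linear(in_dim if i == 0 else hidden_dim, hidden_dim), nn.Tanh()]
        self.hidden = nn.Sequential(*layers)
        self.out = nn.Linear(hidden_dim, vocab_size)       # logits over the next word/morpheme

    def forward(self, tok_ids, tag_ids):
        # tok_ids, tag_ids: (batch, context) indices of history tokens and their tags
        x = torch.cat([self.tok_emb(tok_ids), self.tag_emb(tag_ids)], dim=-1)
        x = x.view(x.size(0), -1)                          # concatenate the context positions
        return self.out(self.hidden(x))

# Example: score a batch of 2 histories of length 3 over a toy vocabulary.
model = FeatureRichDNNLM(vocab_size=1000, tag_size=50)
logits = model(torch.randint(0, 1000, (2, 3)), torch.randint(0, 50, (2, 3)))
print(logits.shape)  # torch.Size([2, 1000])
```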