Publication
NAACL-HLT 2003
Conference paper

TIPS: A translingual information processing system

Abstract

Searching online information is increasingly a daily activity for many people. The multilinguality of online content is also increasing (e.g. the proportion of English web users, which has been decreasing as a fraction the increasing population of web users, dipped below 50% in the summer of 2001). To improve the ability of an English speaker to search mutlilingual content, we built a system that supports cross-lingual search of an Arabic newswire collection and provides on demand translation of Arabic web pages into English. The cross-lingual search engine supports a fast search capability (sub-second response for typical queries) and achieves state-of-the-art performance in the high precision region of the result list. The on demand statistical machine translation uses the Direct Translation model along with a novel statistical Arabic Morphological Analyzer to yield state-of-the-art translation quality. The on demand SMT uses an efficient dynamic programming decoder that achieves reasonable speed for translating web documents.