A novel part-of-speech tagging framework for NLP-based business process management
Natural Language Processing (NLP) is a key technique for automating Business Process Management (BPM) at different levels. The performance of existing NLP-based BPM methods suffers from the limited accuracy of Part-of-Speech (POS) tagging, which is a key step in NLP pipelines. The performance of POS tagging depends heavily on the domain of the annotated training data. However, most state-of-the-art POS taggers are trained on corpora from the newswire domain, whose syntactic features usually differ from those of business process descriptions (BPDs). In particular, BPD sentences usually start with an imperative verb and contain numerous out-of-vocabulary (OOV) words. In this paper, we propose a novel POS tagging framework to tackle these problems. The main idea is that the syntactic feature of starting with an imperative verb can be learned by increasing the proportion of correctly POS-annotated imperative sentences in the training data. The resulting POS tagger reduces the overall POS tagging error by nearly 12% compared with a newswire-trained POS tagger. For verbs, which are key words in BPDs, the tagging precision increases by 27%. The lexical ambiguity caused by OOV words is resolved by extracting local contextual knowledge from the images that are attached to BPDs to help users understand the process better. Experimental results show that the overall POS tagging accuracy increases by nearly 10% with contextual OOV knowledge.
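To make the main idea concrete, the following is a minimal sketch of one way the proportion of imperative sentences in a POS-annotated training set could be raised before training a tagger. It is an illustration under stated assumptions, not the procedure used in this paper: the names `is_imperative` and `enrich_imperatives` are hypothetical, an imperative sentence is approximated as one whose first token carries the Penn Treebank tag `VB`, and plain oversampling of existing sentences stands in for whatever source of correctly annotated imperative sentences is actually used.

```python
import math
from typing import List, Tuple

# A sentence is a list of (token, POS tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def is_imperative(sentence: TaggedSentence) -> bool:
    """Heuristic (assumption): the first token is tagged VB (base-form verb)."""
    return bool(sentence) and sentence[0][1] == "VB"


def enrich_imperatives(corpus: List[TaggedSentence],
                       target_ratio: float) -> List[TaggedSentence]:
    """Repeat imperative sentences until they make up at least
    target_ratio of the returned training set."""
    if not 0.0 <= target_ratio < 1.0:
        raise ValueError("target_ratio must be in [0, 1)")
    imperative = [s for s in corpus if is_imperative(s)]
    other = [s for s in corpus if not is_imperative(s)]
    if not imperative or not other:
        return list(corpus)  # nothing to rebalance
    # Smallest imperative count n with n / (n + len(other)) >= target_ratio.
    needed = math.ceil(target_ratio * len(other) / (1.0 - target_ratio))
    if needed <= len(imperative):
        return list(corpus)  # already at or above the target
    # Cycle through the existing imperative sentences to fill the gap.
    extra = [imperative[i % len(imperative)]
             for i in range(needed - len(imperative))]
    return list(corpus) + extra


if __name__ == "__main__":
    corpus = [
        [("Click", "VB"), ("the", "DT"), ("button", "NN")],
        [("The", "DT"), ("market", "NN"), ("fell", "VBD")],
        [("Shares", "NNS"), ("rose", "VBD")],
        [("Analysts", "NNS"), ("agreed", "VBD")],
    ]
    enriched = enrich_imperatives(corpus, target_ratio=0.5)
    share = sum(is_imperative(s) for s in enriched) / len(enriched)
    print(len(enriched), share)
```

The enriched list can then be passed to any trainable tagger. Oversampling only reweights sentences that are already annotated, so it does not replace the need for correctly annotated imperative sentences from the target domain.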