TME: A knowledge-based information extraction system
Abstract
Information extraction is a form of shallow text processing that locates a specified set of relevant information in a natural-language document. This paper presents the Template Match Engine (TME), a system that extracts useful information from unlabelled texts. Its main feature is that it refines the initial extraction pattern using concept knowledge acquired incrementally from the corpus. The system first builds an initial pattern from domain knowledge and uses it to extract information from electronic documents. This step yields a set of feedback words, obtained by expanding and analyzing the extracted information. The pattern is then refined with the feedback words and the concept knowledge related to them. Finally, the refined pattern is used to extract the specified information from the electronic documents. Experimental results show that TME increases recall without loss of precision.
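The extract, feedback, refine, extract loop described above can be illustrated with a minimal sketch. This is not the TME implementation: the regular-expression pattern form, the rule that a feedback word is any lowercase word immediately preceding an already extracted value, the toy documents, and the names `build_pattern`, `extract`, `feedback_words`, `refine`, `run_pipeline`, `DOCS`, and `CONCEPTS` are all assumptions made for illustration only.

```python
import re

# Hypothetical toy corpus and concept knowledge (not from the paper).
DOCS = [
    "The company appointed Smith as chairman.",
    "The board named Smith to the post.",
    "The firm named Jones as director.",
    "Investors say the group selected Baker yesterday.",
]
CONCEPTS = {"named": {"selected"}}  # assumed: words related to a feedback word


def build_pattern(trigger_words):
    """Compile a pattern: a trigger word followed by a capitalized name."""
    alternation = "|".join(sorted(re.escape(w) for w in trigger_words))
    return re.compile(r"\b(?:" + alternation + r")\s+([A-Z][a-z]+)")


def extract(pattern, documents):
    """Return every value the pattern captures, in document order."""
    return [m.group(1) for doc in documents for m in pattern.finditer(doc)]


def feedback_words(extracted, documents, known):
    """Assumed heuristic: new lowercase words that directly precede a known value."""
    found = set()
    for value in set(extracted):
        context = re.compile(r"\b([a-z]+)\s+" + re.escape(value) + r"\b")
        for doc in documents:
            for m in context.finditer(doc):
                if m.group(1) not in known:
                    found.add(m.group(1))
    return found


def refine(trigger_words, feedback, concepts):
    """Add the feedback words and their related concept words to the triggers."""
    refined = set(trigger_words) | set(feedback)
    for word in feedback:
        refined |= concepts.get(word, set())
    return refined


def run_pipeline(documents, seed_triggers, concepts):
    """Initial extraction, feedback collection, refinement, final extraction."""
    initial = extract(build_pattern(seed_triggers), documents)
    feedback = feedback_words(initial, documents, seed_triggers)
    refined = refine(seed_triggers, feedback, concepts)
    final = extract(build_pattern(refined), documents)
    return initial, feedback, final


if __name__ == "__main__":
    print(run_pipeline(DOCS, {"appointed"}, CONCEPTS))
```

On this toy corpus the seed trigger `appointed` extracts only `Smith`. The feedback step then finds `named`, the concept table adds `selected`, and the refined pattern also extracts `Jones` and `Baker`. Here every additional match is a correct one, so the toy case mirrors the reported gain in recall, but it is not evidence for it.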