The SystemT IDE: An integrated development environment for information extraction rules

Laura Chiticariu; Vivian Chu; Sajib Dasgupta; Thilo W. Goetz; Howard Ho; Rajasekar Krishnamurthy; Alexander Lang; Yunyao Li; Bin Liu; Sriram Raghavan; Frederick R. Reiss; Shivakumar Vaithyanathan; Huaiyu Zhu

doi:10.1145/1989323.1989479

SIGMOD 2011

Conference paper

12 Jun 2011

Abstract

Information extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics, and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples. © 2011 Authors.
