Managing information extraction: State of the art and research directions

Anhai Doan; Raghu Ramakrishnan; Shivakumar Vaithyanathan

doi:10.1145/1142473.1142595

SIGMOD 2006

Conference paper

01 Dec 2006

Abstract

This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text). We first survey recent research on information extraction in the database, AI, NLP, IR, and Web communities. We then discuss why this is the right time for the database community to participate actively in managing information extraction, in particular the challenges of maintaining and querying the extracted information and of accounting for the imprecision and uncertainty inherent in the extraction process. Finally, we show how interested researchers can take the next step by pointing to open problems, available datasets, applicable standards, and software tools. We do not assume prior knowledge of text management, NLP, extraction techniques, or machine learning. Copyright 2006 ACM.