Publication
VLDB 2006
Conference paper
Efficiently Linking Text Documents with Relevant Structured Information
Abstract
Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of interlinking critical business information distributed across structured and unstructured data sources. We present a novel system, called EROCS, for linking a given text document with relevant structured data. EROCS views the structured data as a predefined set of “entities” and identifies the entities that best match the given document. EROCS also embeds the identified entities in the document, effectively creating links between the structured data and segments within the document. Unlike prior approaches, EROCS identifies such links even when the relevant entity is not explicitly mentioned in the document. EROCS uses an efficient algorithm that performs this task while keeping the amount of information retrieved from the database to a minimum. Our evaluation shows that EROCS achieves high accuracy with reasonable overhead.
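To make the idea of linking document segments to entities concrete, here is a minimal Python sketch. It is not the EROCS algorithm, which the abstract does not detail; it assumes each entity is represented by a small set of context terms, treats each sentence as a segment, and scores by simple term overlap. The names `tokenize`, `best_entity`, and `link_document`, and the sample data, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_entity(segment, entities):
    """Return (entity_id, score) for the entity whose context terms
    overlap the segment most, or (None, 0) if nothing overlaps."""
    seg_terms = Counter(tokenize(segment))
    best_id, best_score = None, 0
    for entity_id, context in entities.items():
        terms = set(tokenize(" ".join(context)))
        # Score = number of segment tokens that appear in the entity context.
        score = sum(seg_terms[t] for t in terms)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id, best_score


def link_document(document, entities):
    """Split the document into sentences (an assumed notion of 'segment')
    and link each sentence to its best-matching entity, if any."""
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    links = []
    for seg in segments:
        entity_id, _score = best_entity(seg, entities)
        if entity_id is not None:
            links.append((seg, entity_id))
    return links


# Hypothetical structured data: entity id -> context terms drawn from its rows.
entities = {
    "order:1001": ["Alice Smith", "laptop", "Boston"],
    "order:1002": ["Bob Jones", "printer", "Denver"],
}
document = (
    "Alice from Boston called about a late laptop. "
    "The weather was fine. "
    "Bob asked about his printer."
)

for segment, entity_id in link_document(document, entities):
    print(entity_id, "<-", segment)
```

In this toy example neither order identifier appears in the text, yet two sentences are still linked through the surrounding context terms, which mirrors the abstract's point about entities that are not explicitly mentioned. A real system would also need to limit how much entity context it fetches from the database, which this sketch ignores by holding everything in memory.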