Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT

David E. Simmen; Frederick Reiss; Yunyao Li; Suresh Thalamati

doi:10.1145/1559845.1559999

SIGMOD/PODS 2009

Conference paper

04 Dec 2009



Abstract

Enterprise mashup scenarios often involve feeds derived from data created primarily for eye consumption, such as email, news, calendars, blogs, and web feeds. These data sources can test the capabilities of current data mashup products, as the attributes needed to perform join, aggregation, and other operations are often buried within unstructured feed text. Information extraction technology is a key enabler in such scenarios, using annotators to convert unstructured text into structured information that can facilitate mashup operations. Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub. We show how to build domain-specific annotators with SystemT's declarative rule language, AQL, and how to use these annotators to combine structured and unstructured information in an enterprise mashup.
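As a rough illustration of the idea in the abstract, the sketch below shows how an annotator can turn feed text into structured rows that are then joined and aggregated with structured data. It is written in plain Python with a simple dictionary-based matcher. It is not AQL and does not use the SystemT or MashupHub APIs. The names `ACCOUNTS`, `FEED`, `build_annotator`, `join_mentions`, and `mentions_per_company`, and all sample data, are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical structured source: account records keyed by company name.
ACCOUNTS = [
    {"company": "Acme Corp", "owner": "J. Rivera"},
    {"company": "Globex", "owner": "M. Chen"},
]

# Hypothetical unstructured feed items (e.g. news entries).
FEED = [
    {"id": 1, "text": "Acme Corp announced a new plant in Ohio."},
    {"id": 2, "text": "Analysts expect Globex and Acme Corp to merge."},
    {"id": 3, "text": "Weather delays flights across the region."},
]


def build_annotator(names):
    """Return a function that extracts company mentions from a feed item.

    This is a simple dictionary matcher standing in for a real annotator.
    Longer names are tried first so they win over shorter overlapping ones.
    """
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))

    def annotate(item):
        return [
            {"feed_id": item["id"], "company": m.group(0), "span": m.span()}
            for m in pattern.finditer(item["text"])
        ]

    return annotate


def join_mentions(mentions, accounts):
    """Join extracted mentions with account records on the company name."""
    by_company = {a["company"]: a for a in accounts}
    return [
        {**m, "owner": by_company[m["company"]]["owner"]}
        for m in mentions
        if m["company"] in by_company
    ]


def mentions_per_company(rows):
    """Aggregate: count how many mentions each company received."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["company"]] += 1
    return dict(counts)


annotate = build_annotator(a["company"] for a in ACCOUNTS)
mentions = [m for item in FEED for m in annotate(item)]
joined = join_mentions(mentions, ACCOUNTS)
counts = mentions_per_company(joined)
```

Here the join key (the company name) exists only inside the feed text until the annotator extracts it, which is the situation the abstract describes. A rule-based system such as SystemT would express the extraction step declaratively rather than with a hand-built regular expression.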
