Integrating life sciences data - With a little Garlic
Abstract
Vast amounts of life sciences data today reside in specialized data sources, each with its own specialized query processing capabilities. Data from one source must often be combined with data from other sources to give users the information they need. Database middleware systems such as Garlic allow users to combine data from multiple sources in a single query. Garlic presents users with a virtual database against which they can pose arbitrarily complex queries, even though the data needed to answer a query may be stored in several different sources, and those sources may not themselves possess all the functionality needed to answer such a query. The Garlic technology, as incorporated in IBM's DB2 product, forms the basis of the DiscoveryLink service offering for the life sciences industry. We describe the DiscoveryLink offering, focusing on two key contributions of Garlic: the wrapper architecture and the query optimizer. We also illustrate how DiscoveryLink can be used to integrate life sciences data from heterogeneous data sources.