Structured data and inference in DeepQA

Aditya Kalyanpur; B.K. Boguraev; S. Patwardhan; J.W. Murdock; A. Lally; C. Welty; John Prager; B. Coppola; A. Fokoue-Nkoutche; L. Zhang; Y. Pan; Z.M. Qiu

doi:10.1147/JRD.2012.2188737

IBM J. Res. Dev.

Review

01 May 2012

Abstract

Although the majority of evidence analysis in DeepQA is focused on unstructured information (e.g., natural-language documents), several components in the DeepQA system use structured data (e.g., databases, knowledge bases, and ontologies) to generate potential candidate answers or find additional evidence. Structured data analytics are a natural complement to unstructured methods in that they typically cover a narrower range of questions but are more precise within that range. Moreover, structured data that has formal semantics is amenable to logical reasoning techniques, which can be used to provide implicit evidence. The DeepQA system does not contain a single monolithic structured data module; instead, it allows different components to use and integrate structured and semistructured data, with varying degrees of expressivity and formal specificity. This paper is a survey of DeepQA components that use structured data. Areas in which evidence from structured sources has the most impact include typing of answers, application of geospatial and temporal constraints, and the use of formally encoded a priori knowledge of commonly appearing entity types such as countries and U.S. presidents. We present details of the relevant components and demonstrate their end-to-end impact on the IBM Watson™ system. © 1957-2012 IBM.
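
To make the "narrower but more precise" trade-off concrete, the following is a minimal, hypothetical sketch of how structured facts might supply answer-typing and temporal-constraint evidence for candidates produced by unstructured search. It is not DeepQA's implementation: the `Term` record, the toy `KB`, the role label `US_President`, and the functions `type_score`, `temporal_score`, and `structured_evidence` are all illustrative assumptions. Candidates the knowledge base does not cover at all receive `None` ("no evidence") rather than a negative score, reflecting the limited coverage of structured sources.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical structured fact: an entity held a role over a span of years."""
    entity: str
    role: str
    start_year: int
    end_year: int  # inclusive


# Hypothetical toy knowledge base; not DeepQA's actual sources or schema.
KB: List[Term] = [
    Term("Abraham Lincoln", "US_President", 1861, 1865),
    Term("Ulysses S. Grant", "US_President", 1869, 1877),
]


def covered(candidate: str, kb: List[Term]) -> bool:
    """True if the knowledge base says anything at all about the candidate."""
    return any(t.entity == candidate for t in kb)


def type_score(candidate: str, role: str, kb: List[Term]) -> float:
    """1.0 if the KB asserts the candidate ever held the role, else 0.0."""
    return 1.0 if any(t.entity == candidate and t.role == role for t in kb) else 0.0


def temporal_score(candidate: str, role: str, year: int, kb: List[Term]) -> float:
    """1.0 if some term of the candidate in that role covers the given year."""
    return 1.0 if any(
        t.entity == candidate and t.role == role and t.start_year <= year <= t.end_year
        for t in kb
    ) else 0.0


def structured_evidence(
    candidates: List[str], role: str, year: int, kb: List[Term] = KB
) -> Dict[str, Dict[str, Optional[float]]]:
    """Score each candidate; uncovered candidates get None (no structured evidence)."""
    evidence: Dict[str, Dict[str, Optional[float]]] = {}
    for c in candidates:
        if not covered(c, kb):
            evidence[c] = {"type": None, "temporal": None}
        else:
            evidence[c] = {
                "type": type_score(c, role, kb),
                "temporal": temporal_score(c, role, year, kb),
            }
    return evidence


if __name__ == "__main__":
    # Clue (hypothetical): "He was U.S. president in 1863."
    print(structured_evidence(["Abraham Lincoln", "Ulysses S. Grant", "Paris"], "US_President", 1863))
```

Here both presidents pass the type check, only Lincoln satisfies the 1863 constraint, and "Paris" gets no structured evidence at all, so any final ranking would have to rely on unstructured evidence for it.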

Conference paper