Knowledge graph and corpus driven segmentation and answer inference for telegraphic entity-seeking queries

Mandar Joshi; Uma Sawant; Soumen Chakrabarti

doi:10.3115/v1/d14-1117

EMNLP 2014

Conference paper

25 Oct 2014

Abstract

Much recent work focuses on formal interpretation of natural question utterances, with the goal of executing the resulting structured queries on knowledge graphs (KGs) such as Freebase. Here we address two limitations of this approach when applied to open-domain, entity-oriented Web queries. First, Web queries are rarely well-formed questions. They are "telegraphic", with missing verbs, prepositions, clauses, case and phrase clues. Second, the KG is always incomplete, unable to directly answer many queries. We propose a novel technique to segment a telegraphic query and assign a coarse-grained purpose to each segment: a base entity e1, a relation type r, a target entity type t2, and contextual words s. The query seeks entity e2 ∈ t2 where r(e1, e2) holds, further evidenced by schema-agnostic words s. Query segmentation is integrated with the KG and an unstructured corpus where mentions of entities have been linked to the KG. We do not trust the best or any specific query segmentation. Instead, evidence in favor of each candidate e2 is aggregated across several segmentations. Extensive experiments on the ClueWeb corpus and parts of Freebase as our KG, using over a thousand telegraphic queries adapted from TREC, INEX, and WebQuestions, show the efficacy of our approach. For one benchmark, MAP improves from 0.2-0.29 (competitive baselines) to 0.42 (our system). NDCG@10 improves from 0.29-0.36 to 0.54.
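
The idea of aggregating evidence over several segmentations, rather than committing to one, can be illustrated with a small sketch. This is not the paper's model: the toy KG, the hint lexicons, the confidence values, the `CONTEXT_WEIGHT` constant, and the function names are all hypothetical, and the corpus evidence for the contextual words s is left out entirely. The sketch only shows how each query word can be tried in the roles e1, r, t2, or s, and how the score of every resulting interpretation is added to the candidate e2s it supports.

```python
from collections import defaultdict
from itertools import product

# Toy KG of (e1, r, e2) facts and entity types. Made up for illustration.
FACTS = {
    ("Germany", "capital", "Berlin"),
    ("Germany", "largest_city", "Berlin"),
    ("Germany", "contains", "Munich"),
}
TYPES = {"Berlin": {"city"}, "Munich": {"city"}}

# Hypothetical lexicons: query word -> [(KG item, confidence)].
ENTITY_HINTS = {"germany": [("Germany", 1.0)]}
RELATION_HINTS = {"capital": [("capital", 0.9), ("largest_city", 0.3)]}
TYPE_HINTS = {"city": [("city", 1.0)]}

# Assumed weight for treating a word as plain context (role s).
CONTEXT_WEIGHT = 0.2


def role_options(token):
    """List every (role, KG item, weight) that a single query word could take."""
    options = [("s", None, CONTEXT_WEIGHT)]
    options += [("e1", e, w) for e, w in ENTITY_HINTS.get(token, [])]
    options += [("r", r, w) for r, w in RELATION_HINTS.get(token, [])]
    options += [("t2", t, w) for t, w in TYPE_HINTS.get(token, [])]
    return options


def interpretations(tokens):
    """Yield (e1, r, t2, weight) for each role assignment with exactly one e1."""
    for combo in product(*(role_options(t) for t in tokens)):
        roles = [role for role, _, _ in combo]
        if roles.count("e1") != 1 or roles.count("r") > 1 or roles.count("t2") > 1:
            continue
        slots = {role: item for role, item, _ in combo if role != "s"}
        weight = 1.0
        for _, _, w in combo:
            weight *= w
        yield slots["e1"], slots.get("r"), slots.get("t2"), weight


def rank_answers(query):
    """Sum interpretation weights over every candidate e2 they support."""
    scores = defaultdict(float)
    for e1, r, t2, weight in interpretations(query.lower().split()):
        for head, rel, e2 in FACTS:
            if head != e1:
                continue
            if r is not None and rel != r:
                continue
            if t2 is not None and t2 not in TYPES.get(e2, set()):
                continue
            scores[e2] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for entity, score in rank_answers("germany capital city"):
        print(entity, round(score, 3))
```

In this toy setting the query "germany capital city" yields six interpretations. Berlin collects weight from all of them, while Munich is supported only by the interpretations that leave the relation unconstrained, so Berlin ranks first without any single segmentation being trusted.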
