Publication
CIKM 2007
Conference paper
An experimental study of the impact of information extraction accuracy on semantic search performance
Abstract
Researchers have shown that various natural language processing techniques can be used in document analysis to impact search performance. For the most part, they examined how an analysis system with certain performance characteristics can be leveraged to improve document and/or passage search results. We have previously shown that semantic queries that use named entity and relation information extracted from the corpus can lead to significant improvement in search performance. In this paper, we extend our previous efforts and examine how search performance degrades in the face of imperfect named entity and relation extraction. Our study was carried out by developing gold standard annotated corpora and applying different error models to the gold standard annotations to simulate errors made by automatic recognizers. We identify automatic recognizer characteristics that make them more amenable to our search tasks, show that recognizer recall has a more significant impact on semantic search performance than its precision, and demonstrate that significant improvement in both MAP and Exact Precision scores can be achieved by adopting automatic named entity and relation recognizers with near state-of-the-art performance. Copyright 2007 ACM.
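The abstract does not spell out the error models used, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a simple model in which gold annotations are dropped at random to lower recall and spurious annotations are added to lower precision, and it includes the standard average precision and MAP computations. The function names, the tuple layout of an annotation, and the `spurious_pool` parameter are all hypothetical.

```python
import random


def simulate_recognizer(gold, recall, precision, spurious_pool, seed=0):
    """Degrade gold annotations toward a target recall and precision.

    Assumed error model (not taken from the paper): each gold annotation
    survives independently with probability `recall`, then enough wrong
    annotations are drawn from `spurious_pool` that the share of correct
    annotations in the output is approximately `precision`.
    """
    rng = random.Random(seed)
    # Recall errors: randomly miss some gold annotations.
    kept = [a for a in gold if rng.random() < recall]
    # Precision errors: kept / (kept + spurious) should be close to `precision`.
    n_spurious = round(len(kept) * (1 - precision) / precision) if precision > 0 else 0
    candidates = [a for a in spurious_pool if a not in gold]
    spurious = rng.sample(candidates, min(n_spurious, len(candidates)))
    return kept + spurious


def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a relevant set."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked_list, relevant_set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Under these assumptions, one could index the degraded annotations at several recall and precision settings, rerun the same semantic queries, and compare the resulting MAP against the gold-standard run to see which kind of error hurts more.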