Publication
K-CAP 2019
Conference paper
Identifying ambiguity in semantic resources
Abstract
In many Information Extraction tasks, dictionaries and lexica are powerful building blocks for sophisticated extractions. The success of the Semantic Web over the last 10 years has produced an unprecedented quantity of structured data that can be leveraged to build dictionaries for countless concepts in many domains. Although they are an invaluable resource, these automatically built dictionaries may contain "problematic" items, such as spurious words, which have been included by mistake, or ambiguous words, which appear with multiple different meanings in the target corpus and therefore necessitate an expensive disambiguation task. In this paper, we propose a simple and effective method to identify problematic terms in a given dictionary, that is, terms that are ambiguous or spurious with respect to a given corpus, with the aim of facilitating subsequent Information Extraction tasks. We demonstrate the effectiveness of the method with a systematic experiment on publicly available concept dictionaries, using a very large Web corpus as the target, achieving an average precision above 85% in identifying problematic terms.