Publication
ICDM 2018
Conference paper
Heterogeneous Data Integration by Learning to Rerank Schema Matches
Abstract
Schema matching is a task at the heart of integrating heterogeneous structured and semi-structured data, with applications in data warehousing, process matching, data analysis recommendations, Web table matching, and more. Schema matching is known to be an uncertain process, and a common method of overcoming this uncertainty is to present a human expert with a ranked list of possible schema matches from which the expert may choose, an approach known as top-K matching. In this work we propose a learning algorithm that utilizes an innovative set of features to rerank a list of schema matches and improve the rank of the best match. The proposed algorithm assists the matching process by presenting a high-quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in the matching process altogether. A large-scale empirical evaluation on a real-world benchmark shows the effectiveness of the proposed algorithmic solution.
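To make the idea of reranking a top-K list concrete, the following is a minimal Python sketch. It is not the algorithm or feature set from the paper, which the abstract does not specify. The `CandidateMatch` type, the three features (mean similarity, minimum similarity, number of correspondences), the attribute names, and the fixed weights are all hypothetical stand-ins. In a learned reranker the weights would be fitted from labeled matches instead of set by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical representation: a schema match is a set of attribute
# correspondences, each with a similarity score in [0, 1].
Correspondences = Dict[Tuple[str, str], float]


@dataclass
class CandidateMatch:
    name: str
    correspondences: Correspondences


def features(match: CandidateMatch) -> List[float]:
    """Illustrative features only; the paper's actual features are not given here."""
    sims = list(match.correspondences.values())
    if not sims:
        return [0.0, 0.0, 0.0]
    mean_sim = sum(sims) / len(sims)
    return [mean_sim, min(sims), float(len(sims))]


def rerank(candidates: List[CandidateMatch], weights: List[float]) -> List[CandidateMatch]:
    """Order candidates by a linear score over their features, best first."""
    def score(match: CandidateMatch) -> float:
        return sum(w * f for w, f in zip(weights, features(match)))
    return sorted(candidates, key=score, reverse=True)


# A toy top-K list in the order a base matcher might have produced it.
top_k = [
    CandidateMatch("A", {("cust_id", "customerId"): 0.9, ("addr", "name"): 0.2}),
    CandidateMatch("B", {("cust_id", "customerId"): 0.8, ("addr", "address"): 0.7}),
    CandidateMatch("C", {("cust_id", "address"): 0.3}),
]

# Assumed weights standing in for parameters a learning algorithm would fit.
weights = [1.0, 1.0, 0.1]

reranked = rerank(top_k, weights)
print([m.name for m in reranked])
```

With these assumed weights, candidate B moves ahead of A because its weakest correspondence is much stronger, which is the kind of reordering a reranker aims for before the list reaches a human expert.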