Abstract
Schema matching is at the heart of integrating structured and semi-structured data, with applications in data warehousing, data analysis recommendations, Web table matching, etc. Schema matching is known to be an uncertain process, and a common way to cope with this uncertainty is to present a human expert with a ranked list of possible schema matches to choose from, an approach known as top-K matching. In this work we propose a learning algorithm that uses an innovative set of features to rerank a list of schema matches and improve the ranking of the best match. We provide a bound on the size of the initial match list, tying the number of matches to a desired level of confidence in finding the best match. We also propose the use of matching predictors as features in a learning task, and tailor nine new matching predictors for this purpose. The proposed algorithm assists the matching process by presenting a high-quality set of alternative matches to a human expert. It also serves as a step towards eliminating the involvement of human experts as decision makers in the matching process altogether. A large-scale empirical evaluation on a real-world benchmark shows the effectiveness of the proposed algorithmic solution.
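To make the reranking idea concrete, the following is a minimal sketch, not the algorithm proposed in this work. It assumes each candidate match in a top-K list already has a vector of matching-predictor values, and that a weight vector has been learned elsewhere. The names `score` and `rerank`, the linear scoring function, the toy attribute names, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# A correspondence pairs a source attribute with a target attribute.
Correspondence = Tuple[str, str]
# A match is a set of correspondences, stored as a tuple so it can be a dict key.
Match = Tuple[Correspondence, ...]


def score(feature_values: Sequence[float], weights: Sequence[float]) -> float:
    """Linear score: weighted sum of predictor values (an assumed model)."""
    return sum(w * x for w, x in zip(weights, feature_values))


def rerank(
    top_k: List[Match],
    features: Dict[Match, Sequence[float]],
    weights: Sequence[float],
) -> List[Match]:
    """Return the candidate matches sorted by descending score."""
    return sorted(top_k, key=lambda m: score(features[m], weights), reverse=True)


# Toy input: two candidate matches in their initial rank order.
m1: Match = (("cust_name", "client"), ("addr", "city"))
m2: Match = (("cust_name", "client"), ("addr", "address"))
top_k = [m1, m2]

# Hypothetical predictor values per match, and hypothetical learned weights.
features = {m1: [0.9, 0.2], m2: [0.8, 0.7]}
weights = [0.5, 0.5]

reranked = rerank(top_k, features, weights)
print(reranked[0])  # the match ranked first after reranking
```

In this toy input the second candidate scores higher than the first under the assumed weights, so reranking moves it to the top of the list. A learned ranking model would replace the fixed weights used here.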