ICDM 2016
Conference paper
Efficient sampling-based kernel mean matching
Abstract
Many real-world applications exhibit scenarios in which the distributions underlying the training and test data are not identical but are related by a covariate shift, i.e., equal class-conditional distributions with unequal covariate distributions. Traditional data mining techniques struggle to learn a good predictive model in the presence of covariate shift. Recent studies address this challenge by weighting training instances according to the density ratio between the test and training distributions. Kernel Mean Matching (KMM) is a well-known method for estimating this density ratio, but its time complexity is cubic in the size of the training data. KMM is therefore impractical in many real-world applications, especially when the predictive model must be updated periodically on large training data. We address this challenge by drawing fixed-size samples from the training and test data, performing independent computations on these samples, and combining the results to obtain overall density-ratio estimates. Our empirical evaluation demonstrates a large gain in execution time while also achieving competitive accuracy on numerous benchmark datasets.
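For readers unfamiliar with KMM, the sketch below illustrates the standard KMM quadratic program and the subsampling idea the abstract describes. It is a minimal illustration, not the paper's implementation: the RBF kernel choice, the parameter values, the cvxpy dependency, and in particular the plain averaging used to combine per-sample weights are assumptions made here for clarity, and may differ from the combination rule proposed in the paper.

```python
import numpy as np
import cvxpy as cp

def rbf_kernel(X, Y, sigma=1.0):
    """Pairwise RBF (Gaussian) kernel matrix between rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def kmm_weights(X_tr, X_te, sigma=1.0, B=1000.0, eps=None):
    """Standard KMM: solve a QP for importance weights beta on the training points.
    Minimizes 0.5 * beta^T K beta - kappa^T beta subject to box and sum constraints."""
    n_tr, n_te = len(X_tr), len(X_te)
    if eps is None:
        eps = (np.sqrt(n_tr) - 1.0) / np.sqrt(n_tr)
    # Small jitter keeps the kernel matrix numerically positive semidefinite.
    K = rbf_kernel(X_tr, X_tr, sigma) + 1e-8 * np.eye(n_tr)
    kappa = (n_tr / n_te) * rbf_kernel(X_tr, X_te, sigma).sum(axis=1)
    beta = cp.Variable(n_tr)
    objective = cp.Minimize(0.5 * cp.quad_form(beta, K) - kappa @ beta)
    constraints = [beta >= 0, beta <= B,
                   cp.abs(cp.sum(beta) - n_tr) <= n_tr * eps]
    cp.Problem(objective, constraints).solve()
    return beta.value

def sampled_kmm(X_tr, X_te, m=200, n_rounds=20, seed=0, **kmm_kw):
    """Sampling-based approximation (illustrative): repeatedly solve KMM on
    fixed-size subsamples of the training and test data, then average the
    weights each training point receives across the rounds in which it was
    drawn. The averaging here is a simple stand-in for the paper's
    combination step."""
    rng = np.random.default_rng(seed)
    weight_sum = np.zeros(len(X_tr))
    counts = np.zeros(len(X_tr))
    for _ in range(n_rounds):
        tr_idx = rng.choice(len(X_tr), size=min(m, len(X_tr)), replace=False)
        te_idx = rng.choice(len(X_te), size=min(m, len(X_te)), replace=False)
        beta = kmm_weights(X_tr[tr_idx], X_te[te_idx], **kmm_kw)
        weight_sum[tr_idx] += beta
        counts[tr_idx] += 1
    weights = np.ones(len(X_tr))          # uncovered points keep weight 1
    covered = counts > 0
    weights[covered] = weight_sum[covered] / counts[covered]
    return weights
```

Because each QP is solved on only m points, each round costs on the order of m^3 rather than n^3, which is the source of the execution-time gain the abstract reports; the resulting weights can then be passed to any learner that accepts per-instance sample weights.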