Ensemble direct density ratio estimation for multistream classification
In traditional machine learning, it is assumed that training data conforming to the stationary distribution of test data is readily available. Yet, such an assumption is not valid in practice due to a high cost of obtaining the truth value of data instances. This is particularly true when computing over non-stationary data streams. Recent studies in the multistream setting aim to address this issue by leveraging a stream of data with biased labeled instances (called the source stream) to train a suitable model for prediction over unlabeled instances (called the target stream). They use sampling bias correction techniques as a preprocessing step for estimating source instance weights for training a bias-corrected classifier useful to predict label of target data instances. In this regard, a recent framework proposes to utilize a Gaussian kernel model to estimate source instance weights. Unfortunately, it suffers from large computational time complexity and consequently deteriorates the rate at which streaming data is processed. In this paper, we address this issue by proposing a divide-And-conquer method suitable for simultaneously evaluating source instance weights and detecting changes in distribution over time. Our empirical results demonstrate a significant gain in execution time, compared to the previous approach, while achieving similar or better classification accuracy on real-world datasets.