On the Effectiveness of Poisoning against Unsupervised Domain Adaptation
Data poisoning attacks manipulate victim's training data to compromise their model performance, after training. Previous works on poisoning have shown the inability of a small amount of poisoned data at significantly reducing the test accuracy of deep neural networks. In this work, we propose an upper bound on the test error induced by additive poisoning, which explains the difficulty of poisoning against deep neural networks. However, the limited effect of poisoning is restricted to the setting where training and test data are from the same distribution. To demonstrate this, we study the effect of poisoning in an unsupervised domain adaptation (UDA) setting where the source and the target domain distributions are different. We propose novel data poisoning attacks that prevent UDA methods from learning a representation that generalizes well on the target domain. Our poisoning attacks significantly lower the target domain accuracy of state-of-the-art UDA methods on popular benchmark UDA tasks, dropping it to almost 0% in some cases, with the addition of only 10% poisoned data. The effectiveness of our attacks in the UDA setting highlights the seriousness of the threat posed by data poisoning and the importance of data curation in machine learning.