Data-free Model Fusion with Generator Assistants
Abstract
In a Model Marketplace, multiple parties may submit pretrained neural networks that accomplish similar tasks. These networks usually have different architectures and are trained on different datasets. It would be highly beneficial to fuse these models into a single model with superior performance and lower execution cost. However, the training parties may be unwilling to share their training data, due to privacy and confidentiality concerns, and may not agree to participate in a federated learning collaboration. As an alternative, we propose a data-free model fusion framework, based on knowledge distillation, that combines several pretrained models into a superior model without requiring the raw training data. We employ a generative approach to synthesize data for knowledge distillation. The data generator must be trained to produce a diverse set of samples whose distribution is similar to that of the training data. Generating samples that cause student-teacher disagreement can expand the coverage of the data distribution, reduce the chance of mode collapse, and improve data-free knowledge distillation. However, we found that in a multi-teacher setting, encouraging disagreement between the teachers and the student confuses the generators and deteriorates the results. To tackle this, we introduce Generator Assistants (GA), which keep the generators evolving without causing this confusion. Experiments on the CIFAR-10, CIFAR-100, and Stanford Dogs datasets show that our method greatly improves data-free model fusion performance compared to the prior art.
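To make the disagreement-driven setup referenced above concrete, the following is a minimal illustrative sketch of multi-teacher data-free distillation with a generator trained to maximize student-teacher disagreement. The generator architecture, the negated-KL generator objective, and the averaging of teacher logits are assumptions chosen for illustration; they are not the paper's exact formulation, and the Generator Assistants component is not shown here.

```python
# Illustrative sketch only: a generator synthesizes samples on which an
# ensemble of teachers and a student disagree; the student is then distilled
# from the teachers' averaged predictions on freshly generated samples.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps latent noise to image-like samples (assumed 3x32x32, e.g. CIFAR)."""
    def __init__(self, z_dim=100, img_ch=3):
        super().__init__()
        self.fc = nn.Linear(z_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.BatchNorm2d(128),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, img_ch, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

def kd_step(generator, teachers, student, g_opt, s_opt,
            batch=64, z_dim=100, device="cpu"):
    """One alternating step: (1) update the generator to maximize
    teacher-student disagreement, (2) update the student to match the
    teachers on newly generated samples."""
    for t in teachers:
        t.eval()

    # --- Generator step: maximize disagreement (negated KL as the loss) ---
    z = torch.randn(batch, z_dim, device=device)
    x = generator(z)
    with torch.no_grad():
        t_logits = torch.stack([t(x) for t in teachers]).mean(0)  # ensemble average
    s_logits = student(x)
    disagreement = F.kl_div(F.log_softmax(s_logits, dim=1),
                            F.softmax(t_logits, dim=1), reduction="batchmean")
    g_opt.zero_grad()
    (-disagreement).backward()  # generator seeks samples the student gets wrong
    g_opt.step()

    # --- Student step: minimize the same KL on fresh generated samples ---
    z = torch.randn(batch, z_dim, device=device)
    x = generator(z).detach()  # no gradient back into the generator here
    with torch.no_grad():
        t_logits = torch.stack([t(x) for t in teachers]).mean(0)
    s_logits = student(x)
    kd_loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                       F.softmax(t_logits, dim=1), reduction="batchmean")
    s_opt.zero_grad()
    kd_loss.backward()
    s_opt.step()
    return disagreement.item(), kd_loss.item()
```

In this sketch the two updates alternate every iteration; in practice the balance between generator and student updates, and how the teachers' outputs are combined, are design choices that the paper addresses rather than the fixed scheme shown here.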