Direct neuron-wise fusion of cognate neural networks
Abstract
This paper proposes a method for building a robust acoustic model by directly fusing multiple neural networks with dissimilar characteristics, without adding any layers or nodes that would require retraining. The networks to be fused are derived from a shared parent neural network and are referred to in this paper as cognate (child) neural networks. Assuming that the cognate networks share the same topology, they are fused by interpolating the weight and bias parameters of each neuron with that neuron's own fusion weight. Because the fused network keeps the original topology, it incurs no extra computational cost during decoding. The fusion weight of each neuron is determined from a cosine similarity computed over the parameters connected to that neuron, and fusion is performed for every neuron. Experiments were carried out on a test suite covering various acoustic conditions with a wide SNR range, diverse speakers including foreign-accented speakers, and several speaking styles. In these experiments, the network created by fusing cognate neural networks showed consistent improvement on average over the commercial-grade domain-free network derived from the parent model. In addition, we show that fusion based on the input connections to each neuron achieves the highest accuracy in our experiments.
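The neuron-wise interpolation described above can be illustrated with a small sketch. The abstract does not state exactly which vectors are compared or how the cosine similarity is turned into a fusion weight, so the rule below is a hypothetical assumption, not the paper's method: for each neuron, the input-connection vector (incoming weights plus bias) of every child is compared with the parent's vector for the same neuron, negative similarities are clipped to zero, and the results are normalized to sum to one across children. The names `cosine` and `fuse_layer` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per output neuron: the weights of its input connections


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def fuse_layer(
    parent: Tuple[Matrix, Vector],
    children: Sequence[Tuple[Matrix, Vector]],
) -> Tuple[Matrix, Vector]:
    """Fuse one layer of same-topology cognate networks, neuron by neuron.

    HYPOTHETICAL rule (assumption, not taken from the paper): a child's fusion
    weight for neuron j is the cosine similarity between its input-connection
    vector and the parent's, clipped at 0 and normalized across children.
    """
    parent_w, parent_b = parent
    fused_w: Matrix = []
    fused_b: Vector = []
    for j in range(len(parent_w)):
        # Input connections of neuron j, with the bias appended as an extra element.
        ref = parent_w[j] + [parent_b[j]]
        sims = [max(0.0, cosine(w[j] + [b[j]], ref)) for w, b in children]
        total = sum(sims)
        # Fall back to a plain average if no child is positively aligned with the parent.
        if total > 0:
            alphas = [s / total for s in sims]
        else:
            alphas = [1.0 / len(children)] * len(children)
        # Interpolate the weights and bias of neuron j using its own fusion weights.
        row = [
            sum(a * w[j][i] for a, (w, _) in zip(alphas, children))
            for i in range(len(parent_w[j]))
        ]
        fused_w.append(row)
        fused_b.append(sum(a * b[j] for a, (_, b) in zip(alphas, children)))
    return fused_w, fused_b
```

For example, with a one-neuron parent `([[1.0, 0.0]], [0.0])`, a child identical to it and a child orthogonal to it, the identical child receives a fusion weight of 1 and the fused layer equals `([[1.0, 0.0]], [0.0])`. Because the output has the same shape as each input layer, the fused network decodes at the same cost as any single child.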