About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
INTERSPEECH 2020
Conference paper
Implicit transfer of privileged acoustic information in a generalized knowledge distillation framework
Abstract
This paper proposes a novel generalized knowledge distillation framework, with an implicit transfer of privileged information. In our proposed framework, teacher networks are trained with two input branches on pairs of time-synchronous lossless and lossy acoustic features. While one branch of the teacher network processes a privileged view of the data using lossless features, the second branch models a student view, by processing lossy features corresponding to the same data. During the training step, weights of this teacher network are updated using a composite two-part cross entropy loss. The first part of this loss is computed between the predicted output labels of the lossless data and the actual ground truth. The second part of the loss is computed between the predicted output labels of the lossy data and lossless data. In the next step of generating soft labels, only the student view branch of the teacher is used with lossy data. The benefit of this proposed technique is shown on speech signals with long-term time-frequency bandwidth loss due to recording devices and network conditions. Compared to conventional generalized knowledge distillation with privileged information, the proposed method has a relative improvement of 9.5% on both lossless and lossy test sets.