Abstract
We propose to estimate the KL divergence via relaxed likelihood ratio estimation in a Reproducing Kernel Hilbert space. We show that the dual of our ratio estimator for KL, in the particular case of Mutual Information estimation, corresponds to a lower bound on the MI that is related to the so-called Donsker-Varadhan lower bound. In this dual form, MI is estimated by learning a witness function discriminating between the joint density and the product of marginals, together with an auxiliary scalar variable that enforces a normalization constraint on the likelihood ratio. By extending the function space to neural networks, we propose an efficient neural MI estimator and validate its performance on synthetic examples, showing an advantage over existing baselines. We demonstrate its strength in large-scale self-supervised representation learning through MI maximization.
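To make the dual formulation concrete, below is a minimal sketch of a neural MI estimator of the kind the abstract describes: a witness function discriminating joint from marginal samples plus a learnable scalar standing in for the normalization of the likelihood ratio. The objective used here is a standard Donsker-Varadhan-style lower bound with an auxiliary scalar (I(X;Y) >= E_joint[T] - exp(-eta) E_marg[exp(T)] - eta + 1); the paper's specific relaxed objective is not reproduced, and the names WitnessMLP and mi_lower_bound are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WitnessMLP(nn.Module):
    """Small MLP witness function T(x, y) acting on concatenated samples (illustrative)."""
    def __init__(self, dim_x, dim_y, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def mi_lower_bound(T, x, y, eta):
    """Donsker-Varadhan-style lower bound with a learnable scalar `eta`
    (log of the normalization constant) replacing the log-partition term:
        I(X;Y) >= E_joint[T] - exp(-eta) * E_marg[exp(T)] - eta + 1.
    Samples from the product of marginals are obtained by shuffling y in the batch.
    NOTE: this is a generic bound, not necessarily the paper's exact objective."""
    t_joint = T(x, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    t_marg = torch.exp(T(x, y_shuffled)).mean()
    return t_joint - torch.exp(-eta) * t_marg - eta + 1.0

# Hypothetical training loop on a correlated Gaussian pair (x, y).
dim = 5
T = WitnessMLP(dim, dim)
eta = nn.Parameter(torch.zeros(()))
opt = torch.optim.Adam(list(T.parameters()) + [eta], lr=1e-3)

for step in range(2000):
    x = torch.randn(256, dim)
    y = 0.8 * x + 0.6 * torch.randn(256, dim)   # correlated samples
    loss = -mi_lower_bound(T, x, y, eta)        # maximize the lower bound
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Maximizing this bound jointly over the witness network and the scalar eta recovers the Donsker-Varadhan value at the optimum, since the optimal eta equals the log of the partition term E_marg[exp(T)].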