Publication
ACL 2017
Conference paper

Information-Theory Interpretation of the Skip-Gram Negative-Sampling Objective Function

Abstract

In this paper, we define a measure of dependency between two random variables, based on the Jensen-Shannon (JS) divergence between their joint distribution and the product of their marginal distributions. Then, we show that word2vec’s skip-gram with negative sampling embedding algorithm finds the optimal low-dimensional approximation of this JS dependency measure between the words and their contexts. The gap between the optimal score and the low-dimensional approximation is demonstrated on a standard text corpus.
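As a rough illustration of the dependency measure described in the abstract, the sketch below computes a Jensen-Shannon divergence between a joint distribution and the product of its marginals for a small co-occurrence table. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `js_dependency`, the mixture weight `alpha`, and its default of 0.5 (the symmetric JS divergence) are choices made here for illustration, and the abstract does not specify how the divergence is weighted.

```python
import numpy as np

def js_dependency(joint, alpha=0.5):
    """JS divergence between a joint distribution and the product of its marginals.

    `joint` is a 2-D array of non-negative co-occurrence weights (rows: words,
    columns: contexts). `alpha` in (0, 1) is an assumed mixture weight;
    alpha=0.5 gives the standard symmetric JS divergence.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                 # normalize to a probability table
    p_word = joint.sum(axis=1, keepdims=True)   # marginal over contexts
    p_ctx = joint.sum(axis=0, keepdims=True)    # marginal over words
    indep = p_word * p_ctx                      # product of the marginals
    mix = alpha * joint + (1.0 - alpha) * indep # mixture distribution

    def kl(p, q):
        # KL(p || q), skipping cells where p is zero (0 * log 0 = 0)
        mask = p > 0
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

    return alpha * kl(joint, mix) + (1.0 - alpha) * kl(indep, mix)

# Independent variables: the joint equals the product of marginals.
independent = js_dependency([[0.25, 0.25], [0.25, 0.25]])
# Fully dependent variables: each word occurs with exactly one context.
dependent = js_dependency([[0.5, 0.0], [0.0, 0.5]])
```

Under these assumptions the measure is zero when words and contexts are independent and grows as the joint distribution moves away from the product of its marginals; with `alpha=0.5` and natural logarithms it is bounded above by log 2.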
