Publication: Pattern Recognition (Paper)
Learning variable-length representation of words
Abstract
A standard word embedding algorithm, such as word2vec, embeds each word as a dense vector of a preset dimensionality, the components of which are learned by maximizing the likelihood of predicting the context around it. However, it is an inherent linguistic phenomenon that words vary in how easily they can be identified from their contexts. This suggests that a variable granularity in word vector representation may be useful for obtaining sparser and more compressed word representations that require less storage space. To that end, in this paper, we propose a word vector training algorithm that uses a variable number of components to represent words. Given a collection of text documents, our algorithm, similar to the skip-gram approach of word2vec, learns to predict the context of a word given its current instance. However, in contrast to skip-gram, which uses a static number of dimensions for each word vector, we propose to dynamically increase the dimensionality as a stochastic function of the prediction error. Our experiments with standard test collections demonstrate that our word representation method achieves comparable (and sometimes even better) effectiveness than skip-gram word2vec while using a significantly smaller number of parameters, achieving a compression ratio of around 65%.
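To make the core idea concrete, the sketch below illustrates one possible reading of the abstract: a skip-gram-style training loop with negative sampling in which each word's vector starts short and gains an extra component with a probability proportional to the prediction error. This is a minimal illustration, not the authors' implementation; the corpus, the hyperparameters (INIT_DIM, MAX_DIM, GROWTH_SCALE), and the growth rule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus and vocabulary (not from the paper).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}

MAX_DIM = 16        # upper bound on a word vector's length (assumption)
INIT_DIM = 4        # starting dimensionality for every word (assumption)
LR = 0.05           # learning rate
WINDOW = 2          # skip-gram context window
GROWTH_SCALE = 0.5  # maps prediction error to a growth probability (assumption)

# Input (word) vectors are variable-length: components beyond dims[w] stay zero,
# so full dot products only "see" the active prefix. Output (context) vectors
# are kept full-length for simplicity in this sketch.
dims = {i: INIT_DIM for i in range(len(vocab))}
W_in = rng.normal(0, 0.1, (len(vocab), MAX_DIM))
W_in[:, INIT_DIM:] = 0.0
W_out = rng.normal(0, 0.1, (len(vocab), MAX_DIM))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(50):
    for pos, word in enumerate(corpus):
        w = word2id[word]
        for off in range(-WINDOW, WINDOW + 1):
            if off == 0 or not (0 <= pos + off < len(corpus)):
                continue
            c = word2id[corpus[pos + off]]
            neg = int(rng.integers(len(vocab)))  # one uniform negative sample

            for ctx, label in ((c, 1.0), (neg, 0.0)):
                d = dims[w]                      # active prefix of the word vector
                score = sigmoid(W_in[w] @ W_out[ctx])
                err = label - score              # prediction error for this pair

                # Standard skip-gram SGD updates, restricted to active components.
                W_out[ctx] += LR * err * W_in[w]
                W_in[w, :d] += LR * err * W_out[ctx, :d]

                # Stochastic growth: the larger the error, the more likely the
                # word's vector gains one extra component (up to MAX_DIM).
                if d < MAX_DIM and rng.random() < GROWTH_SCALE * abs(err):
                    W_in[w, d] = rng.normal(0, 0.1)  # seed the new component
                    dims[w] = d + 1

# Words that were harder to predict from context tend to end up with longer vectors.
print({vocab[i]: dims[i] for i in range(len(vocab))})
```

Under these assumptions, storage savings come from keeping only the active prefix of each word vector; the reported compression ratio of around 65% would correspond to the average active length relative to a fixed-size baseline.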