About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
WWW 2018
Conference paper
Contextual Word Embedding: A Case Study in Clustering Tweets about Emergency Situations
Abstract
Effective clustering of short documents, such as tweets, is difficult because of the lack of sufficient semantic context. Word embedding is a technique that is effective in addressing this lack of semantic context. However, the process of word vector embedding, in turn, relies on the availability of sufficient contexts to learn the word associations. To get around this problem, we propose a novel word vector training approach that leverages topically similar tweets to better learn the word associations. We test our proposed word embedding approach by clustering a collection of tweets on disasters. We observe that the proposed method improves clustering effectiveness by up to 14%.