Publication
FUSION 2018
Conference paper
Doc2Img: A New Approach to Vectorization of Documents
Abstract
Vector space representations of text have grown in popularity and are used in a variety of text classification problems. We present Doc2Img, a new approach to creating document vectors that improves on existing approaches such as Word2Vec and Doc2Vec in capturing both the similarities between words within a document and the differences across documents. We apply this new vector space representation to the problem of deriving the sensor requirements of apps (for smartphones and IoT devices) by learning a classification model over document vectors. We show that this learned model outperforms models built on existing vector space representations (Word2Vec and Doc2Vec) by more than 10%. Further, the model predicts sensor requirements with an average accuracy of 75%, and with greater than 85% accuracy on the top-20 sensor requirements, for 300 different applications.