Publication
FUSION 2018
Conference paper

Doc2Img: A New Approach to Vectorization of Documents

Abstract

Vector space representations of text have grown in popularity and are used in a variety of text classification problems. We present Doc2Img, a new approach to creating document vectors that improves upon existing approaches such as Word2Vec and Doc2Vec in capturing both the similarities between words within a document and the differences across documents. We apply this new vector space representation to the problem of deriving the sensor requirements of apps (for smartphones and IoT devices) by learning a classification model from the document vectors. We show that this learned model outperforms models based on existing vector space representations (Word2Vec and Doc2Vec) by more than 10%. Further, the model predicts sensor requirements for 300 different applications with an average accuracy of 75%, and with an accuracy greater than 85% on the top-20 sensor requirements.

Date

05 Sep 2018
