Doc2Img: A New Approach to Vectorization of Documents

Shreeranjani Srirangamsridharan; Mudhakar Srivatsa; Raghu Ganti; Chris Simpkin

doi:10.23919/ICIF.2018.8455685

FUSION 2018

Conference paper

05 Sep 2018

Abstract

Vector space representations of text have grown in popularity and are used in a variety of text classification problems. We present Doc2Img, a new approach to creating document vectors that improves upon existing approaches such as Word2Vec and Doc2Vec in capturing both the similarities between words within a document and the differences across documents. We apply this new vector space representation to the problem of deriving the sensor requirements of apps (for smartphones and IoT devices) by learning a classification model over document vectors. We show that this learned model outperforms models built on existing vector space representations (Word2Vec and Doc2Vec) by more than 10%. Further, across 300 different applications, the model predicts sensor requirements with an average accuracy of 75%, and with greater than 85% accuracy on the top-20 sensor requirements.