Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

Joydeep Mondal; Sarthak Ahuja; Kushal Mukherjee; Sudhanshu Singh; Gyana Parija

ESWC 2018

Conference paper

03 Jun 2018

Benchmarking of a Novel POS Tagging Based Semantic Similarity Approach for Job Description Similarity Computation

View publication

Abstract

Most solutions providing hiring analytics involve mapping provided job descriptions to a standard job framework, thereby requiring computation of a document similarity score between two job descriptions. Finding semantic similarity between a pair of documents is a problem that is yet to be solved satisfactorily over all possible domains/contexts. Most document similarity calculation exercises require a large corpus of data for training the underlying models. In this paper we compare three methods of document similarity for job descriptions - topic modeling (LDA), doc2vec, and a novel part-of-speech tagging based document similarity (POSDC) calculation method. LDA and doc2vec require a large corpus of data to train, while POSDC exploits a domain specific property of descriptive documents (such as job descriptions) that enables us to compare two documents in isolation. POSDC method is based on an action-object-attribute representation of documents, that allows meaningful comparisons. We use stanford Core NLP and NLTK Wordnet to do a multilevel semantic match between the actions and corresponding objects. We use sklearn for topic modeling and gensim for doc2vec. We compare the results from these three methods based on IBM Kenexa Talent frameworks job taxonomy.

Conference paper