DiSCern: A diversified citation recommendation system for scientific queries
Performing a literature survey for scholarly work has become challenging and time-consuming because of the rapid growth in the number of scientific articles. Automatic recommendation of high-quality citations for a given scientific query topic is therefore immensely valuable. The state of the art in citation recommendation suffers from three limitations. First, most existing approaches require input in the form of the full article, a seed set of citations, or both, even though recommending citations from a set of keywords alone is extremely useful for many scientific purposes. Second, existing techniques aim at suggesting prestigious and well-cited articles. However, we often need diversified citations for the given query topic; for instance, diversity helps authors write survey papers on a topic and helps scholars get a broad view of its key problems. Third, keyword-based citation recommendation typically fails to return semantically correlated articles that do not use exactly the same keywords. To the best of our knowledge, no citation recommendation system in the literature addresses these three limitations simultaneously. In this paper, we propose a novel citation recommendation system called DiSCern to address this research gap. DiSCern finds relevant and diversified citations in response to a search query, given as one or more keywords describing the query topic, using only the citation graph and the keywords associated with the articles, and no latent information. To ensure that semantically correlated articles are also included in the results, DiSCern uses a novel keyword expansion step inspired by community finding in social network analysis.
Our proposed approach builds primarily on the Vertex Reinforced Random Walk (VRRW) to balance prestige and diversity in the recommended citations. We demonstrate the efficacy of DiSCern empirically on two datasets: a large publication dataset of more than 1.7 million articles in the computer science domain and a dataset of more than 29,000 articles in the theoretical high-energy physics domain. The experimental results show that our approach is efficient and outperforms the state-of-the-art algorithms in terms of both relevance and diversity.
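To illustrate how a vertex-reinforced walk can trade off prestige against diversity, the following is a minimal sketch on a toy citation graph, written in the style of the commonly used DivRank formulation. It is an illustration under stated assumptions and not DiSCern's actual implementation: the function name `vrrw_scores`, the parameters `damping` and `alpha`, the uniform prior, the cumulative visit-count approximation, and the toy graph are all hypothetical choices made for this example.

```python
def vrrw_scores(adj, prior=None, damping=0.9, alpha=0.25, iters=100):
    """Approximate vertex-reinforced random walk scores.

    adj:     dict mapping each node to a list of its neighbours
             (hypothetical undirected citation graph).
    prior:   dict of restart probabilities summing to 1 (uniform if None).
    damping: probability of following an edge rather than restarting.
    alpha:   probability of leaving the current node in the base walk.
    """
    nodes = list(adj)
    if prior is None:
        prior = {u: 1.0 / len(nodes) for u in nodes}

    # Base ("organic") transition: stay with prob 1 - alpha,
    # otherwise move to a uniformly chosen neighbour.
    base = {}
    for u in nodes:
        nbrs = adj[u]
        row = {u: (1.0 - alpha) if nbrs else 1.0}
        for v in nbrs:
            row[v] = row.get(v, 0.0) + alpha / len(nbrs)
        base[u] = row

    scores = dict(prior)
    visits = {u: 1.0 for u in nodes}  # expected visit counts, start at 1

    for _ in range(iters):
        nxt = {v: (1.0 - damping) * prior[v] for v in nodes}
        for u in nodes:
            # Reinforcement: transitions are reweighted by how often
            # the target has been visited, then renormalised per row.
            norm = sum(p * visits[v] for v, p in base[u].items())
            for v, p in base[u].items():
                nxt[v] += damping * scores[u] * p * visits[v] / norm
        scores = nxt
        for v in nodes:
            visits[v] += scores[v]  # accumulate expected visits
    return scores


# Toy example: one hub article linked to four peripheral articles.
toy_graph = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}
ranking = sorted(vrrw_scores(toy_graph).items(), key=lambda kv: -kv[1])
```

Because frequently visited nodes absorb probability mass from their neighbours, near-duplicate articles in the same neighbourhood tend to be suppressed while distinct regions of the graph each retain a representative. In a keyword-driven setting, one plausible design choice is to concentrate `prior` on articles matching the (expanded) query keywords, though how DiSCern does this is not specified here.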