Predicting the impact of scientific concepts using full-text features

Kathleen McKeown; Hal Daume; Snigdha Chaturvedi; John Paparrizos; Kapil Thadani; Pablo Barrio; Or Biran; Suvarna Bothe; Michael Collins; Kenneth R. Fleischmann; Luis Gravano; Rahul Jha; Ben King; Kevin McInerney; Taesun Moon; Arvind Neelakantan; Diarmuid O&#039;Seaghdha; Dragomir R. Radev; Clay Templeton; Simone Teufel

doi:10.1002/asi.23612

JASIST

Paper

11 Jan 2016

Predicting the impact of scientific concepts using full-text features

View publication

Abstract

New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time-series analysis. The results from two large-scale experiments with 3.8 million full-text articles and 48 million metadata records support the conclusion that full-text features are significantly more useful for prediction than metadata-only features and that the most accurate predictions result from combining the metadata and full-text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full-text features.

Conference paper