TalkSumm: A dataset and scalable annotation method for scientific paper summarization based on conference talks

Guy Lev; Michal Shmueli-Scheuer; Jonathan Herzig; Achiya Jerbi; David Konopnicki

ACL 2019

Conference paper

28 Jul 2019

Abstract

Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves performance similar to that of models trained on a dataset of manually created summaries. In addition, human experts validated the quality of our summaries.
