X-FACTOR: A Cross-metric Evaluation of Factual Correctness in Abstractive Summarization
Abstractive summarization models often produce factually inconsistent statements that are not supported by the original article. A number of factual-consistency evaluation techniques have recently been proposed to address this issue; however, a detailed analysis of how these metrics agree with one another has yet to be conducted. In this paper, we present X-FACTOR, a cross-evaluation of three high-performing fact-aware abstractive summarization methods. First, we show that summarization models are often fine-tuned on datasets that contain factually inconsistent summaries, and we propose a fact-aware filtering mechanism that improves the quality of the training data and, consequently, the factuality of these models. Second, we propose a corrector module that can be used to directly improve the factual consistency of generated summaries. Third, we present a re-ranking technique that samples summaries from the output distribution of a summarization model and re-ranks the samples by their factuality. Finally, we provide a detailed cross-metric agreement analysis that shows how tuning a model to optimize one factuality metric affects factuality as measured by the other metrics. We hope this work will facilitate research that improves the factuality and faithfulness of abstractive summarization models.
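
To make the re-ranking idea concrete, the following is a minimal sketch under stated assumptions, not the implementation evaluated in this paper. The names `toy_factuality` and `rerank` are hypothetical, and the token-overlap score is only a stand-in for a learned factuality metric; in practice the candidates would be sampled from a summarization model rather than written by hand.

```python
from typing import Callable, List, Tuple


def toy_factuality(article: str, summary: str) -> float:
    """Placeholder metric (assumption): fraction of summary tokens found in the article.

    A real factuality metric would be far more sophisticated; this stand-in
    only keeps the sketch self-contained.
    """
    article_tokens = set(article.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    supported = sum(1 for tok in summary_tokens if tok in article_tokens)
    return supported / len(summary_tokens)


def rerank(
    article: str,
    candidates: List[str],
    metric: Callable[[str, str], float] = toy_factuality,
) -> List[Tuple[float, str]]:
    """Score each sampled candidate and return (score, summary) pairs, best first."""
    scored = [(metric(article, cand), cand) for cand in candidates]
    # sorted() is stable, so candidates with equal scores keep their sampling order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    article = "the mayor opened the new bridge on monday"
    # Hand-written stand-ins for summaries sampled from a model.
    candidates = [
        "the mayor closed the old bridge",
        "the mayor opened the new bridge",
    ]
    for score, summary in rerank(article, candidates):
        print(f"{score:.2f}  {summary}")
```

Because `metric` is passed in as a parameter, the same loop can be run with a different scoring function, which is the kind of setup a cross-metric comparison requires: select the top candidate under one metric, then score that selection under the others.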