Publication
EMNLP 2021
Conference paper

ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models

Abstract

Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning to improve performance. Graph linearization enables us to re-frame both tasks as a sequence-to-sequence generation problem regardless of the generative direction, which in turn allows the use of Reinforcement Learning for sequence training where the model itself is employed as its own critic leading to Self-Critical Sequence Training (SCST). We present an extensive investigation demonstrating that the use of RL via SCST benefits graph and text generation on WebNLG+ 2020 and TekGen datasets. Our system provides state-of-the-art results on WebNLG+ 2020 by significantly improving upon published results from the WebNLG+ 2020 Challenge for both text-to-graph and graph-to-text generation tasks.

Authors’ notes

One foundational goal of IBM Research is to process real-world data in all its modalities: text, graph, tabular data, time series, etc. All are important representations of knowledge commonly observed in information-centric applications. Information should translate seamlessly from one modality to another without compromising its factual content. For example, representing the facts of a text as a knowledge graph is a fundamental principle of knowledge representation. In this work, we focus on accurate bi-directional transfer of information between graph and text modalities.

Text-to-Graph: Automatic Generation of Knowledge Bases (KBs)
Graph-to-Text: Text generation from KBs

The text-to-graph (T2G) information transfer is a crucial step for building Knowledge Bases (KBs) from large text datasets. This is a fundamental goal of IBM Research: to build intelligent systems that collect, organize, and process information efficiently.

The opposite step, graph-to-text (G2T) transfer, is key to presenting the data encapsulated in a knowledge graph in a text form that is more easily read by humans.

Figure 1 gives an example of this bi-directional transfer accomplished by our system.

Figure 1: An example of knowledge transfer where the first two sentences of the abstract of our paper (Dognin, et al., 2021), shown on top, are processed through our ReGen models. First, a knowledge graph is constructed, then it is used as input to generate a paragraph of text using our system (on the right). Note that the generated paragraph captures the content of the original sentences accurately.

The T2G transfer yields a graph representation of the main facts of the input sentences. The subsequent G2T translation provides another paragraph, distinct from the original input but covering its facts accurately.

This bi-directional transfer of knowledge is a key principle of the Trusted AI Team at IBM Research where we develop tools to make AI more explainable, fair, robust, private, and transparent.

IBM Research introduces ReGen at EMNLP 2021

“Reinforced Generation” is the focus of this new work being presented at EMNLP 2021. Our team explored the use of Reinforcement Learning to improve the quality of T2G and G2T generation. Reinforced Generation, or ReGen, significantly improves quality over traditional techniques.

Our team is composed of Pierre Dognin (tech lead), Inkit Padhi, Igor Melnyk and Payel Das. ReGen code will be released in the companion GitHub repos.

The IBM Research approach

Our approach is composed of several conceptually important steps:

  • Generation tasks (text-to-graph, graph-to-text) are reframed as sequence-to-sequence (seq2seq) translation tasks.
  • “Graph linearization” turns graphs into sequences of edges that our models can process easily.
  • Pretrained Language Models (PLMs) built on large amounts of data, such as T5, are fine-tuned on both generation tasks.
  • Both generation tasks are cast into the Reinforcement Learning framework where a reward is attributed to the generated sequence given a ground truth.
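
The graph linearization step can be sketched as follows. This is a minimal illustration, not the ReGen implementation: the marker tokens `<S>`, `<P>`, `<O>` and the function names `linearize` and `delinearize` are assumptions made for this example, and the sketch assumes entity and relation names never contain those markers.

```python
import re

# Hypothetical marker tokens for subject, predicate, and object.
S, P, O = "<S>", "<P>", "<O>"

def linearize(triples):
    """Turn a list of (subject, predicate, object) edges into one string."""
    return " ".join(f"{S} {s} {P} {p} {O} {o}" for s, p, o in triples)

# One triple: three marker-delimited fields, ending at the next <S> or at
# the end of the string.
_TRIPLE = re.compile(r"<S>\s*(.*?)\s*<P>\s*(.*?)\s*<O>\s*(.*?)\s*(?=<S>|$)")

def delinearize(sequence):
    """Recover the list of (subject, predicate, object) edges from a string."""
    return _TRIPLE.findall(sequence)

graph = [
    ("ReGen", "uses", "Reinforcement Learning"),
    ("ReGen", "generates", "text"),
]
flat = linearize(graph)
restored = delinearize(flat)
```

Once a graph is flattened this way, both T2G and G2T become plain string-to-string problems that a seq2seq model such as T5 can be fine-tuned on.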

Following this approach, we can build either task-specialized models or hybrid models that generate in both directions, as presented in Figure 2.

Figure 2: Specialized and hybrid models rely on the same losses for fine-tuning. Specialized models are dedicated to a given generation direction while hybrid models can handle both directions (graph-to-text, text-to-graph).

In traditional approaches, a model is trained by generating sequences that are then scored against ground truth examples, usually using a cross-entropy (CE) loss to update the model parameters, as shown in Figure 2.

Our approach follows a variant of the REINFORCE policy gradient method (Williams, 1992) where the baseline is the reward of the model output under greedy max generation. This is known as Self-Critical Sequence Training (SCST) (Rennie, et al., 2017) where the model serves as its own critic, as seen in Figure 3.
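
The self-critical baseline can be illustrated with a small sketch. It assumes a single training example and plain Python floats; the function name `scst_loss` and its arguments are hypothetical, and in an actual training loop the log-probabilities would be differentiable tensors produced by the model.

```python
def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """Self-critical loss for one example (illustrative sketch).

    sample_logprobs: per-token log-probabilities of the sampled sequence
    sample_reward:   reward of the sampled sequence against the ground truth
    greedy_reward:   reward of the greedy-decoded sequence, used as baseline
    """
    # The model is its own critic: the advantage is measured against the
    # reward of its own greedy output.
    advantage = sample_reward - greedy_reward
    # Minimizing this value raises the likelihood of samples that beat the
    # greedy output and lowers it for samples that do worse.
    return -advantage * sum(sample_logprobs)

loss_better = scst_loss([-0.5, -1.0], sample_reward=0.8, greedy_reward=0.6)
loss_equal = scst_loss([-0.5, -1.0], sample_reward=0.6, greedy_reward=0.6)
```

When the sampled sequence scores the same as the greedy one, the advantage is zero and the example contributes no gradient.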

Figure 3: ReGen models are trained using Self-Critical Sequence Training (SCST), a policy gradient method where the baseline is the reward of the output of greedy-max generation p*, with the model acting as its own critic. ps is a sample from our policy that allows for exploration during training. The policy p is initialized to a large T5 PLM to ensure stability.

A large Pretrained Language Model (such as T5) is used as a good starting point for our policy. This is to enable stable training using our policy gradient method. Rewards are modality dependent (graph, text) and must not only capture the information content but also the structure validity of the generated sequence — this is particularly important for directed graphs which require a very constrained structure.
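
To make the structural requirement concrete, here is one way a graph reward could combine a validity check with a content score. This is a sketch under stated assumptions, not the reward used in the paper: the `<S>`/`<P>`/`<O>` markers, the function names, and the penalty value of -1.0 are all hypothetical.

```python
import re

# One well-formed triple: each marker is followed by a non-empty field that
# contains no angle brackets. The markers are hypothetical.
_FIELD = r"\s*[^<>\s][^<>]*"
_TRIPLE = rf"<S>{_FIELD}<P>{_FIELD}<O>{_FIELD}"
_WELL_FORMED = re.compile(rf"(?:{_TRIPLE})+")

def is_well_formed(sequence):
    """True if the whole sequence is one or more complete triples."""
    return _WELL_FORMED.fullmatch(sequence.strip()) is not None

def structured_reward(sequence, content_score, invalid_penalty=-1.0):
    """Return the content score only for structurally valid sequences.

    content_score is assumed to be computed elsewhere (for example an F1
    against the ground-truth graph); invalid_penalty is an assumed value.
    """
    return content_score if is_well_formed(sequence) else invalid_penalty

valid = "<S> ReGen <P> uses <O> SCST"
truncated = "<S> ReGen <P> uses"
```

A sequence that is missing an object, such as `truncated`, receives the penalty regardless of how much correct content it contains.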

Examples of generation

We provide two working examples of T2G and G2T generation. The examples emphasize the benefits of using ReGen, our RL-based method, compared to using traditional CE-based methods of fine-tuning. In Figure 4, for both examples, the input sample is at the top. Below on the left, we provide the ground truth (in gray), while generated outputs for ReGen-CE and ReGen-RL are provided on the right in color (orange for CE, blue for RL). We can see that for these two examples ReGen-RL allows a more enriched, precise transfer.

Figure 4: Examples of generation for T2G and G2T with the difference between traditional CE and RL (ReGen) model outputs. Each example has an input sample at the top (text for T2G, graph for G2T), below to the left in gray is the ground truth of the target domain (graph for T2G, text for G2T), and below to the right is the generated output in color (orange for CE, blue for RL) to emphasize the difference between generation from CE and RL ReGen.

IBM Research’s lead

We compared ReGen to the top systems of the WebNLG 2020 Challenge, a well-regarded public challenge for multilingual bi-directional generation between text and knowledge graph.

WebNLG is a difficult challenge. Its dataset is relatively small (13K train, 1.7K dev, 1.8K test) and includes unseen categories at test time. ReGen establishes new state-of-the-art results on the WebNLG 2020 Challenge dataset by large margins for both the T2G and G2T directions, as demonstrated in Table 1 and Table 2.

On the much larger TekGen dataset (6.3M train, 5K dev, 50K test), ReGen shows consistent gains from using Reinforced Generation, validating its use for large data operating points, as shown in Table 3 and Table 4.

We present results for both datasets, using well-established metrics such as BLEU, METEOR, and chrF++ for text generation. For graph generation, we use F1, Precision, and Recall for nodes and edges with different levels of matching (exact, partial, strict, entity type) as defined by the WebNLG 2020 Challenge. Note that we only report results for exact match in the graph generation tables (Table 2 and Table 4); full results are in our paper (Dognin, et al., 2021).
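
For intuition, exact-match Precision, Recall, and F1 can be approximated by comparing sets of triples. This is a deliberately simplified sketch: the official WebNLG 2020 scorer matches at a finer granularity and supports the other matching schemes, and the function name `exact_match_scores` is an assumption for this example.

```python
def exact_match_scores(predicted, reference):
    """Set-based precision, recall, and F1 over exactly matching triples."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

predicted = [("ReGen", "uses", "SCST"), ("ReGen", "built on", "BERT")]
reference = [
    ("ReGen", "uses", "SCST"),
    ("ReGen", "built on", "T5"),
    ("ReGen", "presented at", "EMNLP 2021"),
]
precision, recall, f1 = exact_match_scores(predicted, reference)
```

Here one of the two predicted triples is correct and one of the three reference triples is recovered, so precision is 1/2 and recall is 1/3.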

Table 1: G2T best results for WebNLG 2020 Challenge dataset. The first four rows were the Challenge top performers. Results for IBM Research ReGen CE and RL systems show gains from using Reinforcement Learning. Our ReGen-RL is the best system overall, fine-tuning a t5-large model using METEOR reward.

WebNLG G2T Team/model | BLEU↑ | BLEU NLTK↑ | METEOR↑ | chrF++↑
Amazon AI (Shanghai) (Guo, et al., 2020) | 0.540 | 0.535 | 0.417 | 0.690
OSU Neural NLG (Li, et al., 2020) | 0.535 | 0.532 | 0.414 | 0.688
Facebook FBConvAI (Yang, et al., 2020) | 0.527 | 0.523 | 0.413 | 0.686
Google bt5 (Agarwal, et al., 2020) | 0.517 | 0.517 | 0.411 | 0.679
IBM Research ReGen-CE (Dognin, et al., 2021) | 0.553 | 0.549 | 0.418 | 0.694
IBM Research ReGen-RL (Dognin, et al., 2021) | 0.563 | 0.559 | 0.425 | 0.706

Table 2: T2G best results for WebNLG 2020 Challenge dataset. The top two rows were the Challenge top performers. ReGen models improve upon all metrics for all matching schemes, providing new state-of-the-art results.

WebNLG T2G Team/model | F1↑ | Precision↑ | Recall↑
Amazon AI (Shanghai) (Guo, et al., 2020) | 0.689 | 0.689 | 0.690
Google bt5 (Agarwal, et al., 2020) | 0.682 | 0.670 | 0.701
IBM Research ReGen-CE (Dognin, et al., 2021) | 0.723 | 0.714 | 0.738
IBM Research ReGen-RL (Dognin, et al., 2021) | 0.720 | 0.712 | 0.734

Table 3: G2T TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL consistently improves upon this baseline on all metrics for G2T generation.

TekGen G2T Model | BLEU↑ | BLEU NLTK↑ | METEOR↑ | chrF++↑
IBM ReGen-CE (Dognin, et al., 2021) | 0.241 | 0.242 | 0.233 | 0.405
IBM ReGen-RL (Dognin, et al., 2021) | 0.262 | 0.262 | 0.242 | 0.422

Table 4: T2G TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL improves results on the test set compared to ReGen-CE on all metrics for text-to-graph generation.

TekGen T2G Model | F1↑ | Precision↑ | Recall↑
IBM ReGen-CE (Dognin, et al., 2021) | 0.619 | 0.605 | 0.643
IBM ReGen-RL (Dognin, et al., 2021) | 0.623 | 0.610 | 0.647

Future work

Multiple exciting directions of research can now be explored given our current work:

  1. Very large graph construction from large datasets of text is the ultimate goal for this research and ReGen is one step forward in that direction.
  2. Reward definition can allow for constrained generation in terms of structure and content, which can be beneficial for applications where constrained generated output is required.
  3. Fairness and trust are another angle of investigation for both generation directions, since the starting-point PLMs may carry bias from their own training data.

Details

ReGen code will be released in the companion GitHub repos.

IBM Researchers involved with this work are Pierre Dognin (tech lead), Inkit Padhi, Igor Melnyk, and Payel Das.

Bibliography