ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models

Pierre Dognin; Inkit Padhi; Igor Melnyk; Payel Das

EMNLP 2021

Conference paper

07 Nov 2021


Abstract

Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning to improve performance. Graph linearization enables us to re-frame both tasks as a sequence to sequence generation problem regardless of the generative direction, which in turn allows the use of Reinforcement Learning for sequence training where the model itself is employed as its own critic leading to Self-Critical Sequence Training (SCST). We present an extensive investigation demonstrating that the use of RL via SCST benefits graph and text generation on WebNLG+ 2020 and TekGen datasets. Our system provides state-of-the-art results on WebNLG+ 2020 by significantly improving upon published results from the WebNLG 2020+ Challenge for both text-to-graph and graph-to-text generation tasks.

Authors’ notes

One foundational goal of IBM Research is to process real-world data in all its modalities: text, graph, tabular data, time series, etc. All are important representations of knowledge commonly observed in information-centric applications. Information should translate seamlessly from one modality to another without compromising its factual content. For example, representing facts from a text as a knowledge graph is a fundamental principle of knowledge representation. In this work, we focus on accurate bi-directional transfer of information between graph and text modalities.

Text-to-Graph: Automatic generation of Knowledge Bases (KBs)

Graph-to-Text: Text generation from KBs

The text-to-graph (T2G) information transfer is a crucial step for building Knowledge Bases (KBs) from large text datasets. This is a fundamental goal of IBM Research: build intelligent systems to collect, organize, and process information efficiently.

The opposite step, graph-to-text (G2T) transfer, is key to presenting the data encapsulated in a knowledge graph in a text form that humans can read more easily.

Figure 1 gives an example of this bi-directional transfer accomplished by our system.

Figure 1: An example of knowledge transfer where the first two sentences of the abstract of our paper (Dognin, et al., 2021), shown on top, are processed through our ReGen models. First, a knowledge graph is constructed, then it is used as input to generate a paragraph of text using our system (on the right). Note that the generated paragraph captures the content of the original sentences accurately.

The transfer from T2G yields a graph representation of the main facts of the input sentences. The subsequent G2T translation provides another paragraph, distinct from the original input but covering its facts accurately.

This bi-directional transfer of knowledge is a key principle of the Trusted AI Team at IBM Research where we develop tools to make AI more explainable, fair, robust, private, and transparent.

IBM Research introduces ReGen at EMNLP 2021

“Reinforced Generation” is the focus of this new work being presented at EMNLP 2021. Our team explored the use of Reinforcement Learning to improve the quality of T2G and G2T generation. Reinforced Generation, or ReGen, improves quality significantly over traditional techniques.

Our team is composed of Pierre Dognin (tech lead), Inkit Padhi, Igor Melnyk and Payel Das. ReGen code will be released in the companion GitHub repos.

The IBM Research approach

Our approach is composed of several conceptually important steps:

Generation tasks (text-to-graph, graph-to-text) are reframed as sequence-to-sequence (seq2seq) translation tasks.
“Graph linearization” turns graphs into sequences of edges that our models can process easily.
Pretrained Language Models (PLMs) built on large amounts of data, such as T5, are fine-tuned on both generation tasks.
Both generation tasks are cast into the Reinforcement Learning framework where a reward is attributed to the generated sequence given a ground truth.
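
To make the graph linearization step more concrete, here is a minimal sketch in Python. The marker tokens `<S>`, `<P>`, `<O>` and the function name `linearize` are assumptions chosen for illustration; the tokens actually used by ReGen may differ.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical marker tokens (an assumption, not taken from the ReGen code).
SUBJ, PRED, OBJ = "<S>", "<P>", "<O>"


def linearize(triples: List[Triple]) -> str:
    """Flatten a list of (subject, predicate, object) edges into one string."""
    parts = []
    for subj, pred, obj in triples:
        parts.append(f"{SUBJ} {subj} {PRED} {pred} {OBJ} {obj}")
    return " ".join(parts)


# Illustrative graph with two edges sharing the same subject node.
graph = [
    ("ReGen", "leverages", "Reinforcement Learning"),
    ("ReGen", "generates", "text and graph"),
]
linearized = linearize(graph)
print(linearized)
```

Once a graph is a flat string like this, both generation directions reduce to mapping one token sequence to another, which is what lets a seq2seq PLM handle them.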

Following this approach, we can build task-specialized, or hybrid models allowing generation in both directions, as presented in Figure 2.

Figure 2: Specialized and hybrid models rely on the same losses for fine-tuning. Specialized models are dedicated to a given generation direction while hybrid models can handle both directions (graph-to-text, text-to-graph).
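
One plausible way to let a single hybrid model serve both directions is to prepend a task prefix to the source sequence. The sketch below assumes this mechanism; the prefix strings and the helper `make_hybrid_pairs` are hypothetical and only illustrate how one (text, graph) example could yield two seq2seq training pairs.

```python
from typing import Dict, List

# Hypothetical task prefixes, assumed here for illustration only.
T2G_PREFIX = "Text to Graph: "
G2T_PREFIX = "Graph to Text: "


def make_hybrid_pairs(text: str, linearized_graph: str) -> List[Dict[str, str]]:
    """Build one training pair per generation direction from a single example."""
    return [
        # Text-to-graph: the text is the source, the linearized graph the target.
        {"source": T2G_PREFIX + text, "target": linearized_graph},
        # Graph-to-text: the roles are swapped.
        {"source": G2T_PREFIX + linearized_graph, "target": text},
    ]


pairs = make_hybrid_pairs(
    "ReGen leverages Reinforcement Learning.",
    "<S> ReGen <P> leverages <O> Reinforcement Learning",
)
for pair in pairs:
    print(pair["source"], "->", pair["target"])
```

A specialized model would simply be trained on only one of the two pairs.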

In traditional approaches a model is trained by generating sequences that are then scored against ground truth examples, usually using a cross entropy (CE) loss to update the model parameters, as shown in Figure 2.

Our approach follows a variant of the REINFORCE policy gradient method (Williams, 1992) where the baseline is the reward of the model output under greedy max generation. This is known as Self-Critical Sequence Training (SCST) (Rennie, et al., 2017) where the model serves as its own critic, as seen in Figure 3.

Figure 3: ReGen models are trained using Self-Critical Sequence Training, a policy gradient method where the baseline is the reward of the output of greedy-max generation p*, with the model acting as its own critic. ps is a sample drawn from our policy that allows for exploration during training. The policy p is initialized to a large T5 PLM to ensure stability.
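
The self-critical objective can be sketched for a single training example as follows. This is a simplified illustration in plain Python, not the ReGen implementation: the function name `scst_loss` and the numbers are assumptions, and in a real training loop the advantage would be treated as a constant while gradients flow only through the log-probabilities of the sampled sequence.

```python
from typing import Sequence


def scst_loss(sample_token_logprobs: Sequence[float],
              sample_reward: float,
              greedy_reward: float) -> float:
    """Self-critical loss for one example.

    sample_token_logprobs: log-probability of each token of the sampled sequence.
    sample_reward: reward of the sampled sequence against the ground truth.
    greedy_reward: reward of the greedy-decoded sequence, used as the baseline.
    """
    # Positive advantage: the sample beat the model's own greedy output.
    advantage = sample_reward - greedy_reward
    # Minimizing this raises the likelihood of samples with positive advantage
    # and lowers it for samples that did worse than the greedy baseline.
    return -advantage * sum(sample_token_logprobs)


# Illustrative numbers: the sampled sequence scores 0.75, the greedy one 0.5.
loss = scst_loss([-0.5, -1.0, -0.25], sample_reward=0.75, greedy_reward=0.5)
print(loss)  # 0.4375
```

When the sample and the greedy output earn the same reward, the advantage is zero and the example contributes no update.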

A large Pretrained Language Model (such as T5) is used as a good starting point for our policy, which enables stable training with our policy gradient method. Rewards are modality dependent (graph, text) and must capture not only the information content but also the structural validity of the generated sequence. This is particularly important for directed graphs, which require a very constrained structure.
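
As an illustration of a reward that accounts for both structure and content on the graph side, the sketch below parses a linearized graph, returns a fixed penalty when the sequence is malformed, and otherwise scores exact-match triple overlap with F1. The marker tokens, the penalty value, and the function names are assumptions made for this example, not the reward used in the paper.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical marker tokens for linearized graphs (an assumption).
SUBJ, PRED, OBJ = "<S>", "<P>", "<O>"


def parse_graph(seq: str) -> Optional[Set[Triple]]:
    """Return the set of triples, or None if the sequence is not well formed."""
    seq = seq.strip()
    if not seq.startswith(SUBJ):
        return None
    triples = set()
    for chunk in seq.split(SUBJ)[1:]:
        # Each edge needs exactly one predicate and one object marker, in order.
        if chunk.count(PRED) != 1 or chunk.count(OBJ) != 1:
            return None
        subj, rest = chunk.split(PRED)
        if OBJ not in rest:
            return None
        pred, obj = rest.split(OBJ)
        fields = (subj.strip(), pred.strip(), obj.strip())
        if not all(fields):
            return None  # empty subject, predicate, or object
        triples.add(fields)
    return triples


def graph_reward(generated: str, reference: str,
                 invalid_penalty: float = -1.0) -> float:
    """Exact-match triple F1, or a penalty when the structure is invalid."""
    ref = parse_graph(reference)
    if ref is None:
        raise ValueError("reference graph is malformed")
    hyp = parse_graph(generated)
    if hyp is None:
        return invalid_penalty
    matched = len(hyp & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(hyp)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "<S> a <P> b <O> c <S> d <P> e <O> f"
print(graph_reward("<S> a <P> b <O> c", reference))  # one of two edges found
print(graph_reward("<S> a <O> c", reference))        # malformed: penalty
```

For the text side, a standard text metric such as METEOR or BLEU computed against the reference sentence could play the same role.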

Examples of generation

We provide two working examples of T2G and G2T generation. The examples emphasize the benefits of using ReGen, our RL-based method, compared to using traditional CE-based methods of fine-tuning. In Figure 4, for both examples, the input sample is at the top. Below on the left, we provide the ground truth (in gray), while generated outputs for ReGen-CE and ReGen-RL are provided on the right in color (orange for CE, blue for RL). We can see that for these two examples ReGen-RL allows a more enriched, precise transfer.

IBM Research’s lead

We compared ReGen to the top systems of the WebNLG 2020 Challenge, a well-regarded public challenge for multilingual bi-directional generation between text and knowledge graph.

WebNLG is a difficult challenge. Its dataset is relatively small (13K train, 1.7K dev, 1.8K test) and includes unseen categories at test time. ReGen establishes new state-of-the-art results on the WebNLG 2020 Challenge dataset by large margins for both T2G and G2T directions, as demonstrated in Table 1 and Table 2.

On the much larger TekGen dataset (6.3M train, 5K dev, 50K test), ReGen shows consistent gains from using Reinforced Generation, validating its use for large data operating points, as shown in Table 3 and Table 4.

We present results for both datasets, using well-established metrics such as BLEU, METEOR, and chrF++ for text generation. For graph generation, we use F1, Precision, and Recall for nodes and edges with different levels of matching (exact, partial, strict, entity type) as defined by the WebNLG 2020 Challenge. Note that we only report results for exact match in Table 2 and Table 4; full results are in our paper (Dognin, et al., 2021).

Table 1: G2T best results for WebNLG 2020 Challenge dataset. The first four rows were the Challenge top performers. Results for IBM Research ReGen CE and RL systems show gains from using Reinforcement Learning. Our ReGen-RL is the best system overall, fine-tuning a t5-large model using METEOR reward.

WebNLG G2T Team/model	BLEU ↑	BLEU NLTK ↑	METEOR ↑	chrF++ ↑
Amazon AI (Shanghai) (Guo, et al., 2020)	0.540	0.535	0.417	0.690
OSU Neural NLG (Li, et al., 2020)	0.535	0.532	0.414	0.688
Facebook FBConvAI (Yang, et al., 2020)	0.527	0.523	0.413	0.686
Google bt5 (Agarwal, et al., 2020)	0.517	0.517	0.411	0.679
IBM Research ReGen-CE (Dognin, et al., 2021)	0.553	0.549	0.418	0.694
IBM Research ReGen-RL (Dognin, et al., 2021)	0.563	0.559	0.425	0.706

Table 2: T2G best results for WebNLG 2020 Challenge dataset. The top two rows were the Challenge top performers. ReGen models improve upon all metrics for all matching schemes, providing new state-of-the-art results.

WebNLG T2G Team/model	F1 ↑	Precision ↑	Recall ↑
Amazon AI (Shanghai) (Guo, et al., 2020)	0.689	0.689	0.690
Google bt5 (Agarwal, et al., 2020)	0.682	0.670	0.701
IBM Research ReGen-CE (Dognin, et al., 2021)	0.723	0.714	0.738
IBM Research ReGen-RL (Dognin, et al., 2021)	0.720	0.712	0.734

Table 3: G2T TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL consistently improves upon this baseline on all metrics for G2T generation.

TekGen G2T Model	BLEU ↑	BLEU NLTK ↑	METEOR ↑	chrF++ ↑
IBM ReGen-CE (Dognin, et al., 2021)	0.241	0.242	0.233	0.405
IBM ReGen-RL (Dognin, et al., 2021)	0.262	0.262	0.242	0.422

Table 4: T2G TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL improves results on the test set compared to ReGen-CE on all metrics for text-to-graph generation.

TekGen T2G Model	F1 ↑	Precision ↑	Recall ↑
IBM ReGen-CE (Dognin, et al., 2021)	0.619	0.605	0.643
IBM ReGen-RL (Dognin, et al., 2021)	0.623	0.610	0.647

Future work

Multiple exciting directions of research can now be explored given our current work:

Very large graph construction from large datasets of text is the ultimate goal for this research and ReGen is one step forward in that direction.
Reward definition can allow for constrained generation in terms of structure and content, which can be beneficial for applications where constrained generated output is required.
Fairness and trust are another angle of investigation in this paradigm for both generation directions, since the starting-point PLMs may display bias from their own training data.

Details

ReGen code will be released in the companion GitHub repos.

IBM Researchers involved with this work are Pierre Dognin (tech lead), Inkit Padhi, Igor Melnyk, and Payel Das.

Bibliography

Agarwal, O., Kale, M., Ge, H., Shakeri, S. & Al-Rfou, R. Machine Translation Aided Bilingual Data-to-Text Generation and Semantic Parsing. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 125–130 (Association for Computational Linguistics, 2020).
Guo, Q. et al. P2: A Plan-and-Pretrain Approach for Knowledge Graph-to-Text Generation. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 100–106 (Association for Computational Linguistics, 2020).
Li, X., Maskharashvili, A., Jory Stevens-Guille, S. & White, M. Leveraging Large Pretrained Models for WebNLG 2020. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 117–124 (Association for Computational Linguistics, 2020).
Rennie, S. J., Marcheret, E., Mroueh, Y., Ross, J. & Goel, V. Self-Critical Sequence Training for Image Captioning. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1179–1195 (IEEE, 2017). doi:10.1109/CVPR.2017.131.
Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8, 229–256 (1992).
Yang, Z. et al. Improving Text-to-Text Pre-trained Models for the Graph-to-Text Task. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 107–116 (Association for Computational Linguistics, 2020).
