ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models
Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning to improve performance. Graph linearization enables us to re-frame both tasks as a sequence-to-sequence generation problem regardless of the generative direction, which in turn allows the use of Reinforcement Learning for sequence training where the model itself is employed as its own critic, leading to Self-Critical Sequence Training (SCST). We present an extensive investigation demonstrating that the use of RL via SCST benefits graph and text generation on the WebNLG+ 2020 and TekGen datasets. Our system provides state-of-the-art results on WebNLG+ 2020, significantly improving upon published results from the WebNLG+ 2020 Challenge for both text-to-graph and graph-to-text generation tasks.
One foundational goal of IBM Research is to process real-world data in all its modalities: text, graph, tabular data, time series, etc. All are important representations of knowledge commonly observed in information-centric applications. Information should translate from one modality to another seamlessly, without compromising its factual content. For example, representing facts from a text in a knowledge graph is a fundamental principle of knowledge representation. In this work, we focus on accurate bi-directional transfer of information between graph and text modalities.
The text-to-graph (T2G) information transfer is a crucial step for building Knowledge Bases (KBs) from large text datasets. This supports a fundamental goal of IBM Research: to build intelligent systems that collect, organize, and process information efficiently.
The opposite step, graph-to-text (G2T) transfer, is key to presenting the data encapsulated in a knowledge graph in a text form more easily readable by humans.
Figure 1 gives an example of this bi-directional transfer accomplished by our system.
The transfer from T2G yields a graph representation of the main facts of the input sentences. The subsequent G2T translation provides another paragraph, distinct from the original input but covering its facts accurately.
This bi-directional transfer of knowledge is a key principle of the Trusted AI Team at IBM Research where we develop tools to make AI more explainable, fair, robust, private, and transparent.
IBM Research introduces ReGen at EMNLP 2021
“Reinforced Generation” is the focus of this new work being presented at EMNLP 2021. Our team explored the use of Reinforcement Learning to improve the quality of T2G and G2T generation. Reinforced Generation, or ReGen, significantly improves quality over traditional techniques.
Our team is composed of Pierre Dognin (tech lead), Inkit Padhi, Igor Melnyk, and Payel Das. ReGen code will be released in the companion GitHub repos.
The IBM Research approach
Our approach is composed of several conceptually important steps:
- Generation tasks (text-to-graph and graph-to-text) are reframed as sequence-to-sequence (seq2seq) translation tasks.
- “Graph linearization” turns graphs into sequences of edges our models can process easily.
- Pretrained Language Models (PLMs) built on large amounts of data, such as T5, are fine-tuned on both generation tasks.
- Both generation tasks are cast into the Reinforcement Learning framework, where a reward is attributed to the generated sequence given a ground truth.
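The graph linearization step above can be sketched as follows. This is a minimal illustration; the `<H>`/`<R>`/`<T>` marker tokens are a common convention in graph-to-text work and are an assumption here, not necessarily ReGen's exact vocabulary:

```python
def linearize_graph(triples):
    """Flatten (subject, relation, object) triples into one token sequence
    that a seq2seq model can consume directly.

    The <H>/<R>/<T> (head/relation/tail) markers are illustrative,
    not ReGen's exact tokenization scheme.
    """
    parts = []
    for head, relation, tail in triples:
        parts += ["<H>", head, "<R>", relation, "<T>", tail]
    return " ".join(parts)

triples = [("Abilene", "cityServed", "Abilene Regional Airport")]
print(linearize_graph(triples))
# <H> Abilene <R> cityServed <T> Abilene Regional Airport
```

With the graph flattened into a string, both directions (T2G and G2T) become ordinary text-to-text problems for a PLM such as T5.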
Following this approach, we can build task-specialized, or hybrid models allowing generation in both directions, as presented in Figure 2.
In traditional approaches, a model is trained by generating sequences that are scored against ground truth examples, usually with a cross-entropy (CE) loss used to update the model parameters, as shown in Figure 2.
Our approach follows a variant of the REINFORCE policy gradient method (Williams, 1992) where the baseline is the reward of the model output under greedy-max generation. This is known as Self-Critical Sequence Training (SCST) (Rennie, et al., 2017), where the model serves as its own critic, as seen in Figure 3.
A large Pretrained Language Model (such as T5) is used as a good starting point for our policy, which enables stable training with our policy gradient method. Rewards are modality dependent (graph, text) and must capture not only the information content but also the structural validity of the generated sequence; this is particularly important for directed graphs, which require a very constrained structure.
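The SCST objective described above can be sketched for a single sequence as follows. This is a simplified scalar version under stated assumptions (per-sequence summed log-probability and scalar rewards already computed); the function name and signature are hypothetical:

```python
def scst_loss(sample_logprob, sample_reward, greedy_reward):
    """Self-Critical Sequence Training objective (Rennie et al., 2017),
    simplified to one sequence.

    sample_logprob: summed log-probability of the *sampled* sequence
    sample_reward:  reward of the sampled sequence (e.g. METEOR vs. ground truth)
    greedy_reward:  reward of the greedy-decoded sequence, used as the baseline
                    (the model acting as its own critic)
    """
    # Advantage: how much better the sample did than the model's greedy output.
    advantage = sample_reward - greedy_reward
    # REINFORCE with the greedy baseline; minimizing this pushes up the
    # log-probability of samples that beat greedy decoding, and down otherwise.
    return -advantage * sample_logprob
```

In practice this is computed per batch over token log-probabilities, but the key idea is visible here: no separate value network is learned, since the greedy decode supplies the baseline.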
Examples of generation
We provide two working examples of T2G and G2T generation. The examples emphasize the benefits of using ReGen, our RL-based method, compared to traditional CE-based fine-tuning. In Figure 4, for both examples, the input sample is at the top. Below on the left, we provide the ground truth (in gray), while generated outputs for ReGen-CE and ReGen-RL are provided on the right in color (orange for CE, blue for RL). We can see that for these two examples ReGen-RL yields a richer, more precise transfer.
IBM Research’s lead
We compared ReGen to the top systems of the WebNLG 2020 Challenge, a well-regarded public challenge for multilingual bi-directional generation between text and knowledge graphs.
WebNLG is a difficult challenge. Its dataset is relatively small (13K train, 1.7K dev, 1.8K test) and includes unseen categories at test time. ReGen establishes new state-of-the-art results on the WebNLG 2020 Challenge dataset by large margins for both T2G and G2T directions, as demonstrated in Table 1 and Table 2.
On the much larger TekGen dataset (6.3M train, 5K dev, 50K test), ReGen shows consistent gains from using Reinforced Generation, validating its use at large-data operating points, as shown in Table 3 and Table 4.
We present results for both datasets, using well-established metrics such as BLEU, METEOR, and chrF++ for text generation. For graph generation, we use F1, Precision, and Recall for nodes and edges with different levels of matching (exact, partial, strict, entity type) as defined by the WebNLG 2020 Challenge. Note that we only report results for exact match in Table 3 and Table 4; full results are in our paper (Dognin, et al., 2021).
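Under the exact-match scheme, graph scoring essentially reduces to set comparison of predicted and reference triples. The sketch below illustrates that idea only; the official WebNLG 2020 evaluation script additionally handles partial, strict, and entity-type matching:

```python
def triple_scores(predicted, reference):
    """Exact-match Precision/Recall/F1 over (subject, relation, object) triples.

    Simplified illustration: the official WebNLG 2020 Challenge evaluation
    also scores partial, strict, and entity-type matching.
    """
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)  # triples the model got exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

For example, a prediction containing one correct and one spurious triple against a single-triple reference scores 0.5 precision and 1.0 recall.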
Table 1: G2T best results for WebNLG 2020 Challenge dataset. The first four rows were the Challenge top performers. Results for IBM Research ReGen CE and RL systems show gains from using Reinforcement Learning. Our ReGen-RL is the best system overall, fine-tuning a t5-large model using METEOR reward.
| WebNLG G2T Team/model | BLEU ↑ | BLEU NLTK ↑ | METEOR ↑ | chrF++ ↑ |
| --- | --- | --- | --- | --- |
| Amazon AI (Shanghai) (Guo, et al., 2020) | 0.540 | 0.535 | 0.417 | 0.690 |
| OSU Neural NLG (Li, et al., 2020) | 0.535 | 0.532 | 0.414 | 0.688 |
| Facebook FBConvAI (Yang, et al., 2020) | 0.527 | 0.523 | 0.413 | 0.686 |
| Google bt5 (Agarwal, et al., 2020) | 0.517 | 0.517 | 0.411 | 0.679 |
| IBM Research ReGen-CE (Dognin, et al., 2021) | 0.553 | 0.549 | 0.418 | 0.694 |
| IBM Research ReGen-RL (Dognin, et al., 2021) | 0.563 | 0.559 | 0.425 | 0.706 |
Table 2: T2G best results for WebNLG 2020 Challenge dataset. The top two rows were the Challenge top performers. ReGen models improve upon all metrics for all matching schemes, providing new state-of-the-art results.
| WebNLG T2G Team/model | F1 ↑ | Precision ↑ | Recall ↑ |
| --- | --- | --- | --- |
| Amazon AI (Shanghai) (Guo, et al., 2020) | 0.689 | 0.689 | 0.690 |
| Google bt5 (Agarwal, et al., 2020) | 0.682 | 0.670 | 0.701 |
| IBM Research ReGen-CE (Dognin, et al., 2021) | 0.723 | 0.714 | 0.738 |
| IBM Research ReGen-RL (Dognin, et al., 2021) | 0.720 | 0.712 | 0.734 |
Table 3: G2T TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL consistently improves upon this baseline on all metrics for G2T generation.
| TekGen G2T Model | BLEU ↑ | BLEU NLTK ↑ | METEOR ↑ | chrF++ ↑ |
| --- | --- | --- | --- | --- |
| IBM Research ReGen-CE (Dognin, et al., 2021) | 0.241 | 0.242 | 0.233 | 0.405 |
| IBM Research ReGen-RL (Dognin, et al., 2021) | 0.262 | 0.262 | 0.242 | 0.422 |
Table 4: T2G TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL improves results on the test set compared to ReGen-CE on all metrics for text-to-graph generation.
| TekGen T2G Model | F1 ↑ | Precision ↑ | Recall ↑ |
| --- | --- | --- | --- |
| IBM ReGen-CE (Dognin, et al., 2021) | 0.619 | 0.605 | 0.643 |
| IBM ReGen-RL (Dognin, et al., 2021) | 0.623 | 0.610 | 0.647 |
Multiple exciting directions of research can now be explored given our current work:
- Very large graph construction from large text datasets is the ultimate goal of this research, and ReGen is one step forward in that direction.
- Reward definition allows for constrained generation in terms of structure and content, which can be beneficial for applications where constrained output is required.
- Fairness and Trust are another angle of investigation in this paradigm for both generation directions, as starting-point PLMs may display bias from their own training data.
- Agarwal, O., Kale, M., Ge, H., Shakeri, S. & Al-Rfou, R. Machine Translation Aided Bilingual Data-to-Text Generation and Semantic Parsing. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 125–130 (Association for Computational Linguistics, 2020).
- Guo, Q. et al. P2: A Plan-and-Pretrain Approach for Knowledge Graph-to-Text Generation. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 100–106 (Association for Computational Linguistics, 2020).
- Li, X., Maskharashvili, A., Stevens-Guille, S. J. & White, M. Leveraging Large Pretrained Models for WebNLG 2020. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 117–124 (Association for Computational Linguistics, 2020).
- Rennie, S. J., Marcheret, E., Mroueh, Y., Ross, J. & Goel, V. Self-Critical Sequence Training for Image Captioning. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1179–1195 (IEEE, 2017). doi:10.1109/CVPR.2017.131.
- Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8, 229–256 (1992).
- Yang, Z. et al. Improving Text-to-Text Pre-trained Models for the Graph-to-Text Task. in Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+) 107–116 (Association for Computational Linguistics, 2020).