The text-to-graph (T2G) information transfer is a crucial step in building Knowledge Bases (KBs) from large text datasets. This supports a fundamental goal of IBM Research: building intelligent systems to collect, organize, and process information efficiently.
The opposite step, graph-to-text (G2T) transfer, is key to presenting the data encapsulated in a knowledge graph in a textual form that humans can read more easily.
Figure 1 gives an example of this bi-directional transfer accomplished by our system.
The T2G transfer yields a graph representation of the main facts of the input sentences. The subsequent G2T translation produces a new paragraph, distinct from the original input, that covers its facts accurately.
This bi-directional transfer of knowledge is a key principle of the Trusted AI Team at IBM Research where we develop tools to make AI more explainable, fair, robust, private, and transparent.
“Reinforced Generation” is the focus of this new work, presented at EMNLP 2021. Our team explored the use of Reinforcement Learning to improve the quality of both T2G and G2T generation. Reinforced Generation, or ReGen, significantly improves quality over traditional techniques.
Our team is composed of Pierre Dognin (tech lead), Inkit Padhi, Igor Melnyk, and Payel Das. The ReGen code will be released in a companion GitHub repository.
Our approach is composed of several conceptually important steps:

1. Cast both T2G and G2T as sequence-to-sequence generation tasks, starting from a large pretrained language model (such as T5).
2. Fine-tune the model with a standard cross-entropy (CE) objective, for one or both directions.
3. Refine the model further with Reinforcement Learning, using rewards adapted to each output modality (text or graph).
Following this approach, we can build task-specialized or hybrid models that allow generation in both directions, as presented in Figure 2.
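To make this concrete, below is a minimal sketch of how a single sequence-to-sequence backbone can serve both directions, with a task prefix selecting T2G or G2T. The prefixes and the triple linearization used here are illustrative assumptions, not the exact conventions of the released ReGen code.

```python
# Minimal sketch of a hybrid bi-directional model (assumed prefixes/format).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

def generate(task_prefix: str, source: str, max_length: int = 256) -> str:
    """Run one direction of the transfer; the prefix selects the task."""
    inputs = tokenizer(task_prefix + source, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=max_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Text -> linearized graph (hypothetical "subject | predicate | object" format).
graph = generate("text to graph: ", "Abilene, Texas is served by Abilene Regional Airport.")
# Linearized graph -> text.
text = generate("graph to text: ", "Abilene Regional Airport | city served | Abilene, Texas")
```

A hybrid model of this kind shares one set of parameters across both tasks, while task-specialized models are simply fine-tuned on a single direction.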
In traditional approaches, a model is trained by generating sequences that are scored against ground-truth examples, usually using a cross-entropy (CE) loss to update the model parameters, as shown in Figure 2.
Our approach follows a variant of the REINFORCE policy gradient method (Williams, 1992), where the baseline is the reward of the model output under greedy max generation. This is known as Self-Critical Sequence Training (SCST) (Rennie, et al., 2017), where the model serves as its own critic, as seen in Figure 3.
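In SCST, a sequence sampled from the policy is reinforced when its reward exceeds that of the model's own greedy output, and suppressed otherwise. Below is a simplified sketch of one such training step, assuming a HuggingFace-style seq2seq model and a user-supplied `reward_fn`; it illustrates the idea rather than reproducing the paper's implementation.

```python
import torch
import torch.nn.functional as F

def scst_loss(model, batch, reward_fn):
    """One Self-Critical Sequence Training step (illustrative sketch).

    batch:     dict with input_ids / attention_mask for the encoder.
    reward_fn: maps generated token ids to a per-example reward tensor.
    """
    with torch.no_grad():
        # Baseline: reward of the greedy (argmax) output -- the model is its own critic.
        greedy_ids = model.generate(**batch, do_sample=False)
        baseline = reward_fn(greedy_ids)                                 # (B,)
        # Exploration: draw a sample from the current policy and score it.
        sample_ids = model.generate(**batch, do_sample=True)
        reward = reward_fn(sample_ids)                                   # (B,)

    # Log-probability of the sampled tokens under the current model.
    logits = model(**batch, labels=sample_ids).logits                    # (B, T, V)
    token_logp = F.log_softmax(logits, dim=-1)
    logp = token_logp.gather(-1, sample_ids.unsqueeze(-1)).squeeze(-1)   # (B, T)
    mask = (sample_ids != model.config.pad_token_id).float()
    seq_logp = (logp * mask).sum(dim=-1)                                 # (B,)

    # REINFORCE with the greedy reward as baseline: a positive advantage
    # increases the sample's likelihood, a negative advantage decreases it.
    advantage = reward - baseline
    return -(advantage * seq_logp).mean()
```

Because the baseline is the model's own greedy decode, no learned value function is needed, which keeps gradient variance manageable without extra parameters.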
A large pretrained language model (such as T5) is used as the starting point for our policy, which helps stabilize training with our policy gradient method. Rewards are modality-dependent (graph or text) and must capture not only the information content but also the structural validity of the generated sequence. This is particularly important for directed graphs, which must obey a very constrained structure.
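As an illustration of modality-dependent rewards, the sketch below pairs a METEOR-based reward for generated text with a triple-level F1 reward for generated graphs that returns zero whenever the linearized graph is structurally invalid. The linearization format (triples separated by ';', fields by '|') is an assumption made for this example; see the paper for the actual reward definitions.

```python
# Illustrative modality-dependent rewards (assumed linearization format).
# Requires: nltk.download('wordnet') for METEOR.
from nltk.translate.meteor_score import meteor_score

def text_reward(hypothesis: str, reference: str) -> float:
    """Reward generated text by its METEOR score against the reference."""
    return meteor_score([reference.split()], hypothesis.split())

def graph_reward(hypothesis: str, reference: str) -> float:
    """Reward a linearized graph: zero if structurally invalid, otherwise
    F1 over (subject, predicate, object) triples against the reference."""
    def parse(s):
        triples = set()
        for chunk in s.split(";"):
            fields = tuple(f.strip() for f in chunk.split("|"))
            if len(fields) != 3:       # malformed triple -> invalid graph
                return None
            triples.add(fields)
        return triples

    hyp, ref = parse(hypothesis), parse(reference)
    if hyp is None or ref is None or not hyp:
        return 0.0                     # structure validity check failed
    overlap = len(hyp & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```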
We provide two working examples of T2G and G2T generation. The examples emphasize the benefits of ReGen, our RL-based method, compared to traditional CE-based fine-tuning. In Figure 4, for both examples, the input sample is at the top. Below it, the ground truth is on the left (in gray), while generated outputs for ReGen-CE and ReGen-RL are on the right in color (orange for CE, blue for RL). For these two examples, ReGen-RL yields a richer, more precise transfer.
We compared ReGen to the top systems of the WebNLG 2020 Challenge, a well-regarded public challenge for multilingual, bi-directional generation between text and knowledge graphs.
WebNLG is a difficult challenge. Its dataset is relatively small (13K train, 1.7K dev, 1.8K test) and includes unseen categories at test time. ReGen establishes new state-of-the-art results on the WebNLG 2020 Challenge dataset by large margins in both the T2G and G2T directions, as demonstrated in Table 1 and Table 2.
On the much larger TekGen dataset (6.3M train, 5K dev, 50K test), ReGen shows consistent gains from Reinforced Generation, validating its use at large-data operating points, as shown in Table 3 and Table 4.
We present results for both datasets using well-established metrics such as BLEU, METEOR, and chrF++ for text generation. For graph generation, we use F1, Precision, and Recall for nodes and edges under different levels of matching (exact, partial, strict, entity type), as defined by the WebNLG 2020 Challenge. Note that we only report exact-match results below; full results are in our paper (Dognin, et al., 2021).
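For the text side, BLEU and chrF++ can be computed with standard tooling. The snippet below uses sacrebleu (our choice for this illustration; the Challenge ships its own evaluation scripts), where `word_order=2` turns chrF into chrF++.

```python
# Scoring G2T hypotheses with sacrebleu (illustrative; the WebNLG 2020
# Challenge provides official evaluation scripts).
import sacrebleu

hyps = ["Abilene Regional Airport serves the city of Abilene in Texas."]
# refs is a list of reference streams; refs[k][i] is the k-th reference
# for the i-th hypothesis.
refs = [["Abilene, Texas is served by the Abilene Regional Airport."]]

bleu = sacrebleu.corpus_bleu(hyps, refs)
chrf = sacrebleu.corpus_chrf(hyps, refs, word_order=2)  # word_order=2 -> chrF++
print(f"BLEU: {bleu.score:.1f}  chrF++: {chrf.score:.1f}")
```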
Table 1: G2T best results for the WebNLG 2020 Challenge dataset. The first four rows were the Challenge's top performers. Results for the IBM Research ReGen CE and RL systems show gains from using Reinforcement Learning. Our ReGen-RL is the best system overall, fine-tuning a t5-large model with a METEOR reward.
WebNLG G2T Team/model | BLEU↑ | BLEU (NLTK)↑ | METEOR↑ | chrF++↑ |
---|---|---|---|---|
Amazon AI (Shanghai) (Guo, et al., 2020) | 0.540 | 0.535 | 0.417 | 0.690 |
OSU Neural NLG (Li, et al., 2020) | 0.535 | 0.532 | 0.414 | 0.688 |
Facebook FBConvAI (Yang, et al., 2020) | 0.527 | 0.523 | 0.413 | 0.686 |
Google bt5 (Agarwal, et al., 2020) | 0.517 | 0.517 | 0.411 | 0.679 |
IBM Research ReGen-CE (Dognin, et al., 2021) | 0.553 | 0.549 | 0.418 | 0.694 |
IBM Research ReGen-RL (Dognin, et al., 2021) | 0.563 | 0.559 | 0.425 | 0.706 |
Table 2: T2G best results for the WebNLG 2020 Challenge dataset. The top two rows were the Challenge's top performers. ReGen models improve upon all metrics for all matching schemes, providing new state-of-the-art results.
WebNLG T2G Team/model | F1↑ | Precision↑ | Recall↑ |
---|---|---|---|
Amazon AI (Shanghai) (Guo, et al., 2020) | 0.689 | 0.689 | 0.690 |
Google bt5 (Agarwal, et al., 2020) | 0.682 | 0.670 | 0.701 |
IBM Research ReGen-CE (Dognin, et al., 2021) | 0.723 | 0.714 | 0.738 |
IBM Research ReGen-RL (Dognin, et al., 2021) | 0.720 | 0.712 | 0.734 |
Table 3: G2T TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL consistently improves upon this baseline on all metrics for G2T generation.
TekGen G2T Model | BLEU↑ | BLEU (NLTK)↑ | METEOR↑ | chrF++↑ |
---|---|---|---|---|
IBM Research ReGen-CE (Dognin, et al., 2021) | 0.241 | 0.242 | 0.233 | 0.405 |
IBM Research ReGen-RL (Dognin, et al., 2021) | 0.262 | 0.262 | 0.242 | 0.422 |
Table 4: T2G TekGen Results: IBM Research ReGen-CE establishes a baseline on the large TekGen dataset. ReGen-RL improves results on the test set compared to ReGen-CE on all metrics for text-to-graph generation.
TekGen T2G Model | F1↑ | Precision↑ | Recall↑ |
---|---|---|---|
IBM Research ReGen-CE (Dognin, et al., 2021) | 0.619 | 0.605 | 0.643 |
IBM Research ReGen-RL (Dognin, et al., 2021) | 0.623 | 0.610 | 0.647 |
Multiple exciting directions of research can now be explored building on this work.