27 May 2021

Knowledge Graph construction gets big boost from AI

An IBM Research team takes the top spot on Facebook's KILT leaderboard for Knowledge Graph development, paving the way for real-world use of the technology.

Data is everywhere. And artificial intelligence (AI) has become invaluable in storing and organizing large amounts of it — using “knowledge graphs.”

A knowledge graph is a database that allows AI systems to deal with complex, interrelated data. It stores information as a network of data points connected by different types of relations. Knowledge graphs power internet search, recommender systems and chatbots. Take an e-commerce site — chances are, it uses knowledge graphs to describe products, schedule delivery and help customers through virtual assistants.

But small organizations can't always afford to build and maintain knowledge graphs. Defining a set of entity and relation types and populating it with ‘knowledge’ extracted from domain-specific data sources is a challenging task. It requires considerable resources and a large amount of hand-labeled data to train machine learning systems.

We've decided to help.

Our Knowledge Induction team created a way to significantly improve “slot filling” — an essential task in building AI-driven knowledge graphs. Our approach, called Knowledge Graph Induction (KGI), is based on a language generation model dubbed Retrieval Augmented Generation (RAG).

In our arXiv preprint,¹ we detail how we’ve used KGI to reach the top position on the industry-recognized KILT leaderboard in two zero-shot slot filling tasks, T-REx and Zero Shot RE. We’ve achieved about 84 percent and 73 percent accuracy, respectively, a net gain of more than 20 percent over previous state-of-the-art methods. This high score indicates that the technology is mature enough to be used in real settings, which is our next step.

From text to databases

In the past decade, deep learning and encoder-decoder transformer architectures have radically changed the AI landscape, greatly improving knowledge-induction technologies. Neural networks can now be trained using web-scale data in a fully unsupervised manner to learn language models, storing a vast amount of background knowledge.

These models represent text as dense vectors and can be fine-tuned on a specific task, such as question answering or text categorization. This enables a form of transfer learning that greatly reduces the need for hand-labeled training data for a specific application.

Most data in enterprises is typically in the form of text documents. So building a knowledge graph based on this data requires custom-made Information Extraction (IE) analytics for entity recognition and relation extraction. This process is also known as Knowledge Base Population (KBP) — and one of its tasks is slot filling.

Example of an auto insurance claim written in natural language. The slot filling task is to identify relevant information needed by the insurer, such as the model of the vehicle, the parts of the car impacted, etc.

Slot filling involves completing entity-specific templates with information extracted from text. For example, in the picture above, given an auto insurance claim written in natural language, the slot filling task is to identify relevant information needed by the insurer, such as the model of the vehicle, the parts of the car impacted, and so on.

There are two different types of slot filling tasks. One is document-centric, where the entity is represented by an entire document such as an insurance claim in the example above. The other is entity-centric, where the information about the entity of interest (say, a person or a company) is spread across multiple documents in a large corpus, as illustrated by the picture below.

Example of an entity-centric slot-filling task, where the information about the entity of interest (say, a person or a company) is spread across multiple documents in a large corpus.

Slot filling is typically done by humans. Financial analysts, for example, collect information about companies from the news by manually inspecting different sources and filling in spreadsheets or databases. This tedious, time-consuming work often absorbs most of the analysts' (very expensive) time, and it is work that could be done by AI.

That’s why, in the past two decades, researchers have been developing such AI-based solutions. The AI explores how often the input entity occurs in the corpus, and then gathers information about its slot fillers from the context. When prompted with an entity and a set of relations, a slot filling system fills out a template and provides the passages that justify each filled slot.
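To make the input and output of such a system concrete, here is a minimal Python sketch. The names (`SlotFill`, `fill_template`, `TOY_EVIDENCE`) are hypothetical and do not come from KGI or any slot filling library, and the small lookup table only stands in for a real extraction model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SlotFill:
    """One filled slot plus the passages that justify it."""
    relation: str
    value: str
    provenance: List[str] = field(default_factory=list)


# Toy stand-in for what a model would extract from a corpus:
# (entity, relation) -> (slot filler, supporting passage).
TOY_EVIDENCE = {
    ("Alan Turing", "place of birth"): (
        "London",
        "Alan Turing was born in London in 1912.",
    ),
}


def fill_template(entity: str, relations: List[str]) -> List[SlotFill]:
    """Return one SlotFill per relation for which evidence was found."""
    filled = []
    for relation in relations:
        hit = TOY_EVIDENCE.get((entity, relation))
        if hit is not None:
            value, passage = hit
            filled.append(SlotFill(relation, value, [passage]))
    return filled
```

Calling `fill_template("Alan Turing", ["place of birth", "employer"])` fills only the first slot here, because the toy table holds no evidence for the second relation. The point of the structure is that every value travels with the passages that support it.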

Typically, to build accurate and robust IE analytics for Knowledge Base Population (such as Watson Knowledge Studio), scientists use either strictly supervised approaches that need a large quantity of hand-labeled data or a rule-based system that requires ad-hoc dictionaries and syntactic rules.

But both cases require a considerable effort to adapt machine learning models to a new domain. To train supervised IE analytics, companies have to label a large collection of documents. They have to identify mentions of entities in text, such as “manufacturer” and “model” in the car insurance example above, and relations among them, such as “has-property.”

Also, to achieve good accuracy, the system often needs a vast amount of entity names, such as a list of all possible car models and manufacturers from a pre-existing database. Collecting such training data for each customer is prohibitively expensive and sometimes impossible. In many enterprise environments, dictionaries or domain experts might not be available, which creates a barrier to entry that prevents widespread adoption of Knowledge Graphs in the enterprise.

KILT to drive research

Recently, the research community has been trying to build more efficient KBP systems that require less training effort. For instance, the Facebook AI team has introduced a suite of benchmarks called KILT — Knowledge Intensive Language Tasks — to help boost research.

KILT sports two zero-shot slot filling tasks, Zero Shot RE and T-REx, with the results obtained by competing systems published on a public leaderboard to motivate researchers to keep pushing the limits in building Knowledge Graphs. Zero-shot slot filling is crucial to reduce domain adaptation effort, compared to traditional natural language processing methods.

In the zero-shot approach, the system is not supervised. Instead, it is instructed in pseudo natural language on how to perform the task. For example, to teach the system how to recognize the term “employees,” the system is instructed with the expression “work for” instead of a set of textual occurrences showing the examples for that relation.
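One way to picture this is that the relation reaches the system as a short text phrase next to the entity, not as a label learned from annotated examples. The sketch below assumes a simple `entity [SEP] relation` query string, a convention used for slot filling inputs in KILT-style data. The helper names are hypothetical.

```python
from typing import Tuple

SEPARATOR = " [SEP] "  # assumed separator between entity and relation phrase


def build_query(entity: str, relation_phrase: str) -> str:
    """Combine an entity and a natural-language relation phrase into one query."""
    return f"{entity}{SEPARATOR}{relation_phrase}"


def parse_query(query: str) -> Tuple[str, str]:
    """Split a query back into its entity and relation phrase."""
    entity, relation_phrase = query.split(SEPARATOR, 1)
    return entity, relation_phrase
```

Because the relation is plain text, supporting a new relation means writing a new phrase, not labeling a new training set.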

The zero-shot approach has the potential to revolutionize the industry, enabling the creation of dynamic knowledge graphs where the schema could be constantly adapted to new business needs at no cost. In other words, the zero-shot approach alleviates the barriers to the adoption of Knowledge Graphs in the enterprise environment.

However, the performance achieved by current zero-shot slot filling systems featured on the KILT leaderboard is still not satisfactory, with accuracy below 50 percent, making them unusable in real settings.

This is where we come in.

Two methods in one to boost accuracy

Our team’s approach to zero-shot slot filling is a sequence-to-sequence generative method based on a combination of Dense Passage Retrieval (DPR) and Retrieval Augmented Generation (RAG), both of which are trained for slot filling. The source code and model are available in our retrieve-write-slot-filling GitHub repository.

DPR uses language models to index text passages with vector representations enabling a semantic search that goes beyond keyword search. RAG is also based on a language model. It uses a sequence-to-sequence approach to translate the set of text passages retrieved by DPR into a slot filler, which represents the answer to the user’s information need.
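The retrieval step can be illustrated with a toy dense index. This is a minimal sketch and not the DPR implementation: the three-dimensional vectors are made up for illustration (a real system gets them from trained encoders), and the passage ids and function names are hypothetical. It does use inner-product scoring, which is how DPR compares a query vector with passage vectors.

```python
from typing import Dict, List, Tuple


def inner_product(a: List[float], b: List[float]) -> float:
    """Similarity score between two dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(
    query_vec: List[float],
    passage_index: Dict[str, List[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k passages whose vectors score highest against the query."""
    scored = [
        (pid, inner_product(query_vec, vec))
        for pid, vec in passage_index.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up vectors; in practice these come from a passage encoder.
PASSAGE_INDEX = {
    "turing_family": [0.9, 0.1, 0.0],
    "turing_research": [0.1, 0.9, 0.2],
    "unrelated": [0.0, 0.1, 0.9],
}

# Made-up query vector; in practice this comes from a question encoder.
top_passages = retrieve([0.8, 0.2, 0.1], PASSAGE_INDEX, k=2)
```

No keyword matching happens anywhere in this code. Ranking depends only on how close the vectors are, which is what lets a dense retriever go beyond keyword search.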

Conceptually, the DPR component collects and aggregates all the information about an entity, while the RAG component reads and understands that content, focusing on performing the inference needed to predict the slots associated with a specific relation.

Take the query “Alan Turing.” The DPR component first collects the relevant text about that entity, needed to identify his employer, university, language and so on. Then the RAG component reads this text and performs the inference needed to fill the slot. For example, the corpus might not explicitly mention the language spoken by Alan Turing. However, looking at his birthplace, London, and the fact that he graduated from King’s College, the RAG model can infer with high confidence that Alan Turing spoke English.
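The way evidence from several passages adds up to one answer can be sketched with a toy calculation. In the original RAG formulation, the probability of an answer is roughly the sum over retrieved passages of the passage's retrieval probability times the probability of generating the answer from that passage. The code below is a simplified illustration of that idea, not KGI code. All scores and probabilities are invented, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Turn retrieval scores into a probability distribution over passages."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginalize(
    retrieval_scores: List[float],
    answer_probs: List[Dict[str, float]],
) -> Dict[str, float]:
    """Sum p(passage | query) * p(answer | query, passage) over passages."""
    passage_probs = softmax(retrieval_scores)
    combined = defaultdict(float)
    for p_passage, distribution in zip(passage_probs, answer_probs):
        for answer, p_answer in distribution.items():
            combined[answer] += p_passage * p_answer
    return dict(combined)


# Invented numbers: two passages (birthplace, college), neither naming a language.
answers = marginalize(
    retrieval_scores=[2.0, 1.0],
    answer_probs=[
        {"English": 0.9, "German": 0.1},  # from "born in London"
        {"English": 0.8, "German": 0.2},  # from "graduated from King's College"
    ],
)
best_answer = max(answers, key=answers.get)
```

Neither passage states the answer outright, yet both lean the same way, so the combined distribution favors "English".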

Helping AI to answer questions better

Then there is question-answering — and there, RAG models can help too.

Our innovation is in how we train the DPR component. We co-train the question encoder to perform well on both search and slot filling tasks at the same time, rather than considering the two steps independently. This way, our model learns how to retrieve information that matters for the relation at hand, focusing on specific aspects and not just generic information about the entity as a typical search engine would do.

For example, if we are looking for the name of Alan Turing’s father, the relevant documents are going to be about his family and not about his research. To implement this idea, we first train the DPR model on the provenance ‘ground truth’ that has been manually annotated for each slot filler in the KILT training data. Then, we train the sequence-to-sequence generation and further train the query encoder using only the target tail entity as the objective.
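The first training phase can be pictured with the contrastive objective commonly used to train DPR: the annotated provenance passage should score higher against the query than other passages do. The article does not spell out the loss, so the sketch below is an assumption based on the standard DPR formulation. The vectors are invented and the function names are hypothetical.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner-product similarity between a query vector and a passage vector."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(
    query_vec: List[float],
    positive_vec: List[float],
    negative_vecs: List[List[float]],
) -> float:
    """Negative log-likelihood of the positive passage among all candidates."""
    positive_score = dot(query_vec, positive_vec)
    scores = [positive_score] + [dot(query_vec, n) for n in negative_vecs]
    highest = max(scores)
    # Numerically stable log-sum-exp over all candidate scores.
    log_sum_exp = highest + math.log(sum(math.exp(s - highest) for s in scores))
    return log_sum_exp - positive_score


# Invented vectors: a query about Turing's father, a passage about his
# family (the annotated provenance) and a passage about his research.
query = [0.9, 0.1]
family_passage = [1.0, 0.0]
research_passage = [0.0, 1.0]

loss_good = contrastive_loss(query, family_passage, [research_passage])
loss_bad = contrastive_loss(query, research_passage, [family_passage])
```

The loss is lower when the family passage is treated as the positive, so minimizing it pushes the query encoder toward passages that matter for the relation at hand.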

As for the KILT leaderboard — we are incredibly proud of our results.

The KILT-F1 measure takes into account the accuracy in the prediction of the missing slot and the ability to retrieve the supporting evidence for the prediction. With accuracy close to or above 80 percent in both metrics, these results are important because they provide the foundation for building highly adaptable knowledge graph induction solutions.
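A metric that combines both aspects can be sketched as follows. This is a simplified illustration in the spirit of the KILT scores, which award answer points only when the retrieved evidence is also correct. It is not the official KILT evaluation script, and the function names and passage ids are hypothetical.

```python
from collections import Counter
from typing import List, Set


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold slot filler."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def r_precision(retrieved_ids: List[str], gold_ids: Set[str]) -> float:
    """Fraction of the top-R retrieved passages that are gold, R = number of gold passages."""
    r = len(gold_ids)
    if r == 0:
        return 0.0
    return sum(1 for pid in retrieved_ids[:r] if pid in gold_ids) / r


def evidence_gated_f1(
    prediction: str,
    gold: str,
    retrieved_ids: List[str],
    gold_ids: Set[str],
) -> float:
    """Award the answer's F1 only when the evidence retrieval is fully correct."""
    if r_precision(retrieved_ids, gold_ids) < 1.0:
        return 0.0
    return token_f1(prediction, gold)
```

Under this kind of scoring, a system gets no credit for a correct answer backed by the wrong passage, which is why a high score says something about explainability as well as accuracy.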

We are not done yet. Our team is now exploring ways to adapt KGI to new enterprise corpora and domains with minimal effort. The idea is to use the pre-trained model for zero-shot initialization and then fine-tune the system in a few-shot paradigm with a human-in-the-loop strategy to constantly validate the output while using it. This video demo illustrates the human-in-the-loop process.

We also believe that the combination of DPR and RAG can generalize well to a wider variety of tasks, be it fact checking, question answering or dialog. We plan to submit solutions for these tasks to the KILT leaderboard and, at the same time, explore ways to better adapt them to enterprise use cases. Our final goal is to delight our customers with the latest developments in AI.

References

1. Glass, M., Rossiello, G., Gliozzo, A. Zero-shot Slot Filling with DPR and RAG. arXiv (2021).
