Slot filling involves completing entity-specific templates with information extracted from text. For example, in the picture above, given an auto insurance claim written in natural language, the slot filling task is to identify the information the insurer needs, such as the model of the vehicle, the parts of the car that were damaged, and so on.
There are two different types of slot filling tasks. One is document-centric, where the entity is represented by an entire document such as an insurance claim in the example above. The other is entity-centric, where the information about the entity of interest (say, a person or a company) is spread across multiple documents in a large corpus, as illustrated by the picture below.
The task of slot filling is typically done by humans, for example financial analysts who collect information about companies from the news by manually inspecting different sources and filling in spreadsheets or databases. This work is tedious and time-consuming, often absorbing much of an analyst's (very expensive) time, and it is exactly the kind of task AI could take over.
That’s why, in the past two decades, researchers have been developing such AI-based solutions. The system finds where the input entity is mentioned in the corpus and gathers candidate slot fillers from the surrounding context. When prompted with an entity and a set of relations, a slot filling system fills out the template and returns the passages that justify why each slot was filled.
Typically, to build accurate and robust IE analytics for Knowledge Base Population — such as Watson Knowledge Studio — scientists use either strictly supervised approaches, which need a large quantity of hand-labelled data, or rule-based systems, which require ad-hoc dictionaries and syntactic rules.
But both approaches require considerable effort to adapt to a new domain. To train supervised IE analytics, companies have to label a large collection of documents. They have to identify mentions of entities in text, such as “manufacturer” and “model” in the car insurance example above, and relations among them, such as “has-property.”
Also, to achieve good accuracy, the system often needs a vast number of entity names, such as a list of all possible car models and manufacturers from a pre-existing database. Collecting such training data for each customer is prohibitive and sometimes impossible. In several enterprise environments, dictionaries or domain experts might not be available, a barrier to entry that prevents widespread adoption of Knowledge Graphs in the enterprise.
Recently, the research community has been trying to build more efficient KBP systems that require less training effort. For instance, the Facebook AI team has introduced a suite of benchmarks called KILT — Knowledge Intensive Language Tasks — to help boost research.
KILT includes two zero-shot slot filling tasks, Zero Shot RE and T-REx, with the results of competing systems published on a public leaderboard to motivate researchers to keep pushing the limits of Knowledge Graph construction. Zero-shot slot filling is crucial to reducing domain adaptation effort compared with traditional natural language processing methods.
In the zero-shot approach, the system is not supervised. Instead, it is instructed in pseudo natural language on how to perform the task. For example, to teach the system how to recognize the term “employees,” the system is instructed with the expression “work for” instead of a set of textual occurrences showing the examples for that relation.
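The idea above can be made concrete with a small sketch. The template strings and the `[SEP]`-style query format below are illustrative assumptions, not the exact format used by any particular system: the point is that a relation is described once in pseudo natural language, instead of being learned from labelled examples.

```python
# Toy sketch of zero-shot relation queries (formats here are illustrative).
# Each relation is described once in pseudo natural language; no labelled
# textual occurrences of the relation are needed.
RELATION_TEMPLATES = {
    "employer": "work for",
    "educated_at": "graduated from",
    "place_of_birth": "born in",
}

def build_query(entity: str, relation: str) -> str:
    """Compose a query for an (entity, relation) pair from its description."""
    return f"{entity} [SEP] {RELATION_TEMPLATES[relation]}"

print(build_query("Alan Turing", "employer"))
# → Alan Turing [SEP] work for
```

Adding a new relation to the schema then only requires writing one new template string, rather than annotating a new set of training documents.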
The zero-shot approach has the potential to revolutionize the industry, enabling the creation of dynamic knowledge graphs where the schema could be constantly adapted to new business needs at no cost. In other words, the zero-shot approach alleviates the barriers to the adoption of Knowledge Graphs in the enterprise environment.
However, the performance achieved by current zero-shot slot filling systems featured on the KILT leaderboard is still not satisfactory, with accuracy below 50 percent — making them unusable in real settings.
This is where we come in.
Our team’s approach to zero-shot slot filling is a sequence-to-sequence generative method based on a combination of Dense Passage Retrieval (DPR) and Retrieval Augmented Generation (RAG) — both trained for slot filling. The source code and model are available in our retrieve-write-slot-filling GitHub repository.
DPR uses language models to index text passages as vector representations, enabling semantic search that goes beyond keyword matching. RAG is also based on a language model: it uses a sequence-to-sequence approach to translate the set of passages retrieved by DPR into a slot filler, the answer to the user’s information need.
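The retrieval half of this pipeline can be sketched in miniature. In reality DPR uses BERT-based bi-encoders that map queries and passages into high-dimensional vectors; in the toy below, hand-made 3-dimensional vectors stand in for those learned embeddings so the inner-product search mechanics are visible. The passages and vectors are invented for illustration.

```python
# Minimal dense-retrieval sketch. Hand-made 3-dim vectors stand in for the
# learned DPR embeddings; retrieval is a max inner-product search.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Passage "index": (text, embedding) pairs, precomputed offline as DPR does.
index = [
    ("Turing worked at the Government Code and Cypher School.", [0.9, 0.1, 0.0]),
    ("Turing graduated from King's College, Cambridge.",        [0.1, 0.9, 0.0]),
    ("Turing was born in London.",                              [0.0, 0.1, 0.9]),
]

def retrieve(query_embedding, k=1):
    """Return the k passages with the highest inner-product score."""
    ranked = sorted(index, key=lambda p: dot(query_embedding, p[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query embedding pointing in the "employer" direction retrieves the
# employment passage, which the generator would then read.
print(retrieve([1.0, 0.0, 0.0]))
```

Because passages and queries live in the same vector space, a query about an employer can land near an employment passage even when the two share no keywords — the property the next paragraphs rely on.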
Conceptually, the DPR component collects and aggregates all the information about an entity, while the RAG component reads and understands that content, performing the inference needed to predict the slots associated with a specific relation.
Take the query “Alan Turing.” The DPR component first collects the relevant text about that entity, needed to identify his employer, university, spoken language, and so on. Then the RAG component reads this text and performs the inference needed to fill each slot. For example, the corpus might contain no explicit mention of the language Alan Turing spoke. However, from his birthplace, London, and the fact that he graduated from King’s College, the RAG model can infer with high confidence that Alan Turing spoke English.
RAG models were originally developed for question answering, and the same retrieve-then-generate machinery carries over to slot filling.
Our innovation is in how we train the DPR component. We co-train the question encoder to perform well on both search and slot filling tasks at the same time, rather than considering the two steps independently. This way, our model learns how to retrieve information that matters for the relation at hand, focusing on specific aspects and not just generic information about the entity as a typical search engine would do.
For example, if we are looking for the name of Alan Turing’s father, the relevant documents are going to be about his family and not about his research. To implement this idea, we first train the DPR model on the provenance ‘ground truth’ that has been manually annotated for each slot filler in the KILT training data. Then we train the sequence-to-sequence generator and continue training the query encoder, using only the target tail entity as the training objective.
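The two stages can be summarized with the standard DPR and RAG objectives (notation ours, paraphrasing the published formulations): a contrastive retrieval loss over the annotated provenance, followed by a generation loss marginalized over the retrieved passages, whose gradients also flow back into the query encoder.

```latex
% Stage 1: contrastive DPR training on the annotated provenance,
% with query encoder q, passage encoder d, gold passage z^{+},
% and negative passages z^{-}
\mathcal{L}_{\mathrm{ret}} = -\log
  \frac{e^{\, q(x) \cdot d(z^{+})}}
       {e^{\, q(x) \cdot d(z^{+})} + \sum_{z^{-}} e^{\, q(x) \cdot d(z^{-})}}

% Stage 2: likelihood of the target tail entity y, marginalized over the
% top-k retrieved passages; training this loss updates both the generator
% parameters \theta and the query encoder parameters \eta
\mathcal{L}_{\mathrm{gen}} = -\log \sum_{z \in \mathrm{top}\text{-}k(x)}
  p_{\eta}(z \mid x)\; p_{\theta}(y \mid x, z)
```

The second stage is what specializes the retriever: passages that help generate the correct tail entity receive higher retrieval scores for that relation.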
As for the KILT leaderboard — we are incredibly proud of our results.
The KILT-F1 measure takes into account both the accuracy of the predicted slot filler and the ability to retrieve the supporting evidence for the prediction. With scores close to or above 80 percent on both metrics, these results are important because they provide the foundation for building highly adaptable knowledge graph induction solutions.
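The scoring idea can be illustrated with a toy paraphrase (see the KILT benchmark for the exact definition, which is slightly more involved): the answer's token-level F1 is awarded only when the system also retrieved a correct provenance page, and the example scores zero otherwise.

```python
# Illustrative paraphrase of the KILT-F1 idea: answer quality counts only
# when the supporting provenance is correct. Simplified from the official
# KILT evaluation, which this code does not reproduce exactly.
def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold slot filler."""
    pred, ref = prediction.lower().split(), gold.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def kilt_f1(prediction, gold, retrieved_page, gold_pages):
    """Award the answer F1 only if the retrieved provenance page is correct."""
    return token_f1(prediction, gold) if retrieved_page in gold_pages else 0.0

print(kilt_f1("Victoria University", "Victoria University",
              "Alan Turing", {"Alan Turing"}))   # correct page → 1.0
print(kilt_f1("Victoria University", "Victoria University",
              "Cambridge", {"Alan Turing"}))     # wrong page → 0.0
```

This coupling of answer and evidence is why strong KILT-F1 scores require the retriever and the generator to work well together, not just the generator alone.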
We are not done yet. Our team is now exploring ways to adapt KGI, our knowledge graph induction system, to new enterprise corpora and domains with minimal effort. The idea is to use the pre-trained model for zero-shot initialization and then fine-tune the system in a few-shot paradigm, with a human-in-the-loop strategy that constantly validates the output while using it. This video demo illustrates the human-in-the-loop process.
And we also believe that the combination of DPR and RAG can be well generalized to a larger variety of tasks, be it fact checking, question answering or dialog. We plan to submit solutions for these tasks to the KILT leaderboard and, at the same time, explore ways to better adapt them to enterprise use cases — with our final goal being delighting our customers with the latest developments in AI.
Glass, M., Rossiello, G., Gliozzo, A. Zero-shot Slot Filling with DPR and RAG. arXiv (2021)