An efficient toolkit for state-of-the-art models in multilingual open-domain question answering.


The PrimeQA project targets open-domain question answering (QA), a class of systems that goes beyond the capabilities of traditional Information Retrieval (IR) systems.

While IR systems typically return a document or snippets of text in response to a query, adding a Machine Reading Comprehension (MRC) component lets the system pinpoint and return the exact answer span. This project focuses on multilingual open-domain question answering, where a query in one language can be answered using resources in another.
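To make the reader's role concrete, here is a minimal sketch of how an extractive MRC reader typically turns per-token start and end scores into a single answer span. It assumes the scores have already been produced by some reader model; the function name `best_answer_span`, the `max_answer_len` cap, and the example logits are hypothetical illustrations, not PrimeQA's API.

```python
from typing import List, Tuple


def best_answer_span(
    tokens: List[str],
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 10,
) -> Tuple[int, int, str]:
    """Pick the span (i, j) maximizing start_logits[i] + end_logits[j].

    Only spans with i <= j and length at most max_answer_len are considered,
    the usual decoding rule for extractive readers. Illustrative sketch only.
    """
    if not tokens or not (len(tokens) == len(start_logits) == len(end_logits)):
        raise ValueError("need non-empty tokens and logits of equal length")
    if max_answer_len < 1:
        raise ValueError("max_answer_len must be at least 1")
    best_score, best_i, best_j = float("-inf"), 0, 0
    for i, s in enumerate(start_logits):
        # The end must not precede the start, and the span may not exceed the cap.
        for j in range(i, min(i + max_answer_len, len(tokens))):
            score = s + end_logits[j]
            if score > best_score:
                best_score, best_i, best_j = score, i, j
    return best_i, best_j, " ".join(tokens[best_i : best_j + 1])


# Hypothetical passage and scores standing in for a real reader model's output.
tokens = "The Eiffel Tower is located in Paris".split()
start_logits = [0.1, 0.2, 0.1, 0.0, 0.0, 0.3, 2.5]
end_logits = [0.0, 0.1, 0.4, 0.0, 0.1, 0.2, 3.0]
print(best_answer_span(tokens, start_logits, end_logits))  # (6, 6, 'Paris')
```

Scoring a span as the sum of its start and end scores keeps decoding to a simple search over valid (start, end) pairs; the length cap bounds that search and filters out implausibly long answers.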

Despite rapid progress in this field, the adoption of state-of-the-art models has been hindered by poor reproducibility and a lack of ease of use, problems that PrimeQA aims to address.

PrimeQA is a comprehensive toolkit that integrates both the "retriever" and "reader" components that open-retrieval QA systems require. It provides a simple Python codebase for training and inference on QA tasks, and its models have reached top positions on various leaderboards and benchmarks. Pre-trained models are available on the Hugging Face model hub and services are available via Docker Hub, making it easier to build customized QA search engines.
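The retriever half of this design can be illustrated with a classic sparse ranker. The sketch below scores passages with Okapi BM25; it is a generic textbook implementation chosen for illustration, not PrimeQA's retriever, and the function names `tokenize` and `bm25_rank`, the default parameters (`k1=1.5`, `b=0.75`), and the toy passages are all assumptions. In a full retrieve-then-read pipeline, the top-ranked passages would then be passed to a reader model that extracts the answer span.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"\w+", text.lower())


def bm25_rank(
    query: str,
    passages: List[str],
    k1: float = 1.5,
    b: float = 0.75,
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return up to top_k (passage_index, BM25 score) pairs, best first."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    q_terms = tokenize(query)
    results = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in q_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # Smoothed IDF (the "1 +" variant used by Lucene), always positive.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        results.append((idx, score))
    # Python's sort is stable, so ties keep their original passage order.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:top_k]


# Hypothetical toy collection for illustration.
passages = [
    "The Eiffel Tower is located in Paris.",
    "Mount Fuji is the highest mountain in Japan.",
    "Paris is the capital of France.",
]
for idx, score in bm25_rank("Where is the Eiffel Tower located?", passages):
    print(idx, round(score, 3), passages[idx])
```

Keeping retrieval and reading as separate steps means either one can be swapped, for example a dense retriever in place of BM25, without touching the other.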

The toolkit is designed to be easy to use: state-of-the-art readers can be applied with only a few lines of code. It also supports multimodal QA, including table question answering, and supports domain adaptation by generating "synthetic" questions from target-domain documents. Question generation works over tables, text, and hybrid contexts, and likewise requires minimal code.

PrimeQA is a collaborative effort with contributions from several institutions and labs, and it continues to evolve, with more capabilities under development.

Technical Resources

Main GitHub repository

Services GitHub repository

Model hub