PrimeQA

An efficient toolkit for state-of-the-art models in multilingual open-domain question answering.

Overview

The PrimeQA project advances open-domain question-answering (QA) systems beyond the capabilities of traditional Information Retrieval (IR) systems.

While IR systems typically return a document or snippets of text in response to a query, the addition of a Machine Reading Comprehension (MRC) component allows the system to pinpoint and return an exact answer span. This project focuses on facilitating multilingual open-domain question-answering, where a query in one language can be answered using resources in another.

Despite rapid progress in this field, adoption of state-of-the-art models has been hindered by poor reproducibility and limited ease of use, both of which PrimeQA aims to address.

PrimeQA is a comprehensive toolkit that integrates both the "retriever" and "reader" components essential for open-retrieval QA systems. It offers a simple Python codebase for training and inference on QA problems, and its models hold top positions on various leaderboards and benchmarks. The toolkit includes pre-trained models available on the HuggingFace model hub and services available via Docker Hub, facilitating the development of personalized QA search engines.
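The retriever and reader split described above can be illustrated with a minimal, self-contained sketch. It does not use PrimeQA's API: the function names `tokenize`, `overlap`, `retrieve`, and `read` are hypothetical, and simple token overlap stands in for the neural retriever and MRC models a real system would use. The toy reader also returns a whole sentence rather than an exact answer span.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def overlap(a, b):
    """Count tokens shared by two texts (multiset intersection)."""
    return sum((Counter(tokenize(a)) & Counter(tokenize(b))).values())


def retrieve(query, passages, top_k=1):
    """Toy retriever (assumption): rank passages by token overlap with the query."""
    ranked = sorted(passages, key=lambda p: overlap(query, p), reverse=True)
    return ranked[:top_k]


def read(query, passage):
    """Toy reader (assumption): return the sentence that best matches the query."""
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    return max(sentences, key=lambda s: overlap(query, s))


passages = [
    "The Eiffel Tower is in Paris. The Eiffel Tower was completed in 1889.",
    "Mount Everest is the tallest mountain. It lies in the Himalayas.",
]
query = "When was the Eiffel Tower completed?"

best_passage = retrieve(query, passages)[0]  # IR step: pick a passage
answer = read(query, best_passage)           # MRC step: narrow to the answer text
print(answer)  # The Eiffel Tower was completed in 1889.
```

The point of the two-stage design is that the retriever only needs to narrow a large collection to a few candidate passages, while the reader does the finer-grained work of locating the answer inside them.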

The user-friendly toolkit allows individuals to employ state-of-the-art readers with only a few lines of code. It also supports multi-modal QA, including table question answering, and offers capabilities for domain adaptation by generating "synthetic" questions based on target domain documents. This feature extends to various contexts, including tables, text, and hybrid contexts, and can be utilized with minimal coding.

PrimeQA is a collaborative effort, with contributions from several renowned institutions and labs, and continues to evolve, with many other capabilities in development.

Technical Resources

Main GitHub

Services GitHub

Model hub

Publications

GAAMA 2.0: An Integrated System that Answers Boolean and Extractive Questions
- Scott McCarley, Mihaela Bornea, et al.
- 2023
- AAAI 2023

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development
- Avi Sil, Jaydeep Sen, et al.
- 2023
- arXiv

Not to Overfit or Underfit the Source Domains? An Empirical Study of Domain Generalization in Question Answering
- Arafat Sultan, Avi Sil, et al.
- 2022
- EMNLP 2022

Towards Robust Neural Retrieval Models with Synthetic Pre-Training
- Revanth Reddy, Arafat Sultan, et al.
- 2022
- SIGIR 2022

Learning Cross-Lingual IR from an English Retriever
- Yulong Li, Martin Franz, et al.
- 2022
- NAACL 2022

Entity-Conditioned Question Generation for Robust Attention Distribution in Neural Information Retrieval
- Revanth Gangi Reddy, Md Arafat Sultan, et al.
- 2022
- SIGIR 2022

Topic Transferable Table Question Answering
- Saneem Chemmengath, Vishwajeet Kumar, et al.
- 2021
- EMNLP 2021
