Qian Huang, George C. Stockman
ICPR 1994
Accurate multi-modal document retrieval is crucial for Retrieval-Augmented Generation (RAG), yet existing benchmarks do not fully capture real-world challenges with their current design. We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval: (i) multi-modal documents, (ii) enhanced difficulty, (iii) Realistic-RAG queries, and (iv) accurate labeling. Additionally, we propose a multi-difficulty-level scheme based on query rephrasing to evaluate models’ semantic understanding beyond keyword matching. Our benchmark reveals significant model weaknesses, particularly in handling table-heavy documents and robustness to query rephrasing. To mitigate these shortcomings, we curate a rephrased training set and introduce a new finance-focused, table-heavy dataset. Fine-tuning on these datasets enables models to achieve state-of-the-art retrieval performance on the REAL-MM-RAG benchmark. Our work offers a better way to evaluate and improve retrieval in multi-modal RAG systems while also providing training data and models that address current limitations. Our benchmark is available at this project page.
Hisashi Kashima, Tsuyoshi Idé, et al.
IEICE Transactions on Information and Systems
James E. Gentile, Nalini Ratha, et al.
BTAS 2009
Lina Berrayana, Sean Rooney, et al.
ACL 2025