About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
COLM 2024
Conference paper
Large Language Model Routing with Benchmark Datasets
Abstract
The number of open-source Large Language Models (LLMs) grows daily, as does the number of available benchmark datasets used to evaluate LLMs. While some models dominate these benchmarks, no single model achieves the best accuracy in all tasks and use cases. In light of this observation, we address the challenge of selecting the best LLM from a collection of pre-trained models, given a new task. While related work relies on evaluating each candidate model on a set of labeled examples, our new formulation does not assume any labeled data from the new task is available. Instead, we repurpose a collection of benchmark datasets---which may focus on different tasks than the one at hand---to learn a ''router'' model for LLM selection from inputs only; this problem reduces to a collection of binary classification tasks. Empirically, our strategy consistently improves performance over using any single model for all tasks.