Publication
KDD 2024
Tutorial
A Tutorial on Multi-Armed Bandit Applications for Large Language Models
Abstract
This tutorial offers a comprehensive guide to using multi-armed bandit (MAB) algorithms to improve Large Language Models (LLMs). As Natural Language Processing (NLP) tasks grow in scale and variety, efficient and adaptive language generation systems are increasingly needed. MAB algorithms, which balance exploration and exploitation under uncertainty, are a promising tool for enhancing LLMs. The tutorial covers foundational MAB concepts, including the exploration-exploitation trade-off and strategies such as epsilon-greedy, Upper Confidence Bound (UCB), and Thompson Sampling. It then explores integrating MAB algorithms with LLMs, focusing on designing architectures that treat text generation options as arms in a bandit problem. Practical aspects such as reward design, exploration policies, and scalability are discussed. Real-world case studies demonstrate the benefits of MAB-augmented LLMs in content recommendation, dialogue generation, and personalized content creation, showing how these techniques improve relevance, diversity, and user engagement.
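To make the framing concrete, below is a minimal sketch of one of the strategies named in the abstract, Thompson Sampling, applied to the idea of treating text generation options as bandit arms. The arm labels, reward rates, and feedback function are hypothetical stand-ins, not part of the tutorial; in a real system each arm would correspond to an actual generation configuration and the binary reward would come from observed user feedback.

```python
import random


class ThompsonSamplingBandit:
    """Bernoulli Thompson Sampling over a fixed set of arms.

    Each arm keeps a Beta(alpha, beta) posterior over its success
    probability; at each step we sample from every posterior and
    play the arm with the highest draw.
    """

    def __init__(self, arms):
        self.arms = list(arms)
        # Beta(1, 1) prior = uniform belief over [0, 1] for each arm.
        self.alpha = {a: 1.0 for a in self.arms}
        self.beta = {a: 1.0 for a in self.arms}

    def select_arm(self):
        # One posterior sample per arm; exploit the best draw.
        samples = {
            a: random.betavariate(self.alpha[a], self.beta[a])
            for a in self.arms
        }
        return max(samples, key=samples.get)

    def update(self, arm, reward):
        # Binary reward (e.g., user thumbs-up = 1, otherwise 0).
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


# Hypothetical arms: alternative LLM decoding configurations.
arms = ["temp_0.2", "temp_0.7", "temp_1.0_top_p_0.9"]
bandit = ThompsonSamplingBandit(arms)


def simulated_feedback(arm):
    # Stand-in for real user feedback on the generated text.
    true_rates = {"temp_0.2": 0.3, "temp_0.7": 0.6, "temp_1.0_top_p_0.9": 0.5}
    return 1 if random.random() < true_rates[arm] else 0


for _ in range(1000):
    arm = bandit.select_arm()
    # In a real pipeline: generate text with this configuration,
    # show it to the user, then observe the reward signal.
    bandit.update(arm, simulated_feedback(arm))

# Posterior mean success estimate per arm.
print({a: bandit.alpha[a] / (bandit.alpha[a] + bandit.beta[a]) for a in arms})
```

Over repeated interactions the posterior concentrates on the best-performing configuration while still occasionally sampling the others, which is the exploration-exploitation balance the tutorial emphasizes; swapping the arms for prompts, retrieval strategies, or candidate responses follows the same pattern.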