Practical contextual bandits for large-scale structured discrete constrained optimization problems

Pavithra Harsha; Chitra Subramanian; Naoki Abe; Shivaram Subramanian; Amadou Ba; Kevin Fernandez; Mau Longinos; Chandra Narayanaswami

KDD 2025

Conference paper

03 Aug 2025

Practical contextual bandits for large-scale structured discrete constrained optimization problems

Abstract

We study contextual bandit algorithms for high-dimensional combinatorial action spaces involving a class of structured discrete constrained optimization. We propose a novel approach based on general-purpose regression oracles, leveraging Inverse Gap Weighting (IGW) for seamless integration within an optimization framework. IGW sampling is efficiently managed by: (a) Column generation reformulation of the underlying Mixed Integer Programming (MIP) model, enabling flexible lower-level predictors, causal coherence, and efficient representation of large action spaces. (b) Diverse solution pool generation to balance the exploration-exploitation trade-off in large action spaces. To address discontinuities in the reward function’s gradient due to optimization constraints, we incorporate a risk-averse phased learning strategy. Through ablation studies, we demonstrate that each component is critical for maximizing cumulative rewards. Finally, we validate the practical viability of our approach on a real-world Auto-Scaling problem in IT automation.

Paper