Using contextual bandits with behavioral constraints for constrained online movie recommendation
Abstract
AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have a significant impact on our daily lives. In many cases the reward should not be the only guiding criterion, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online system, based on an extension of the contextual bandits framework, that learns a set of behavioral constraints by observation and uses these constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. In addition, our system can highlight features of the context that are predicted to be more rewarding and/or in line with the behavioral constraints. We demonstrate the system by building an interactive interface for an online movie recommendation agent and show that our system is able to act within a set of behavioral constraints without significantly degrading overall performance.
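The abstract does not specify the underlying bandit model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes an epsilon-greedy linear contextual bandit (per-arm reward weights fitted by SGD) and a separate per-arm logistic "constraint model" learned from observed demonstrations. Each arm is scored by a linear blend of the two, controlled by a hypothetical weight `alpha`. The class `ConstrainedBanditSketch`, its methods, the two-feature movie context, and all reward numbers are illustrative inventions. The `explain` method gives a rough per-feature breakdown in the spirit of the feature highlighting described above.

```python
import math
import random


class ConstrainedBanditSketch:
    """Hypothetical sketch (not the authors' algorithm): an epsilon-greedy
    linear contextual bandit whose arm scores blend a learned reward estimate
    with a constraint score learned from observed demonstrations."""

    def __init__(self, n_arms, n_features, alpha=0.5, lr=0.1, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.alpha = alpha      # weight on the constraint score (0 = reward only)
        self.lr = lr            # SGD step size for both models
        self.epsilon = epsilon  # probability of picking a random arm
        self.rng = random.Random(seed)
        # One linear weight vector per arm for rewards, one for constraints.
        self.reward_w = [[0.0] * n_features for _ in range(n_arms)]
        self.constraint_w = [[0.0] * n_features for _ in range(n_arms)]

    @staticmethod
    def _dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def _constraint_prob(self, arm, x):
        # Logistic model: estimated probability a demonstrator would pick `arm`.
        return 1.0 / (1.0 + math.exp(-self._dot(self.constraint_w[arm], x)))

    def observe_demonstration(self, x, chosen_arm):
        # Learn constraints by observation: one-vs-rest logistic SGD step
        # toward the arm the demonstrator chose in context x.
        for arm in range(self.n_arms):
            target = 1.0 if arm == chosen_arm else 0.0
            err = target - self._constraint_prob(arm, x)
            w = self.constraint_w[arm]
            for i, xi in enumerate(x):
                w[i] += self.lr * err * xi

    def update_reward(self, x, arm, reward):
        # Squared-error SGD step on the reward model of the arm that was played.
        w = self.reward_w[arm]
        err = reward - self._dot(w, x)
        for i, xi in enumerate(x):
            w[i] += self.lr * err * xi

    def score(self, arm, x):
        reward_est = self._dot(self.reward_w[arm], x)
        return (1 - self.alpha) * reward_est + self.alpha * self._constraint_prob(arm, x)

    def select(self, x):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.score(a, x))

    def explain(self, arm, x):
        # Rough per-feature contributions for highlighting. The constraint part
        # uses logit-scale weights, so these do not sum exactly to score().
        rw, cw = self.reward_w[arm], self.constraint_w[arm]
        return [(1 - self.alpha) * rw[i] * x[i] + self.alpha * cw[i] * x[i]
                for i in range(len(x))]


# Usage with hypothetical arms: 0 = family-friendly title, 1 = other title.
# Context features: [bias, viewer_is_child].
x_child, x_adult = [1.0, 1.0], [1.0, 0.0]
bandit = ConstrainedBanditSketch(n_arms=2, n_features=2, alpha=0.8, epsilon=0.0)
for _ in range(500):
    # Demonstrations: the demonstrator shows arm 0 only to children.
    bandit.observe_demonstration(x_child, 0)
    bandit.observe_demonstration(x_adult, 1)
    # Made-up reward feedback: arm 1 is always more rewarding.
    for x in (x_child, x_adult):
        bandit.update_reward(x, 0, 0.6)
        bandit.update_reward(x, 1, 1.0)

constrained_choices = (bandit.select(x_child), bandit.select(x_adult))  # (0, 1)
bandit.alpha = 0.0  # ignore the learned constraints
reward_only_choices = (bandit.select(x_child), bandit.select(x_adult))  # (1, 1)
```

With `alpha = 0.8` the sketch follows the demonstrated behavior for child viewers even though arm 1 earns more reward; setting `alpha = 0.0` recovers a purely reward-driven policy. A linear blend is just one simple way to trade off the two signals.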