A Unified Framework for Generative AI Safety
Pin-Yu Chen
ICML 2025
This paper presents a Bayesian framework for solving a class of problems termed multiagent inverse reinforcement learning (MIRL). Whereas the well-known inverse reinforcement learning (IRL) problem is posed on Markov decision processes, MIRL is formalized in the context of stochastic games, which generalize Markov decision processes to game-theoretic scenarios. We establish a theoretical foundation for competitive two-agent zero-sum MIRL problems and propose a Bayesian solution approach in which the generative model rests on the assumption that the two agents follow a minimax bipolicy. Numerical results compare the Bayesian MIRL method with two existing methods in the context of an abstract soccer game. The investigation centers on the relationship between the extent of prior information and the quality of the learned rewards. The results suggest that, in reward priors, covariance structure is more important than mean value.
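To make the minimax assumption concrete, the sketch below computes minimax strategies for a single 2x2 zero-sum matrix game. It is only an illustration of the concept: it is not the paper's method, it covers one stage game rather than a full stochastic game, and the function name `solve_2x2_zero_sum` and the example payoff matrices are hypothetical.

```python
def solve_2x2_zero_sum(payoff):
    """Minimax solution of a 2x2 zero-sum matrix game (illustrative sketch).

    payoff[i][j] is the reward to the row player (who maximizes) when the
    row player picks action i and the column player (who minimizes) picks j.
    Returns (row_strategy, col_strategy, game_value).
    """
    (a, b), (c, d) = payoff

    # Pure-strategy check: a saddle point exists when maximin equals minimax.
    row_mins = [min(a, b), min(c, d)]
    col_maxs = [max(a, c), max(b, d)]
    maximin = max(row_mins)
    minimax = min(col_maxs)
    if maximin == minimax:
        i = row_mins.index(maximin)
        j = col_maxs.index(minimax)
        row = [1.0 if k == i else 0.0 for k in range(2)]
        col = [1.0 if k == j else 0.0 for k in range(2)]
        return row, col, float(maximin)

    # No saddle point: each player mixes so that the opponent is indifferent
    # between their two actions (standard closed form for 2x2 games).
    denom = a - b - c + d
    p = (d - c) / denom  # probability that the row player picks action 0
    q = (d - b) / denom  # probability that the column player picks action 0
    value = (a * d - b * c) / denom
    return [p, 1.0 - p], [q, 1.0 - q], value


# Matching pennies: no pure saddle point, so both players mix uniformly.
row, col, value = solve_2x2_zero_sum([[1, -1], [-1, 1]])
print(row, col, value)
```

In a two-agent zero-sum stochastic game, a stage game of this kind arises at every state, with payoffs that also include the discounted value of successor states; the pair of resulting per-state strategies is what the abstract calls a bipolicy. Larger action sets would need a linear program rather than this closed form.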