Publication
SIAM CT 2023
Conference paper
General Markov Decision Process Framework for Directly Learning Optimal Control Policies
Abstract
We consider a new form of decision making under uncertainty based on a general Markov decision process (MDP) framework devised to support directly learning the optimal control policy. Our MDP framework extends the classical Bellman operator and optimality criteria by generalizing the definition and scope of a policy for any given state. We establish convergence and optimality results for our control-based methods through this general MDP framework, both in general and within various control paradigms (e.g., piecewise linear control policies), including convergence of Q-learning within the context of our framework.
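For reference, a minimal sketch of the classical objects the paper generalizes: the standard Bellman optimality operator and the tabular Q-learning update, with discount factor $\gamma$ and step sizes $\alpha_t$. The paper's extended operator and generalized notion of a policy are not reproduced here.

\[
(T^{*} Q)(s,a) \;=\; r(s,a) \;+\; \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, \max_{a' \in \mathcal{A}} Q(s',a'),
\]
\[
Q_{t+1}(s_t,a_t) \;=\; Q_t(s_t,a_t) \;+\; \alpha_t \Big[\, r_t + \gamma \max_{a'} Q_t(s_{t+1},a') - Q_t(s_t,a_t) \Big].
\]

Under the usual conditions ($\sum_t \alpha_t = \infty$, $\sum_t \alpha_t^2 < \infty$, and all state-action pairs visited infinitely often), this classical update converges to the fixed point of $T^{*}$; the paper's contribution is establishing analogous convergence when the policy definition and scope are generalized as described above.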