What is the next best drilling decision to make in a Field Development Plan (FDP)? This is the key question we address in this work. An FDP consists of a sequence of decisions: each action we take affects the reservoir and conditions every future decision. The novelty of our approach is that it accounts for the sequential nature of these decisions through the framework of Dynamic Programming (DP) and Reinforcement Learning (RL). In this framework, each scheduled drilling decision depends on the observations acquired between drillings. This shifts the focus from static Field Development Plan optimization to a more dynamic framework that we call Field Development Policy Optimization. In addition to formulating this new framework, we apply it to optimize the development of a real oil and gas field. We model the FDP optimization problem under subsurface uncertainty as a Partially Observable Markov Decision Process (POMDP) and solve it with an RL algorithm to find an optimal drilling policy. Our methodology works for a general reservoir described by a given set of geological models representing its uncertainty. To speed up learning, we use a trained Deep Recurrent Neural Network (RNN) to approximate the flow responses of the reservoir simulator; these flows are then used to compute the economic performance of a drilling policy through its discounted cash flows. The RNN is trained and tested on a set of reservoir simulator runs over randomly sampled realizations of our reservoir model, well locations, well types, and control sequences of a drilling plan. Of all the decisions involved in an FDP, we focus here only on finding optimal adaptive well drilling schedules (locations of vertical wells and well types). The RL agent learns the best drilling schedule policy by generating simulated episodes of experience and iteratively improving the policy using a Q-value function approximated by a neural network trained across episodes.
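The sequential, observation-dependent decision loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the neural-network Q approximation for plain tabular Q-learning with 1/n step sizes, and replaces the RNN surrogate and the real field with a hypothetical toy (two equally likely geological models `GEO_MODELS`, three candidate vertical-well locations, producer `"P"` or injector `"I"` well types, and made-up cash-flow rules in `cash_flow`). The agent never sees the sampled geological model; its state is the history of drilling actions plus the quality observed at each drilled location, which is what lets the learned policy adapt between drillings.

```python
import random
from collections import defaultdict
from itertools import product

# Hypothetical toy problem (not the paper's field): two equally likely geological
# models assign a "quality" to each of three candidate vertical-well locations.
GEO_MODELS = [(1.0, 0.1, 0.5), (0.1, 1.0, 0.5)]
ACTIONS = list(product(range(3), ("P", "I")))  # (location, well type)
N_DRILLS = 2   # drilling decisions per episode
GAMMA = 0.9    # per-drilling-step discount factor


def cash_flow(history, action, model):
    """Hypothetical stand-in for surrogate flows: undiscounted cash flow of one step."""
    loc, wtype = action
    if loc in {act[0] for act, _obs in history}:
        return -10.0                    # location already drilled
    if wtype == "P":
        return 10.0 * model[loc] - 3.0  # producer value depends on the hidden quality
    has_producer = any(act[1] == "P" for act, _obs in history)
    return 4.0 if has_producer else -3.0  # injector only pays off with a producer


def greedy(Q, state):
    """Action with the highest Q-value (first one wins ties)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def train(episodes=20000, epsilon=0.3, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and 1/n step sizes."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(episodes):
        model = rng.choice(GEO_MODELS)  # hidden realization, never shown to the agent
        history = ()                     # agent state: ((action, observation), ...)
        for step in range(N_DRILLS):
            explore = rng.random() < epsilon
            a = rng.choice(ACTIONS) if explore else greedy(Q, history)
            r = cash_flow(history, a, model)
            nxt = history + ((a, model[a[0]]),)  # observe quality at the drilled location
            target = r
            if step < N_DRILLS - 1:              # bootstrap from the next decision
                target += GAMMA * max(Q[(nxt, b)] for b in ACTIONS)
            visits[(history, a)] += 1
            Q[(history, a)] += (target - Q[(history, a)]) / visits[(history, a)]
            history = nxt
    return Q


Q = train()
print(greedy(Q, ()))                  # first well: a producer at location 0 or 1
print(greedy(Q, (((0, "P"), 0.1),)))  # poor result at location 0 -> (1, 'P')
```

Querying the greedy policy after a producer at location 0 shows the adaptivity: a poor observation leads to a producer at location 1, while a good one leads to an injector. A static plan would have to commit to the second well before seeing that observation.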
The final solution is an adaptive Field Development Plan that yields the highest expected Net Present Value (NPV) within a given time budget. It specifies an adaptive drilling schedule of producers and injectors, together with well locations and well controls, as a function of the information obtained at each drilling step. The methodology has been applied to an actual reservoir for infill well location decisions. In this case, our objective is to find the best well placement and well type for the next producer and injector wells, and to optimize the control schedule for both new and preexisting wells in the reservoir. Our results show the learning progress of the RL algorithm until it finds the optimal drilling plan. The robustness of the solution is evaluated across the best-trained policies. The methodology and results were validated with a brute-force sampling approach. Both the RL and brute-force approaches were made feasible by our fast-to-compute RNN approximation of the reservoir simulator. To our knowledge, this work is the first application of an end-to-end AI workflow for Field Development Policy Evaluation in real fields, based on Reinforcement Learning and Deep Learning. The proposed methodology combines optimal field evaluation at the planning stage with a surveillance workflow for reactive decision-making.
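The NPV objective and the brute-force check mentioned above can be sketched together under stated assumptions. The discounting convention (period-0 cash flow undiscounted, a 10% per-period rate), the surrogate `cash_flows`, and every economic parameter are hypothetical placeholders for the RNN-predicted flows. Exhaustive enumeration of static two-well schedules stands in for the brute-force sampling, whose details the abstract does not give; each evaluation is a cheap function call, which is the same property that makes brute force practical with a fast surrogate.

```python
from itertools import permutations, product
from statistics import mean


def npv(cash_flows, rate):
    """Net present value: the cash flow of period t is discounted by (1 + rate) ** t."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def cash_flows(schedule, quality, horizon=4, drill_cost=5.0, oil_rate=4.0, inj_uplift=1.0):
    """Hypothetical surrogate: one well drilled per period, flowing from the next period."""
    flows = [0.0] * horizon
    for t, (loc, wtype) in enumerate(schedule):
        flows[t] -= drill_cost
        per_period = oil_rate * quality[loc] if wtype == "P" else inj_uplift
        for s in range(t + 1, horizon):
            flows[s] += per_period
    return flows


def best_schedule(realizations, n_wells=2, n_locations=3, rate=0.1):
    """Enumerate every static schedule and rank it by NPV averaged over realizations."""
    best, best_value = None, float("-inf")
    for locs in permutations(range(n_locations), n_wells):  # distinct locations
        for types in product("PI", repeat=n_wells):         # producer or injector
            schedule = tuple(zip(locs, types))
            value = mean([npv(cash_flows(schedule, q), rate) for q in realizations])
            if value > best_value:
                best, best_value = schedule, value
    return best, best_value


realizations = [(1.0, 0.2, 0.5), (0.4, 1.0, 0.5)]  # hypothetical per-location qualities
schedule, value = best_schedule(realizations)
print(schedule, round(value, 3))  # ((0, 'P'), (1, 'P')) 1.204
```

Because this search only ranks fixed schedules, it gives a baseline against which an adaptive policy's expected NPV can be compared; an adaptive policy can match or exceed it but never do worse when trained to optimality.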