Agriculture involves a series of critical, interdependent decisions and actions in a complex and highly uncertain environment with distinct spatial and temporal variations. Actions such as crop selection, the scheduling of planting and harvesting, and the application of nutrients and irrigation are critical to maximizing yields. These complexities are exacerbated by a changing climate and by the need to minimize environmental impacts while achieving global food security in the face of increasing populations. The ability of machine learning to efficiently interrogate complex, nonlinear, and high-dimensional datasets can revolutionize decision making in agriculture. This paper describes a reinforcement learning framework that trains a decision-making agent to follow the reward-maximizing trajectory within a virtual environment. The environment incorporates dynamical representations of economic costs, environmental impacts, and crop yields, and produces a reward that characterizes the effect of different agent actions. The framework aims to identify the set of actions (or agricultural decisions) that maximizes reward. We treat crop management as an optimization problem in which the objective is to increase crop yield while minimizing the use of external farming inputs (e.g., fertilizer amounts) and reducing greenhouse gas emissions. This optimization is subject to environmental factors such as soil moisture, humidity, and temperature. Controlling these impacts ultimately contributes to a reduction in the carbon footprint of the entire process, which is central to sustainable agriculture. The approach was demonstrated in a case study in Texas, USA. Training datasets were generated from 20 years of simulated data using the Soil & Water Assessment Tool (SWAT). SWAT model analysis provides multi-year, field-scale forecasts of key variables such as soil moisture and water balance, nutrient loading and fertilizer applications, and plant growth and yield.
The proposed framework uses reinforcement learning to autonomously explore and learn the optimal set of actions for directing crop growth while accounting for spatial and temporal variations.
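To make the reward described above concrete, the following is a minimal sketch of one possible scalar reward that trades crop yield against fertilizer use and greenhouse gas emissions. It is an illustration under stated assumptions and not the reward used in this work: the linear form, the names `RewardWeights`, `reward`, and `episode_return`, and all weight values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the paper does not state numerical values.
    crop_price: float = 1.0        # revenue per unit of yield
    fertilizer_cost: float = 0.5   # cost per unit of fertilizer applied
    emission_penalty: float = 0.2  # penalty per unit of greenhouse gas emitted


def reward(crop_yield: float, fertilizer: float, emissions: float,
           w: RewardWeights = RewardWeights()) -> float:
    """Assumed linear reward: yield revenue minus input cost and emission penalty."""
    return (w.crop_price * crop_yield
            - w.fertilizer_cost * fertilizer
            - w.emission_penalty * emissions)


def episode_return(steps, w: RewardWeights = RewardWeights()) -> float:
    """Sum the reward over a season of (yield, fertilizer, emissions) tuples."""
    return sum(reward(y, f, e, w) for (y, f, e) in steps)
```

In such a setup, an agent would be trained to choose actions (for example, fertilizer amounts at each time step) that maximize `episode_return`, with the per-step yield and emission values supplied by the simulated environment.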