Empirical relationships between environmental factors and soil organic carbon produce comparable prediction accuracy to machine learning
Abstract
Accurate representation of environmental controllers of soil organic carbon (SOC) stocks in Earth System Model (ESM) land models could reduce uncertainties in future carbon–climate feedback projections. Using empirical relationships between environmental factors and SOC stocks to evaluate land models can help modelers understand prediction biases beyond what can be achieved with the observed SOC stocks alone. In this study, we used 31 observed environmental factors, field SOC observations (n = 6,213) from the continental United States, and two machine learning approaches (random forest [RF] and generalized additive modeling [GAM]) to (a) select important environmental predictors of SOC stocks, (b) derive empirical relationships between environmental factors and SOC stocks, and (c) use the derived relationships to predict SOC stocks and compare the prediction accuracy of the resulting simpler models with that of the machine learning predictions. Out of the 31 environmental factors we investigated, 12 were identified as important predictors of SOC stocks by the RF approach. In contrast, the GAM approach identified six of those 12 environmental factors as important controllers of SOC stocks: potential evapotranspiration, normalized difference vegetation index, soil drainage condition, precipitation, elevation, and net primary productivity. The GAM approach assigned minimal predictive importance to the remaining six environmental factors identified by the RF approach. Using only a subset of environmental factors, our derived empirical relationships produced prediction accuracy comparable to that of the GAM and RF approaches. The empirical relationships we derived using the GAM approach can serve as important benchmarks to evaluate environmental control representations of SOC stocks in ESMs, which could reduce uncertainty in predicting future carbon–climate feedbacks.
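
As a rough illustration of step (c), the sketch below fits a simple empirical relationship between one environmental factor and SOC stocks on a training subset, then scores its predictions on held-out observations. It is a minimal sketch under stated assumptions, not the study's method: the data are synthetic and invented for this example, the relationship is a straight line fitted by ordinary least squares rather than a GAM-derived curve, and the names `fit_line`, `r_squared`, and `rmse` are hypothetical helpers.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(obs, pred):
    """Coefficient of determination of predictions against observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean squared error of predictions against observations."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Synthetic stand-in data (assumption): SOC declines linearly with potential
# evapotranspiration (PET) plus Gaussian noise. Values and units are made up.
rng = random.Random(0)
pet = [rng.uniform(400.0, 1600.0) for _ in range(500)]
soc = [20.0 - 0.01 * p + rng.gauss(0.0, 1.5) for p in pet]

# Hold out the last 100 observations for evaluation.
train_x, test_x = pet[:400], pet[400:]
train_y, test_y = soc[:400], soc[400:]

# Derive the empirical relationship on the training data only.
a, b = fit_line(train_x, train_y)

# Predict held-out SOC and score the predictions.
pred = [a + b * x for x in test_x]
r2 = r_squared(test_y, pred)
err = rmse(test_y, pred)
print(f"intercept={a:.3f} slope={b:.5f} R2={r2:.3f} RMSE={err:.3f}")
```

The same held-out scores could be computed for RF or GAM predictions, so the simpler relationship and the machine learning models are compared on identical observations.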