Publication
AGU 2024
Talk

Advancing Total Suspended Solids Modeling Using Machine Learning and Remote Sensing Data

Abstract

This study enhances water quality assessment by predicting Total Suspended Solids (TSS) from remote sensing data, which is vital for monitoring ecosystem and human health. Using the AquaSat dataset, which combines multispectral satellite and ground-level sensor data, machine learning models were developed to predict TSS levels across U.S. regions. Model inputs include LANDSAT spectral observations (blue, green, red, nir, and swir) and the pwater variable. Performance was evaluated across the LANDSAT series and across seasons. Random Forest Regressor (RFR), Linear Regression (LR), Gradient Boost Regressor (GBR), Support Vector Regressor (SVR), and Decision Tree Regressor (DTR) models were developed and validated with 5-fold cross-validation on three non-overlapping bounding boxes (named BBx-1, BBx-2, and BBx-3) containing dense samples across the U.S. DTRBBx-1 (the DTR model for BBx-1) achieved an R² of 0.641 and an MAE of 7.482 mg/L. Among the LANDSAT-specific models on BBx-1, GBRLANDSAT-5 (R² of 0.6) and GBRLANDSAT-8 (R² of 0.54) achieved the best results, whereas GBRLANDSAT-7 reached an R² of only 0.29. The best season-specific models, GBRSpring, RFRSummer, RFRFall, and GBRWinter, achieved R² values of 0.48, 0.06, 0.52, and 0.52, respectively, on BBx-1. On BBx-2, which has the highest data coverage, the RFRBBx-2 model achieved an R² of 0.449 and an MAE of 9.11 mg/L. For BBx-2 and BBx-3, the RFR model's R² values ranged from 0.4 to 0.41 across all seasons. RFRLANDSAT-8 was the most effective model for predicting TSS on BBx-2 (R² of 0.43), outperforming the models for LANDSAT 5 and 7 and highlighting differences in downscaling TSS with spectral samples from different LANDSAT series. RFRBBx-3 showed consistent performance (R² = 0.396, MAE = 5.791 mg/L). For lake bodies within BBx-1, BBx-2, and BBx-3, RFR R² values were 0.37, 0.64, and 0.51, respectively. For the combined BBx-1, BBx-2, and BBx-3 dataset, RFR performed best, with R² values of 0.42 for LANDSAT-8, 0.30 for LANDSAT-7, and 0.35 for LANDSAT-5 data.
Seasonally, the RFRWinter (R² = 0.41) and RFRSpring (R² = 0.37) models showed moderate predictive capability, while those for Fall and Summer were less accurate. Overall, this study compares the performance of several machine learning models on the AquaSat dataset, notes the differences among seasonal models and among LANDSAT series, and emphasizes the need for a robust dataset to develop accurate TSS prediction models.
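
The evaluation protocol named in the abstract (5-fold cross-validation scored with R² and MAE) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' pipeline: the data are synthetic rather than AquaSat samples, a single hypothetical "red" reflectance feature stands in for the full set of spectral inputs, and a one-feature least-squares line stands in for the RFR, GBR, SVR, DTR, and LR models. All function names are illustrative.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the units of the target (e.g. mg/L)."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """One-feature ordinary least squares; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def cross_validate(x, y, k=5, seed=0):
    """Return one (R², MAE) pair per held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        # Fit on the k-1 training folds only, then score on the held-out fold.
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in held_out]
        y_pred = [slope * x[i] + intercept for i in held_out]
        scores.append((r2_score(y_true, y_pred),
                       mean_absolute_error(y_true, y_pred)))
    return scores


# Synthetic stand-in data (assumption): TSS rises linearly with reflectance,
# plus Gaussian noise. Real AquaSat samples are far noisier than this.
rng = random.Random(42)
red = [rng.uniform(0.0, 0.3) for _ in range(200)]
tss = [100.0 * r + 5.0 + rng.gauss(0.0, 2.0) for r in red]

scores = cross_validate(red, tss, k=5)
mean_r2 = mean(s[0] for s in scores)
mean_mae = mean(s[1] for s in scores)
print(f"mean R² = {mean_r2:.3f}, mean MAE = {mean_mae:.3f} mg/L")
```

Averaging the per-fold scores, as done here, is one common way to summarize cross-validation; the abstract does not state how its reported R² and MAE values were aggregated across folds.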