A Geospatial Foundation Model with Applications to the UK and Ireland
Abstract
AI foundation models trained with a self-supervised approach are increasingly being used for Earth observation tasks. They can match or exceed the performance of other deep learning methods on downstream tasks while using less training data. Many training datasets for tasks such as image segmentation, object detection and regression are hand-labelled, and are therefore costly to create. Foundation models are thought to reduce the amount of hand-labelled data needed to match the performance of a model trained from scratch on a large task-specific dataset. Training such models from scratch is resource-intensive: the Prithvi model used in this study took approximately 4 days to train on 8 A100 GPUs. If we want to add another channel from optical imagery, for example, or use data from a different satellite for self-supervised training, retraining the model from scratch (i.e. starting from random weights) would require comparable resources again. Ideally, we would reduce this cost by initialising the new model with the weights of a previously trained model, since these still contain useful representations of the data. Here we present the development and evaluation of a pre-trained geospatial model using downstream tasks specific to the UK and Ireland, such as flood segmentation and above-ground biomass estimation. We use Prithvi, a transformer-based geospatial foundation model trained on multispectral satellite imagery from the Harmonized Landsat Sentinel-2 (HLS) dataset. This model was initially trained on data from the continental USA; we perform additional pre-training on data from the UK and Ireland to adapt it to our area of interest.
We find that including data specific to the UK and Ireland at the pre-training stage allows fine-tuned models to reach similar levels of performance in fewer epochs than models fine-tuned from a pre-trained model without such data. When additional channels such as Sentinel-1 SAR (VV and VH) or a digital elevation model (DEM) are included, we find that incorporating data specific to the UK and Ireland at the pre-training stage helps to improve the performance of the model. Finally, we find that incorporating additional channels helps to improve performance on downstream fine-tuning tasks.
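Reusing pretrained weights when new input bands such as VV, VH or DEM are added means that, at minimum, the patch-embedding layer must accept more input channels, because each patch is projected from all of its bands at once. The sketch below illustrates one common way to do this, under stated assumptions; it is not necessarily how the authors adapted Prithvi. It assumes the embedding kernel uses the PyTorch Conv3d weight layout (embed_dim, in_chans, t, p, p). The function names `expand_patch_embed` and `embed_patch`, the three initialisation options and the toy tensor sizes are hypothetical choices for illustration.

```python
import numpy as np


def expand_patch_embed(weight: np.ndarray, n_new: int, init: str = "zeros",
                       seed: int = 0) -> np.ndarray:
    """Append n_new input channels to a pretrained patch-embedding kernel.

    Assumes `weight` has shape (embed_dim, in_chans, *kernel_dims), the layout
    PyTorch uses for Conv2d/Conv3d weights. Pretrained channel slices are
    copied unchanged; only the new slices are initialised here.
    """
    if n_new < 1:
        raise ValueError("n_new must be positive")
    embed_dim, _, *kernel = weight.shape
    shape = (embed_dim, n_new, *kernel)
    if init == "zeros":
        # New bands contribute nothing at first, so embeddings computed from
        # the original bands match those of the pretrained model.
        new = np.zeros(shape, dtype=weight.dtype)
    elif init == "mean":
        # Start every new band from the average filter of the existing bands.
        new = np.repeat(weight.mean(axis=1, keepdims=True), n_new, axis=1)
    elif init == "random":
        # Random filters scaled to the spread of the pretrained weights.
        rng = np.random.default_rng(seed)
        new = rng.normal(0.0, weight.std(), size=shape).astype(weight.dtype)
    else:
        raise ValueError(f"unknown init: {init!r}")
    return np.concatenate([weight, new], axis=1)


def embed_patch(weight: np.ndarray, patch: np.ndarray) -> np.ndarray:
    """Linear patch embedding: contract channel and kernel dims (no bias)."""
    axes_w = list(range(1, weight.ndim))
    axes_p = list(range(patch.ndim))
    return np.tensordot(weight, patch, axes=(axes_w, axes_p))


# Hypothetical toy sizes: 6 optical bands plus 3 new bands (VV, VH, DEM),
# a tubelet of 1 time step, 4x4 pixel patches and 8-dim embeddings.
rng = np.random.default_rng(42)
w_old = rng.normal(size=(8, 6, 1, 4, 4)).astype(np.float32)
w_new = expand_patch_embed(w_old, n_new=3)

optical = rng.normal(size=(6, 1, 4, 4)).astype(np.float32)
extra = rng.normal(size=(3, 1, 4, 4)).astype(np.float32)  # VV, VH, DEM
patch = np.concatenate([optical, extra], axis=0)

print(w_new.shape)  # → (8, 9, 1, 4, 4)
print(np.allclose(embed_patch(w_old, optical),
                  embed_patch(w_new, patch), atol=1e-5))  # → True
```

With zero initialisation the expanded layer initially reproduces the pretrained embeddings for the original bands, so continued pre-training starts from the learned representation rather than from random weights, and gradients can then move the new-band filters away from zero. In a masked-autoencoder model such as Prithvi, the layer that reconstructs pixel values would likely also need extra outputs for the new bands.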