Geospatial Foundation Model by Diversified Data
Abstract
To improve the unsupervised training of geospatial foundation models, we propose a novel approach that prepares diverse and unbiased datasets by maximizing the information entropy of selected geospatial features. Our method extracts detailed metrics such as temperature and precipitation and organizes them into clusters based on their similarity. It then applies a weighted sampling method that ensures the inclusion of representative data points: by counting the number of similar geospatial data points, it gives preference to less frequent data and thereby increases the diversity of the dataset. The results show that the information entropy of the data selected by the proposed method is higher than that of uniform random sampling, and that the approach significantly improves the accuracy of the geospatial model by providing a balanced representation of the data. Our research highlights the potential benefits of optimizing geospatial data sampling, which can lead to improved model accuracy and expanded practical applications.
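The abstract does not specify the clustering algorithm, the exact weights, or the sampler, so the following is only a minimal sketch of the general idea under stated assumptions. It bins hypothetical (temperature, precipitation) pairs onto a coarse grid as a stand-in for clustering (the helper `cluster_key` and its bin widths are hypothetical). It weights each point by the inverse of its cluster size, draws a subset without replacement using Efraimidis-Spirakis weighted sampling (a swapped-in technique, not necessarily the authors'), and compares the Shannon entropy of the cluster labels against a uniform random draw. The dataset produced by `make_skewed_dataset` is synthetic and is not taken from the paper.

```python
import math
import random
from collections import Counter


def cluster_key(temp_c, precip_mm, temp_bin=5.0, precip_bin=50.0):
    # Hypothetical stand-in for the paper's clustering step: snap each
    # (temperature, precipitation) pair onto a coarse grid cell.
    return (math.floor(temp_c / temp_bin), math.floor(precip_mm / precip_bin))


def cluster_labels(points):
    # Cluster label for every (temperature, precipitation) point.
    return [cluster_key(t, p) for t, p in points]


def shannon_entropy(labels):
    # Shannon entropy, in bits, of the empirical label distribution.
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def weighted_sample(points, k, rng):
    # Assumed weighting: w = 1 / (number of points in the same cluster), so
    # points from rare conditions are preferred. Sampling without replacement
    # uses Efraimidis-Spirakis keys u ** (1 / w); here 1 / w is the cluster
    # size, and the k points with the largest keys are kept.
    keys = cluster_labels(points)
    sizes = Counter(keys)
    order = sorted(range(len(points)),
                   key=lambda i: rng.random() ** sizes[keys[i]],
                   reverse=True)
    return [points[i] for i in order[:k]]


def make_skewed_dataset():
    # Synthetic, illustrative data: 900 points in one common climate cell
    # plus 10 rare cells with 10 points each (1000 points, 11 clusters).
    common = [(20.0 + i % 5, float(i % 50)) for i in range(900)]
    rare = [(-19.0 + 5 * j, 100.0 * (j + 1) + 10.0)
            for j in range(10) for _ in range(10)]
    return common + rare


if __name__ == "__main__":
    data = make_skewed_dataset()
    uniform = random.Random(0).sample(data, 100)
    weighted = weighted_sample(data, 100, random.Random(0))
    print("uniform  entropy:", round(shannon_entropy(cluster_labels(uniform)), 3))
    print("weighted entropy:", round(shannon_entropy(cluster_labels(weighted)), 3))
```

Because every cluster receives the same total weight under inverse-size weighting, the weighted draw spreads across clusters instead of mirroring the dominant one, which is why its label entropy tends to exceed that of the uniform draw on skewed data like this.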