Foundation Model for Reconstruction of Missing Data in TROPOMI Sentinel-5P Greenhouse Data Products

Gabby Nyirjesy; Maciel Zortea; Levente Klein; Kamal Das; Leonardo P. Tizzei; Ildar Khabibrakhmanov; Theodore Van Kessel; Joao Lucas de Sousa Almeida

AGU 2025

Poster

15 Dec 2025

Abstract

Self-supervised learning has succeeded across many data modalities, from text to images to satellite data. It lets a model learn the underlying patterns and context of the data, so that downstream models can be fine-tuned faster and with minimal compute. We apply self-supervised learning to TROPOMI Sentinel-5P greenhouse gas data products, such as carbon monoxide (CO) and methane (CH4), to understand the challenges of masked reconstruction on very sparse datasets, where sometimes more than 90% of pixels are missing because of factors such as cloud cover and low- or high-albedo scenes. We experiment with a masked autoencoder architecture and a graph neural network for this reconstruction. These models take an incomplete Sentinel-5P image product as input and produce a realistic prediction of the missing data. The completed greenhouse gas data cube can then be fed into an inversion model to estimate fluxes and identify hot spots and leaks. We also experiment with incorporating High-Resolution Rapid Refresh (HRRR) wind data to capture some of the dispersion dynamics and improve the reconstruction. We show our results and present a strategy for reconstruction in highly sparse datasets. Finally, we explore conformal prediction to model the uncertainty of these predictions and produce a confidence interval for the concentration at each pixel.
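
The abstract does not say which conformal prediction variant is used. As an illustration only, the sketch below assumes split conformal prediction with absolute-residual scores: some observed pixels are held out for calibration, and the calibrated quantile is then applied as a symmetric interval around every reconstructed pixel. All data here are synthetic, and the names `conformal_quantile` and `pixel_intervals` are hypothetical, not taken from the authors' code.

```python
import math
import numpy as np

def conformal_quantile(residuals, alpha):
    """Split-conformal quantile of absolute residuals with the
    finite-sample correction ceil((n + 1) * (1 - alpha)) / n."""
    scores = np.sort(np.asarray(residuals, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    # Too few calibration points for the requested level: interval is unbounded.
    return float("inf") if k > n else float(scores[k - 1])

def pixel_intervals(pred, q):
    """Symmetric interval [pred - q, pred + q] for every pixel."""
    return pred - q, pred + q

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): a "true" concentration field, a model
# reconstruction with noise, and a mask in which about 10% of pixels are observed.
truth = rng.normal(1800.0, 20.0, size=(64, 64))
pred = truth + rng.normal(0.0, 5.0, size=truth.shape)
observed = rng.random(truth.shape) < 0.10

# Split the observed pixels into a calibration half and a test half.
idx = np.flatnonzero(observed)
rng.shuffle(idx)
cal, test = idx[: idx.size // 2], idx[idx.size // 2:]

abs_res = np.abs(pred.ravel() - truth.ravel())
q = conformal_quantile(abs_res[cal], alpha=0.1)  # target 90% coverage
lower, upper = pixel_intervals(pred, q)

# Empirical coverage on held-out observed pixels.
coverage = float(np.mean(abs_res[test] <= q))
```

The coverage guarantee of split conformal prediction relies on calibration and test pixels being exchangeable. Pixels that are missing because of clouds or albedo may not be exchangeable with observed ones, so coverage measured on held-out observed pixels does not automatically carry over to the reconstructed gaps.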