Prithvi WxC: A Weather and Climate Foundation Model
Abstract
Deep learning is progressively revolutionizing weather applications by generating highly accurate forecasts at a fraction of the computational cost of numerical weather prediction. Unlike traditional physics-based methods, deep learning models for weather applications do not directly simulate the underlying physics; instead, they learn statistical relationships across multiple variables directly from data. We present Prithvi WxC, a foundation model for weather applications. The model was trained on 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). This widely used reanalysis dataset from NASA offers global atmospheric data from 1980 to the present at a spatial resolution of 0.625 degrees and a temporal resolution of 3 hours. Prithvi WxC is a transformer-based deep learning architecture that integrates concepts from several recent transformer models to capture both regional and global dependencies in the input and to handle long token sequences efficiently. This capability allows the model to incorporate additional tokens from off-grid measurements during fine-tuning, improving its performance and accuracy. Additionally, we investigate the effect of scaling Prithvi WxC to larger parameter counts to assess whether increased model size enhances its capabilities. We achieve this by sharding the model with the fully sharded data parallel (FSDP) framework and training across many GPUs for several days on NASA Advanced Supercomputing (NAS) clusters. These scaling experiments help us explore the trade-off between larger parameter counts and larger batch sizes during pre-training.