Recently, considerable effort has gone into using machine learning (ML) models to improve sub-seasonal to seasonal (S2S) forecasts. A prime example is the World Meteorological Organization’s (WMO) open prize challenge to use AI to improve the skill of dynamical S2S forecasts of 2-meter temperature and accumulated precipitation. Attention-based transformer models are currently the state of the art for many ML tasks, including time-series prediction. However, to the authors’ knowledge, they have not yet been applied to S2S forecasting. Here, we present a transformer-based approach for forecasting the quantiles of daily 2-meter temperature up to 46 days ahead using a modified version of the temporal fusion transformer (TFT) architecture. The primary motivation for using the TFT is its ability to encode multi-horizon sequence information, namely historical data and forecast outlooks. The TFT can also handle static data such as spatiotemporal features (e.g., location and climate modes); it therefore provides a ready-to-use architecture for hybrid “statistical-dynamical” modeling. In this modified version of the TFT, we include embeddings of ECMWF ensemble members in the LSTM decoder so that, during training, the model can learn a “correction” of the provided forecasts towards the historical observations. We evaluate the TFT’s predictions against two baselines: recent climatology and the calibrated ECMWF S2S ensemble forecast. Our results show that the TFT predictions outperform both the calibrated ECMWF forecast and recent climatology according to the Continuous Ranked Probability Score (CRPS). The TFT also responds to departures from climatology that the baseline models fail to capture. Finally, we conduct an AI explainability study to highlight the model’s main sources of predictability and suggest next steps to improve the proposed model.
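To make the evaluation metric concrete, the CRPS of an ensemble forecast against a single observation can be estimated with the standard sample form, CRPS ≈ E|X − y| − ½ E|X − X′|, where X and X′ are ensemble members and y is the observation; lower values indicate a better forecast. The following is a minimal sketch only, assuming NumPy; the function name `crps_ensemble` and the toy temperature values are hypothetical illustrations and not the evaluation code used in this work.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Sample-based CRPS estimate for one observation (hypothetical helper).

    members: 1-D sequence of ensemble member values (e.g., 2-meter temperature)
    obs:     the verifying observation (scalar)
    """
    members = np.asarray(members, dtype=float)
    # Mean absolute error between each member and the observation: E|X - y|
    accuracy = np.mean(np.abs(members - obs))
    # Mean absolute difference over all member pairs: E|X - X'|
    spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return accuracy - 0.5 * spread


# Toy values (illustrative only): a three-member ensemble and one observation.
score = crps_ensemble([14.2, 15.1, 16.0], obs=15.4)
```

For a single-member ensemble the spread term vanishes and the estimate reduces to the absolute error, which is one reason the CRPS is often described as a probabilistic generalization of the mean absolute error.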