Heterogeneous Multitask Learning Across the Chemical Space
Abstract
Scaling foundation models for the natural sciences promises to enable new discoveries, such as novel materials. To this end, we developed a multitask learning framework that integrates chemically diverse datasets and heterogeneous tasks. We use an equivariant encoder to learn a shared latent representation, which subsequent task-specific decoders process to predict properties of interest beyond atomic energies and forces. By training a single model on a combined set of crystalline materials and small molecules, we demonstrate that concurrent training on multiple modalities can achieve comparable performance across tasks. This approach applies readily to arbitrary datasets and tasks, without requiring target homogeneity, and paves the way towards scaling multitask foundation models across the chemical space.
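To make the shared-encoder, task-specific-decoder design concrete, the following is a minimal PyTorch-style sketch. All module and task names (MultitaskModel, the "energy" and "dipole" heads, the placeholder encoder) are illustrative assumptions, not the authors' implementation; in particular, the paper's encoder is equivariant, which we abstract here as an opaque module.

```python
# Minimal sketch of a shared-encoder / multi-decoder multitask model.
# Module and task names are illustrative, not the authors' code.
import torch
import torch.nn as nn


class MultitaskModel(nn.Module):
    def __init__(self, encoder: nn.Module, latent_dim: int, task_dims: dict):
        super().__init__()
        self.encoder = encoder  # shared encoder (equivariant in the paper)
        # One lightweight decoder head per task; tasks may differ in output size.
        self.decoders = nn.ModuleDict({
            task: nn.Sequential(
                nn.Linear(latent_dim, latent_dim),
                nn.SiLU(),
                nn.Linear(latent_dim, out_dim),
            )
            for task, out_dim in task_dims.items()
        })

    def forward(self, batch: torch.Tensor, task: str) -> torch.Tensor:
        z = self.encoder(batch)        # shared latent representation
        return self.decoders[task](z)  # task-specific prediction


# Usage with a stand-in encoder; real inputs would be atomistic graphs.
encoder = nn.Linear(16, 32)  # placeholder for an equivariant encoder
model = MultitaskModel(encoder, latent_dim=32,
                       task_dims={"energy": 1, "dipole": 3})
x = torch.randn(8, 16)
print(model(x, task="energy").shape)  # torch.Size([8, 1])
```

During training, each heterogeneous batch would contribute a loss only for the tasks whose labels it carries, so datasets with disjoint target sets can be combined without requiring target homogeneity.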