ACS Spring 2024

Advancing Molecular Property Prediction through Multi-view Latent Space Fusion


Pre-trained language models are emerging as promising tools for predicting molecular properties, but their development is still at an early stage, and further research is needed to improve their effectiveness and to address challenges such as generalization and sample efficiency. In this paper, we introduce a novel multi-view approach that leverages latent spaces derived from state-of-the-art chemical models. Our approach combines two key components: embeddings from MHG-GNN, which represent molecular structures as graphs, and embeddings from MoLFormer-base, which are grounded in chemical language. MHG-GNN was pre-trained on a dataset of 1.4 million molecules selected from PubChem, while MoLFormer-base is a small version of MoLFormer. We show that fusing latent spaces from these different origins performs strongly against existing state-of-the-art methods, including MoLFormer-XL, which was pre-trained on 1.1 billion molecules from PubChem and ZINC. The approach particularly excels in complex tasks such as predicting the toxicity of drugs in clinical trials and predicting the inhibition of HIV replication. We evaluate our method on six benchmark datasets from MoleculeNet and outperform competitors on five of them. Our results underscore the potential of latent space fusion and feature integration for advancing molecular property prediction. They also pave the way for further improvements when the approach is applied to larger-scale datasets, opening new avenues for exploration in this domain.
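
The idea of fusing two latent spaces can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, so this example assumes simple feature-wise concatenation after standardizing each view; the function name `fuse_latent_spaces`, the embedding dimensions, and the random arrays standing in for MHG-GNN and MoLFormer-base embeddings are all hypothetical.

```python
import numpy as np


def fuse_latent_spaces(graph_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Fuse two per-molecule embedding matrices by concatenation.

    Assumption: concatenation is used purely for illustration; the
    abstract does not state how the latent spaces are combined.
    """
    if graph_emb.shape[0] != text_emb.shape[0]:
        raise ValueError("Both views must describe the same molecules (same row count).")

    def standardize(x: np.ndarray) -> np.ndarray:
        # Zero mean and unit variance per feature, so that neither view
        # dominates the fused vector because of its scale.
        mean = x.mean(axis=0, keepdims=True)
        std = x.std(axis=0, keepdims=True)
        return (x - mean) / np.where(std == 0, 1.0, std)

    return np.concatenate([standardize(graph_emb), standardize(text_emb)], axis=1)


# Placeholder data: random vectors stand in for real model outputs.
rng = np.random.default_rng(0)
n_molecules = 8
graph_emb = rng.normal(size=(n_molecules, 16))  # hypothetical graph-view embeddings
text_emb = rng.normal(size=(n_molecules, 24))   # hypothetical language-view embeddings

fused = fuse_latent_spaces(graph_emb, text_emb)
print(fused.shape)  # (8, 40)
```

In such a setup, the fused matrix would serve as the input features for a downstream property predictor trained on each benchmark task.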