
MoL-MoE: A multi-view mixture-of-experts framework for molecular property prediction with SMILES, SELFIES, and graph representations

Abstract

Recent advances in chemical machine learning have driven a shift toward multi-stage training strategies, in which models are pre-trained on vast unlabeled datasets and subsequently fine-tuned on domain-specific tasks. This paradigm improves generalization, particularly on complex tasks such as molecular property prediction. As the demand for computational efficiency grows, Mixture-of-Experts (MoE) frameworks have gained prominence: they scale model capacity by activating only a subset of specialized expert networks for each input, guided by a gating mechanism. This selective activation improves both computational efficiency and predictive performance, particularly in large-scale molecular prediction tasks.

In this paper, we introduce MoL-MoE, a novel Multi-view Mixture-of-Experts framework tailored for molecular property prediction. Unlike traditional single-representation models, MoL-MoE incorporates three distinct molecular representations (SMILES, SELFIES, and molecular graphs), each of which captures different aspects of molecular structure. By fusing the latent spaces of these complementary views, MoL-MoE leverages the strengths of each modality to improve predictive accuracy.
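The three views can be illustrated with standard open-source tooling. The following minimal sketch is not the featurization pipeline used by MoL-MoE; it simply derives a SMILES string, its SELFIES encoding (via the selfies package), and a graph view (atom symbols plus adjacency matrix via RDKit) for a single molecule.

```python
# Illustrative only: MoL-MoE's exact featurization is not specified in this
# abstract. This sketch shows the three views for one molecule (ethanol),
# using the open-source rdkit and selfies packages.
from rdkit import Chem
import selfies as sf

smiles = "CCO"                                           # view 1: SMILES string
selfies_str = sf.encoder(smiles)                         # view 2: SELFIES, "[C][C][O]"

mol = Chem.MolFromSmiles(smiles)                         # view 3: molecular graph
atom_symbols = [a.GetSymbol() for a in mol.GetAtoms()]   # graph nodes
adjacency = Chem.GetAdjacencyMatrix(mol)                 # graph edges (bond connectivity)

print(smiles, selfies_str, atom_symbols, adjacency.tolist())
```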

Our MoL-MoE framework consists of 12 experts, partitioned into 4 experts per molecular representation (SMILES, SELFIES, and molecular graphs). A gating network determines which experts to activate for a given task, dynamically routing inputs to the most relevant experts based on task-specific features. This design allows the model to adapt to various molecular prediction challenges, optimizing resource allocation while maximizing performance.
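The routing mechanism can be pictured with a short sketch. The PyTorch code below is a minimal illustration, not the authors' implementation: the latent dimension, the concatenation-based fusion, the per-view expert inputs, per-input (rather than per-task) routing, and the single-output prediction head are all assumptions made for clarity.

```python
# Minimal PyTorch sketch of the routing described above: 12 experts (4 per
# view), a gating network over the fused multi-view latent, and top-k
# selection. All sizes and design details here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewMoE(nn.Module):
    def __init__(self, dim=256, experts_per_view=4, n_views=3, k=4):
        super().__init__()
        self.k = k
        self.experts_per_view = experts_per_view
        n_experts = experts_per_view * n_views            # 12 experts in total
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(n_views * dim, n_experts)   # gating network
        self.head = nn.Linear(dim, 1)                     # property-prediction head

    def forward(self, z_smiles, z_selfies, z_graph):
        views = [z_smiles, z_selfies, z_graph]
        fused = torch.cat(views, dim=-1)                  # fused multi-view latent
        scores = self.gate(fused)                         # one routing score per expert
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)          # renormalize over active experts

        outputs = []
        for b in range(fused.size(0)):                    # batch loop kept for readability
            mixed = torch.zeros_like(views[0][b])
            for w, idx in zip(weights[b], topk_idx[b]):
                i = int(idx)
                view = views[i // self.experts_per_view]  # experts 0-3: SMILES, 4-7: SELFIES, 8-11: graph
                mixed = mixed + w * self.experts[i](view[b])
            outputs.append(mixed)
        return self.head(torch.stack(outputs))

# Example: route a batch of 8 pre-encoded molecules through the top-4 experts.
model = MultiViewMoE(k=4)
y_hat = model(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))
print(y_hat.shape)  # torch.Size([8, 1])
```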

To validate the efficacy of our approach, we conducted extensive experiments on benchmark datasets from MoleculeNet, a well-established suite for molecular machine learning. We evaluated MoL-MoE on nine datasets spanning a wide range of molecular properties, including physical, quantum-mechanical, and bioactivity-related endpoints. The model was assessed under two routing configurations, k=4 and k=6, where k is the number of experts activated for each task. Our results show that MoL-MoE consistently outperforms state-of-the-art models across all nine benchmarks, with substantial gains in both predictive accuracy and computational efficiency.
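As a sketch of how the two routing budgets might be compared, the snippet below reuses the MultiViewMoE illustration above and scores a single classification task with ROC-AUC; the data loader, the sigmoid output, and the metric choice are assumptions, since the abstract does not list the per-dataset metrics.

```python
# Hedged sketch of comparing the two routing budgets (k=4 vs k=6) on one
# classification task. Assumes the MultiViewMoE sketch above and a loader
# yielding pre-encoded views plus binary labels; both are placeholders.
import torch
from sklearn.metrics import roc_auc_score

def evaluate(model, loader, k):
    model.k = k                                   # number of experts activated per molecule
    model.eval()
    labels, preds = [], []
    with torch.no_grad():
        for z_smi, z_sf, z_graph, y in loader:
            logits = model(z_smi, z_sf, z_graph).squeeze(-1)
            preds.extend(torch.sigmoid(logits).tolist())
            labels.extend(y.tolist())
    return roc_auc_score(labels, preds)

# for k in (4, 6):
#     print(f"k={k}: ROC-AUC =", evaluate(model, val_loader, k))
```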
