Multi-View Mixture-of-Experts for Predicting Molecular Properties Using SMILES, SELFIES, and Graph-Based Representations
Abstract
Recent advances in chemical machine learning have adopted a two-step approach, pre-training on unlabeled data followed by fine-tuning on specific tasks, to boost model capacity. With the increasing demand for training efficiency, Mixture-of-Experts (MoE) has become essential for scaling large models: a gating network selectively activates a subset of expert sub-networks for each input, improving predictive performance while keeping computation manageable. This paper presents MoL-MoE, a Multi-view Mixture-of-Experts framework designed to predict molecular properties by integrating latent spaces derived from SMILES, SELFIES, and molecular graph representations. Our approach leverages the complementary strengths of these representations to enhance predictive accuracy. We evaluate MoL-MoE with a total of 12 experts, organized as 4 experts per modality (SMILES, SELFIES, and molecular graphs), on nine benchmark datasets from MoleculeNet under two routing activation settings, k=4 and k=6, and it outperforms state-of-the-art methods on all nine datasets. These results underscore the model's robustness and adaptability in handling diverse and complex molecular prediction tasks. Our analysis of routing activation patterns reveals that MoL-MoE dynamically adjusts its use of the different molecular representations according to task-specific requirements. This adaptability highlights the importance of representation choice in optimizing model performance.
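To make the routing mechanism described above concrete, the sketch below shows what a multi-view MoE layer with 12 experts (4 per modality) and top-k routing could look like. This is a minimal illustration assuming each modality (SMILES, SELFIES, graph) has already been encoded into a fixed-size latent vector by a separate encoder; the class name MultiViewMoE, the per-view expert assignment, and all dimensions are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiViewMoE(nn.Module):
    """Illustrative multi-view MoE layer: 4 experts per view, top-k gating."""

    def __init__(self, latent_dim=256, hidden_dim=512,
                 experts_per_view=4, num_views=3, top_k=4, num_tasks=1):
        super().__init__()
        self.top_k = top_k
        self.experts_per_view = experts_per_view
        self.num_experts = experts_per_view * num_views  # 12 experts in total
        # Feed-forward experts: slots 0-3 consume the SMILES latent,
        # 4-7 the SELFIES latent, 8-11 the graph latent.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(latent_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, latent_dim),
            )
            for _ in range(self.num_experts)
        )
        # Gating network scores all experts from the concatenated view latents.
        self.gate = nn.Linear(latent_dim * num_views, self.num_experts)
        # Task head mapping the mixed representation to property predictions.
        self.head = nn.Linear(latent_dim, num_tasks)

    def forward(self, smiles_z, selfies_z, graph_z):
        views = [smiles_z, selfies_z, graph_z]              # each: (B, latent_dim)
        gate_logits = self.gate(torch.cat(views, dim=-1))   # (B, num_experts)
        # Keep only the top-k experts per molecule (e.g. k=4 or k=6)
        # and renormalize their gate weights to sum to 1.
        topk_vals, topk_idx = gate_logits.topk(self.top_k, dim=-1)
        topk_weights = F.softmax(topk_vals, dim=-1)          # (B, k)

        mixed = torch.zeros_like(smiles_z)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]                           # expert id per sample
            w = topk_weights[:, slot].unsqueeze(-1)           # gate weight per sample
            for e in range(self.num_experts):
                mask = idx == e
                if mask.any():
                    # Route each sample's corresponding view latent to expert e.
                    view = views[e // self.experts_per_view]
                    mixed[mask] += w[mask] * self.experts[e](view[mask])
        return self.head(mixed)                               # (B, num_tasks)


if __name__ == "__main__":
    model = MultiViewMoE(top_k=4)
    z = torch.randn(8, 256)
    print(model(z, z.clone(), z.clone()).shape)  # torch.Size([8, 1])
```

In this sketch the gate sees all three views at once, so increasing k from 4 to 6 lets the router blend experts from more modalities per molecule, which is the knob the abstract's k=4 versus k=6 comparison refers to.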