Talk

Multimodal foundation model for aligning SMILES representations with DFT electron density grids

Abstract

We introduce SMIDFT-CLIP, a pre-trained foundation model designed to align SMILES representations with 3D electron density grids generated by Density Functional Theory (DFT) simulations. This alignment is achieved in a shared multimodal latent space through contrastive learning, providing a novel approach to integrate molecular structural data with quantum mechanical properties. The framework is built around two primary components: a transformer-based encoder to process SMILES strings and a 3D Vector Quantized Generative Adversarial Network (3D VQ-GAN) to encode electron density grids. By merging these two distinct molecular representations, SMIDFT-CLIP creates a unified latent space capable of encoding both structural and quantum mechanical information.
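As a rough illustration of the contrastive alignment described above, the sketch below computes a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. It is a minimal sketch under stated assumptions, not the SMIDFT-CLIP implementation: the abstract does not specify the loss form, so the symmetric cross-entropy, the function names, and the temperature of 0.07 are all illustrative choices, and random arrays stand in for the outputs of the SMILES encoder and the 3D VQ-GAN encoder.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(smiles_emb, density_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    Row i of smiles_emb and row i of density_emb are assumed to describe
    the same molecule (hypothetical encoder outputs, shape (batch, dim)).
    """
    s = l2_normalize(smiles_emb)
    d = l2_normalize(density_emb)
    logits = s @ d.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])        # matching pairs sit on the diagonal
    loss_s2d = -log_softmax(logits, axis=1)[idx, idx].mean()  # SMILES -> density
    loss_d2s = -log_softmax(logits, axis=0)[idx, idx].mean()  # density -> SMILES
    return 0.5 * (loss_s2d + loss_d2s)

# Toy usage: random vectors stand in for real encoder outputs.
rng = np.random.default_rng(0)
smiles_emb = rng.normal(size=(8, 16))
density_emb = smiles_emb + 0.1 * rng.normal(size=(8, 16))
print(clip_contrastive_loss(smiles_emb, density_emb))
```

Minimizing a loss of this kind pulls each SMILES embedding toward the embedding of its own electron density grid and pushes it away from the other grids in the batch, which is what produces a shared latent space.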

The architecture of SMIDFT-CLIP is designed to capture complex molecular features, allowing the model to integrate high-level structural information encoded in SMILES with quantum mechanical insights derived from DFT. The SMILES encoder learns detailed representations of molecular graphs and chemical structures, while the 3D VQ-GAN compresses electron density grids into a format suitable for alignment within the latent space. The SMIDFT-CLIP model was pre-trained on the Polaris cluster at the Argonne Leadership Computing Facility, utilizing 600 NVIDIA A100 GPUs.
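To make the compression step more concrete, the sketch below shows the vector-quantization lookup at the core of any VQ-style model: each latent vector is replaced by its nearest entry in a learned codebook. This is only an illustration under assumptions. The grid size, latent dimension, codebook size, and function name are hypothetical, random arrays stand in for a trained encoder and codebook, and the convolutional encoder, decoder, and adversarial training of a 3D VQ-GAN are omitted entirely.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents:  (n, dim) array of continuous latent vectors
    codebook: (k, dim) array of code vectors
    Returns the chosen code indices (n,) and the quantized vectors (n, dim).
    """
    # Squared Euclidean distance between every latent and every code vector.
    diff = latents[:, None, :] - codebook[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)        # (n, k)
    indices = dist2.argmin(axis=1)
    return indices, codebook[indices]

# Toy usage: a 4x4x4 latent grid with 8 channels, flattened to (64, 8).
rng = np.random.default_rng(0)
latent_grid = rng.normal(size=(4, 4, 4, 8))
codebook = rng.normal(size=(32, 8))
indices, quantized = quantize(latent_grid.reshape(-1, 8), codebook)
code_grid = indices.reshape(4, 4, 4)        # discrete code per latent voxel
print(code_grid.shape, quantized.shape)
```

The grid of discrete code indices is the compressed form of the density; a downstream model can then embed these codes for alignment with the SMILES representation.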

Preliminary experiments demonstrate the effectiveness of SMIDFT-CLIP in constructing a robust multimodal latent space. This latent space enables the fusion of molecular and quantum data, improving the model's performance in tasks such as molecular property prediction. In retrieval tasks, SMIDFT-CLIP was evaluated on top-1, top-3, and top-6 accuracy and reached a best-case accuracy of 94%, demonstrating its ability to retrieve accurate molecular properties. Aligning SMILES representations with DFT-derived electron density grids not only preserves essential chemical information but also enriches it with quantum mechanical data, which proves beneficial for downstream applications such as predicting molecular reactivity, stability, and other key properties. SMIDFT-CLIP also opens new opportunities for generating complex molecular representations, such as 3D electron density grids, from computationally inexpensive SMILES data. This capability has the potential to significantly reduce the computational cost of obtaining high-fidelity quantum mechanical data, offering a practical alternative to time-consuming DFT simulations.
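The top-k retrieval accuracies mentioned above can be computed along the following lines. This is a minimal sketch assuming cosine-similarity retrieval over paired embeddings, where query i counts as correct if its matching item appears among the k most similar gallery entries. The abstract does not describe the evaluation protocol, so the function name, the pairing convention, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def top_k_retrieval_accuracy(query_emb, gallery_emb, k):
    """Fraction of queries whose true match is among the k nearest gallery items.

    Row i of query_emb (e.g. a SMILES embedding) is assumed to match
    row i of gallery_emb (e.g. an electron density embedding).
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sim = q @ g.T                                  # cosine similarities
    top_k = np.argsort(-sim, axis=1)[:, :k]        # k most similar gallery indices
    targets = np.arange(sim.shape[0])[:, None]
    return float((top_k == targets).any(axis=1).mean())

# Toy usage: noisy copies stand in for embeddings from the two encoders.
rng = np.random.default_rng(0)
smiles_emb = rng.normal(size=(100, 32))
density_emb = smiles_emb + 0.8 * rng.normal(size=(100, 32))
for k in (1, 3, 6):
    print(k, top_k_retrieval_accuracy(smiles_emb, density_emb, k))
```

By construction the accuracy cannot decrease as k grows, since a match found among the top 1 is also among the top 3 and top 6.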
