ALCHIMIA - Advanced Learning for Chemistry Interpretation and Integrated Molecule Analysis
Abstract
Foundation models (FMs) have shown promising results on industrial chemistry problems such as generating small molecules and predicting their properties. A key advantage of FMs is that a single model, pre-trained on a large amount of data, can be adapted to various downstream tasks using smaller datasets. However, working with FMs requires specialized AI knowledge and expensive hardware, which makes these models difficult for chemical domain experts to access and use. Moreover, most models lack uncertainty characterization, which limits their practical use. To address these challenges, we propose a comprehensive pipeline that enables material discovery experts to create machine-learning models based on advanced FM technology. Our pipeline and software stack, built in Python, encapsulate FM technology and let experts fine-tune models with state-of-the-art techniques such as adapters and mixture of experts (MoE). For example, the pipeline allows experts to choose from four different models based on SMILES mixing and fine-tune them using low-rank approximation techniques. The entire process is recorded, and uncertainty characterization is computed for the fine-tuned models. By providing a user-friendly interface and advanced fine-tuning techniques, the pipeline and software stack aim to make FM technology more accessible to chemical domain experts, enabling them to apply these models to material discovery and other applications, and to help democratize FM use and drive innovation in chemistry.
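As an illustration of the low-rank fine-tuning idea mentioned above, the following is a minimal, generic sketch of a low-rank adapter update, not the ALCHIMIA implementation. It assumes the common formulation in which a frozen weight matrix W0 is adjusted by a trainable product B·A of rank r; the function names (`matmul`, `lora_effective_weight`, `trainable_params`) and all dimensions are hypothetical and chosen only for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w0, b, a, scale=1.0):
    """Return W0 + scale * (B @ A), the weight used after low-rank adaptation.

    w0: frozen pre-trained weight, shape (d_out, d_in)
    b:  trainable factor, shape (d_out, rank)
    a:  trainable factor, shape (rank, d_in)
    """
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(w0, delta)]


def trainable_params(d_out, d_in, rank):
    """Number of trainable values in the two low-rank factors."""
    return rank * (d_out + d_in)


# Hypothetical sizes, for illustration only.
rng = random.Random(0)
d_out, d_in, rank = 8, 6, 2

w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
b = [[0.0] * rank for _ in range(d_out)]  # zero-initialized, so the update starts at zero
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]

w_eff = lora_effective_weight(w0, b, a)
full_params = d_out * d_in
lora_params = trainable_params(d_out, d_in, rank)
```

With B initialized to zero, the adapted weight equals the pre-trained weight before any training step, and only the two small factors would be updated during fine-tuning. The saving over full fine-tuning grows with layer size, which is one reason such techniques can reduce hardware requirements.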