Platform for Polymer Inverse-Design by Graph Generative Algorithms
Abstract
1. Introduction
Designing “tailored” materials that possess desired characteristics is a long-awaited technology, and molecular design in particular is important across a broad range of industrial domains. Molecular design amounts to solving a complex combinatorial problem over a molecule's atomistic configurations, whose number significantly exceeds 10^60. In this vast search space, human subject-matter experts (SMEs) can explore only a tiny region by trial and error, relying on their experience and knowledge. AI can play a key role in accelerating their design speed and expanding the structural variety of molecules. Several AI molecular design tools and algorithms exist [1,2], but the following issues in practical use remain unaddressed: (1) deep-learning-based methods rely on extremely large amounts of material data; (2) the models are black boxes and therefore hard for chemists to tune; (3) customization for specific material domains is not supported; and (4) a certain level of IT skill is required to use them. To address these issues, we developed a platform for small organic molecules [3]. In this paper, we present a customization of the tool for the polymer domain and demonstrate the design of new acrylic polymers with a specific Tg (glass transition temperature).
2. Method
The fundamental workflow of our tool consists of five sequential steps: (1) data input, (2) feature encoding, (3) property prediction, (4) solution search, and (5) structure generation. First, a set of pairs of chemical structures and target properties is input as the training dataset. The structures are encoded into feature vectors, on which a regression model that predicts the target properties is built. Once the model is built, the user inputs target property values, and the system performs a solution search to identify candidate feature vectors that satisfy them.
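As a rough illustration of steps (2) to (4), the following minimal Python sketch encodes a structure as substructure counts, predicts a property with a linear model, and searches for count vectors that hit a target value. It is not the platform's implementation: the substructure vocabulary, the model weights, and the brute-force search are all hypothetical stand-ins chosen only to make the idea concrete, and the structure is given as a pre-split fragment list rather than parsed from SMILES.

```python
from collections import Counter
from itertools import product

# Hypothetical substructure vocabulary (assumption; the real feature set is not listed).
VOCAB = ["CH3", "CH2", "phenyl", "OH"]

# Hypothetical weights of an already-fitted linear regression model (assumption).
WEIGHTS = [5.0, -10.0, 40.0, 25.0]
BIAS = 20.0


def encode(fragments):
    """Step 2: turn a list of fragment names into a vector of substructure counts."""
    counts = Counter(fragments)
    return [counts[name] for name in VOCAB]


def predict(vector, weights=WEIGHTS, bias=BIAS):
    """Step 3: predict the target property from a count vector with a linear model."""
    return bias + sum(w * x for w, x in zip(weights, vector))


def search(target, tol, max_count=3):
    """Step 4: enumerate count vectors whose predicted property is within tol of target.

    Exhaustive enumeration is only feasible for this toy vocabulary; it stands in
    for whatever search strategy the actual tool uses.
    """
    hits = []
    for vec in product(range(max_count + 1), repeat=len(VOCAB)):
        if abs(predict(vec) - target) <= tol:
            hits.append(list(vec))
    return hits


# Candidate feature vectors for a target property of 150 with a tolerance of 5.
candidates = search(target=150.0, tol=5.0)
```

Each vector in `candidates` specifies only how many of each substructure a molecule should contain. Turning such a vector into concrete molecular graphs is the job of step (5).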
Finally, the feature vectors are decoded into concrete molecular structures by an improved version of McKay's canonical construction path algorithm, which builds up a molecular graph by connecting the atoms and substructures indicated in the feature vector. This process is called “inverse-design” because it solves the model inversely, starting from the target properties. We further customized the tool for the polymer domain in two ways. First, the tool identifies the monomer's main chain and extracts only the side-chain structures as the part to be designed. Second, the structure generation algorithm is reinforced with structural constraints to avoid chemically unrealistic structures.
3. Results
To demonstrate the inverse-design of polymers, we extracted acrylic polymer data from PolyInfo, consisting of 378 pairs of SMILES strings and Tg values. Side-chain structures were extracted from the acrylic main chain and encoded into feature vectors of about 50 dimensions, in which each element is the count of a substructure. Regression models were fitted on the feature vectors to predict Tg. The best model achieved a coefficient of determination R^2 of about 0.68, corresponding to an RMSE of about 30 °C. Inverse-design was then carried out on this model with a target Tg of about 150 °C. In a run of several hours, more than 100 structures were generated, a significant acceleration in design speed compared with a typical SME. The generated structures were screened by a polymer SME, and some of the candidates were successfully synthesized in the lab.
Platform in Service
The platform runs on IBM's cloud and is provided as a service to our client companies for research purposes. Two interfaces, a Python-based command-line interface and a GUI-based web application, are provided to cover a broad range of user IT skills. In the presentation, we will also demonstrate the web application.
References
[1] R. Gómez-Bombarelli, et al. “Automatic chemical design using a data-driven continuous representation of molecules”, 2018
[2] G. B. Goh, et al. “Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction”, 2018
[3] S. Takeda, et al. “Molecular Inverse-Design Platform for Material Industries”, 2020