Publication
MRS Spring Meeting 2023
Talk
Enzyme Optimization via a Generative Language Modeling-Based Evolutionary Algorithm
Abstract
Enzymes are molecular engines that nature has designed to allow otherwise impossible chemical reactions to take place. Their exceptional properties make them appealing for more sustainable chemistry: mild reaction conditions, fewer toxic solvents, and less waste. Billions of years of evolution have made enzymes extraordinarily efficient. However, widespread application in industrial processes requires faster design using in-silico approaches, a demanding task that is far from solved. Most approaches introduce mutations into an existing amino acid (AA) sequence, relying on various assumptions and methodologies. Machine learning and deep generative networks have recently gained attention within the protein engineering community, especially extensions that exploit preexisting information on protein binders, physico-chemical characteristics, or 3D structures. We treat enzyme optimization as an evolutionary process in which mutations are modeled by a generalized autoregressive language model trained on fragments of AA sequences from UniProtKB. We use transfer learning to drive the optimization process and train a Random Forest as the scoring model on a dataset of biocatalyzed chemical processes using pre-trained molecular representations. This lets us alter active sites to catalyze novel reactions while making minimal assumptions. Our approach enables the creation of enzymes with higher predicted biocatalytic activity, simulating the natural evolutionary process by selecting the most promising sequences that reflect the underlying proteomic language.
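The mutate, score, and select loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: `propose_mutant` stands in for the language model by resampling a random position, `toy_score` stands in for the trained Random Forest, and the seed sequence, population sizes, and elitist selection rule are arbitrary choices.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def propose_mutant(sequence, rng, n_sites=1):
    """Hypothetical stand-in for the language model: resample n_sites positions."""
    residues = list(sequence)
    for pos in rng.sample(range(len(residues)), n_sites):
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def toy_score(sequence):
    """Hypothetical stand-in for the scoring model: fraction of hydrophobic residues."""
    return sum(res in "AILMFVW" for res in sequence) / len(sequence)


def evolve(seed, score_fn, generations=20, population=30, keep=5, rng_seed=0):
    """Run a simple elitist evolutionary loop and return the best sequence found."""
    rng = random.Random(rng_seed)
    parents = [seed]
    for _ in range(generations):
        # Mutation step: each child is a mutated copy of a randomly chosen parent.
        children = [propose_mutant(rng.choice(parents), rng) for _ in range(population)]
        # Selection step: parents compete with children, so the best score never drops.
        pool = set(parents) | set(children)
        # Ties are broken by the sequence itself to keep the run deterministic.
        parents = sorted(pool, key=lambda s: (score_fn(s), s), reverse=True)[:keep]
    return parents[0]


if __name__ == "__main__":
    seed = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence, not a real enzyme
    best = evolve(seed, toy_score)
    print(seed, round(toy_score(seed), 3))
    print(best, round(toy_score(best), 3))
```

In a setup closer to the one described, the mutation proposer would presumably sample replacement fragments from the trained language model, and the score would come from a model that sees both the sequence and a representation of the target reaction.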