ACS Fall 2022
Conference paper

Deep learning assisted Suzuki cross coupling catalyst design


The need for more efficient and sustainable catalysts is ever-growing, and so is the cost of experimentally searching chemical space for promising new catalysts. Computational methods such as DFT allow molecules to be screened virtually for their suitability. B. Meyer et al. showed that the calculated binding energy between a ligand and a metal center can serve as an indicator of ligand suitability for the Suzuki cross-coupling reaction [1]. We present a state-of-the-art deep learning model that predicts this binding energy solely from string representations of catalysts and generates new candidate ligands. The model learns meaningful features of the catalysts by itself and learns to predict the DFT-calculated energies from these features. In a first step, inspired by the work of Gomez-Bombarelli et al. [2], an RNN-based variational autoencoder (VAE) is trained: an encoder neural network compresses the catalyst representation (SMILES or SELFIES) into a continuous latent space, while a second neural network, the decoder, reconstructs the original representation from that latent space. Forcing the reconstruction through the lower-dimensional latent space yields a meaningful, condensed representation of the input molecule. Additionally, a feed-forward neural network is trained to predict the associated binding energy from the latent vector, which helps organize the latent space with respect to binding energy. The trained models achieve state-of-the-art predictive performance (MAE = 2.35 kcal mol⁻¹), improving on previously reported machine-learning approaches that represent catalysts as bag of bonds (MAE = 2.73 kcal mol⁻¹) or Coulomb matrices (MAE = 3.05 kcal mol⁻¹) [1].
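The training setup described above can be illustrated with a minimal sketch of a combined objective, assuming the common formulation in which a reconstruction term, a KL regularizer and a property-prediction term are summed. The abstract does not give the exact loss, so the function names, the weights `beta` and `gamma`, and the squared-error property term are assumptions made for illustration only.

```python
import math

def kl_divergence(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

def reconstruction_loss(token_probs, target_ids):
    """Cross-entropy between the decoder's per-position token probabilities
    and the tokens of the original SMILES/SELFIES string."""
    return -sum(math.log(probs[t]) for probs, t in zip(token_probs, target_ids))

def joint_loss(token_probs, target_ids, mu, logvar, e_pred, e_true,
               beta=1.0, gamma=1.0):
    """Hypothetical combined objective: reconstruction + beta * KL + gamma * MSE.

    beta and gamma are assumed weighting factors, not values from the paper.
    """
    recon = reconstruction_loss(token_probs, target_ids)
    kl = kl_divergence(mu, logvar)
    prop = (e_pred - e_true) ** 2  # squared error on the binding energy
    return recon + beta * kl + gamma * prop
```

In such a setup, the property term is what ties positions in the latent space to binding energy, while the KL term keeps the latent space continuous enough to be searched.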
The main advantage of our approach is that, beyond predicting the binding energy of catalysts assembled in a combinatorial fashion, the continuous, low-dimensional, self-learned latent representation can be searched by gradient-based optimization to generate new molecules. Once the optimization finds a latent vector associated with a promising binding energy, that vector can be decoded back into a molecular representation using the trained decoder. Overall, this approach offers a promising route to catalyst design that can be adapted to other reaction classes.
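The latent-space search described above can be sketched as plain gradient descent on a surrogate energy predictor. The example below is a toy illustration, not the authors' implementation: the two-dimensional latent space, the fixed weights `W1`, `B1`, `W2`, `B2`, the target energy and the squared-deviation objective are all hypothetical, and the final decoding step is omitted because it requires the trained decoder.

```python
import math

# Hypothetical toy weights: 2-D latent space, 3 tanh hidden units, scalar output.
W1 = [[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4]]
B1 = [0.1, -0.2, 0.05]
W2 = [1.5, -2.0, 1.0]
B2 = -0.3

def hidden_activations(z):
    """Hidden layer of the toy feed-forward predictor."""
    return [math.tanh(sum(w * x for w, x in zip(row, z)) + b)
            for row, b in zip(W1, B1)]

def predict_energy(z):
    """Predicted binding energy for latent vector z (arbitrary toy units)."""
    return sum(w * h for w, h in zip(W2, hidden_activations(z))) + B2

def energy_gradient(z):
    """Analytic gradient dE/dz of the toy predictor."""
    h = hidden_activations(z)
    return [sum(W2[k] * (1.0 - h[k] ** 2) * W1[k][j] for k in range(len(W2)))
            for j in range(len(z))]

def search_latent(z0, target, lr=0.05, steps=500):
    """Gradient descent on (E(z) - target)^2, starting from z0."""
    z = list(z0)
    for _ in range(steps):
        diff = predict_energy(z) - target
        grad = energy_gradient(z)
        # d/dz (E - target)^2 = 2 * (E - target) * dE/dz
        z = [zi - lr * 2.0 * diff * gi for zi, gi in zip(z, grad)]
    return z

z_opt = search_latent([0.0, 0.0], target=-1.0)
print(z_opt, predict_energy(z_opt))
```

In a full pipeline, `z_opt` would then be passed to the trained decoder to obtain a SMILES or SELFIES string, which would still need to be checked for validity and re-evaluated, for example with DFT.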