Disconnection Aware Steering of Retrosynthesis Transformer to Facilitate Materials Design

Amol Thakkar; Andrea Antonia Byekwaso; Alain Vaucher; Philippe Schwaller; Alessandra Toniato; Teodoro Laino

MRS Fall Meeting 2022

Talk

27 Nov 2022

Abstract

Retrosynthetic analysis is the task of breaking down a target molecule into its constituent precursors until a set of commercially available building blocks is reached. At each single step in the sequence, the bonds to be changed and/or functional group interconversions are identified, and the molecule is broken into hypothetical precursors. Several deep-learning-based approaches to single-step retrosynthesis treat the prediction of possible disconnections as a translation task, relying on the Transformer architecture [1] and the simplified molecular-input line-entry system (SMILES) [2,3] notation [4-7]. Given a target molecule, these approaches suggest the best set of precursors (i.e., reactants, and possibly other reagents) as the translation's outcome, with the possibility of generating multiple such sets.
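
To make the translation framing concrete, the sketch below splits a SMILES string into tokens, which is the form in which a sequence model would typically consume a target molecule. The abstract does not describe the tokenization used in this work; the regular expression is a pattern commonly used for SMILES-based sequence models, and the function name `tokenize_smiles` is a hypothetical helper introduced here for illustration.

```python
import re
from typing import List

# Assumption: a regex commonly used to tokenize SMILES for sequence models.
# Bracket atoms (e.g. [NH4+]) and two-letter halogens (Cl, Br) are kept whole.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens (hypothetical helper).

    Raises ValueError if some characters are not covered by the pattern,
    so that nothing is dropped silently.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES completely: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Aspirin as an example target molecule.
    print(" ".join(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")))
```

In a translation setup of this kind, the space-separated tokens of the target would form the source sequence, and the tokens of the precursor set (precursors joined by `.`) would form the sequence to be predicted.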

However, in their current form, retrosynthetic prediction systems offer the chemist little control over the site at which disconnections are made: previous models provide no opportunity for steering and remain limited in the disconnections they propose. This work enables user-defined disconnections for single-step retrosynthetic analysis, allowing transformer models for retrosynthetic prediction to be steered and paving the way for a ‘human-in-the-loop’ component that harnesses both expert knowledge and deep learning. To this end, we have investigated methods to enhance user interaction, ranging from tagging input molecules to dataset augmentation. We additionally introduce several metrics beyond top-N accuracy and use them to examine the predictions, building an understanding of how well the predictions align with those expected by chemists. We thus take a step towards improving decision-making strategies that statistical and machine learning algorithms cannot yet encode due to a lack of relevant training data. Ultimately, this enhances the chemist’s experience by facilitating user engagement.
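
For reference, the baseline metric mentioned above can be sketched as follows. The abstract does not define how top-N accuracy is computed in this work, nor does it specify the additional metrics, so this is only a minimal illustration under stated assumptions: a prediction counts as correct when its precursor set exactly matches the recorded one, all SMILES are assumed to be canonicalized beforehand, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize_precursors(smiles: str) -> Tuple[str, ...]:
    """Make a precursor set order-independent by sorting its '.'-separated parts.

    Assumption: each individual SMILES is already canonical, so plain string
    comparison is sufficient here.
    """
    return tuple(sorted(smiles.split(".")))


def top_n_accuracy(
    predictions: Sequence[Sequence[str]],
    ground_truths: Sequence[str],
    n: int,
) -> float:
    """Fraction of targets whose recorded precursors appear among the first n predictions.

    predictions[i] is the ranked list of predicted precursor sets for target i.
    """
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not ground_truths:
        return 0.0
    hits = 0
    for ranked, truth in zip(predictions, ground_truths):
        target = normalize_precursors(truth)
        if any(normalize_precursors(p) == target for p in ranked[:n]):
            hits += 1
    return hits / len(ground_truths)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    preds = [
        ["CCO.CC(=O)O", "CC(=O)Cl.CCO"],
        ["c1ccccc1Br", "c1ccccc1I"],
    ]
    truths = ["CCO.CC(=O)Cl", "c1ccccc1Cl"]
    print(top_n_accuracy(preds, truths, n=1))
    print(top_n_accuracy(preds, truths, n=2))
```

A metric of this kind only checks agreement with the single recorded answer; it says nothing about whether the model disconnected the bond the chemist asked for, which is why metrics beyond top-N accuracy are needed.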

[1] Vaswani, A. et al.; Advances in Neural Information Processing Systems 2017, 5998–6008.
[2] Weininger, D.; J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
[3] Weininger, D.; Weininger, A.; Weininger, J. L.; J. Chem. Inf. Comput. Sci. 1989, 29, 97–101.
[4] Yang, Q. et al.; Chem. Commun. 2019, 55, 12152–12155.
[5] Karpov, P.; Godin, G.; Tetko, I. V.; International Conference on Artificial Neural Networks 2019, 817–830.
[6] Duan, H.; Wang, L.; Zhang, C.; Guo, L.; Li, J.; RSC Adv. 2020, 10, 1371–1378.
[7] Schwaller, P. et al.; Chem. Sci. 2020, 11, 3316–3325.
