ACS Fall 2021

On the automation of de-novo molecular design and chemical synthesis planning: A case study on SARS-CoV-2



Accelerated discovery of novel chemical structures with desired properties is a grand challenge of our time. Following recent progress in chemocentric approaches to generative chemistry, one current challenge is to build multimodal conditional generative models that leverage disparate knowledge sources when mapping biochemical properties to target structures. Bridging chemoinformatics and systems biology, we devise a reinforcement learning method for de novo molecular design directly from biological data such as target proteins or gene expression profiles. As a case study, we present a pipeline that automates the process of ligand discovery preceding chemical synthesis in the lab. Focusing on the discovery of potential SARS-CoV-2 antivirals, we integrate deep learning models for 1) virtual drug screening, 2) conditional de novo molecular design, 3) multistep retrosynthesis prediction and 4) synthesis action generation. We first train a multimodal ligand–protein binding affinity model to predict the affinities of bioactive compounds to target proteins and couple this model with pharmacological toxicity predictors. Using these models as the reward function for a conditional molecular generator, we construct a generative model that can propose binding ligands for unseen protein targets. In silico, the generated molecules exhibit favorable properties in terms of target binding affinity, selectivity and drug-likeness. Next, we automatically infer synthesis routes for the 250 best generated molecules using multistep retrosynthesis models. Finally, the reaction sequences are automatically converted into stepwise experimental procedures for chemical synthesis. From the discovery of these targeted molecules to the derivation of the synthesis procedures, the approach requires no intervention by domain experts. The approach is confirmed by the successful synthesis of one potential ligand for the human ACE2 receptor.
Code as well as pretrained models for all components 1) to 4) above are publicly available.
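The abstract describes coupling a binding affinity predictor with toxicity predictors to form the reward for a reinforcement-learning molecular generator. The sketch below illustrates that idea only in miniature and under explicit assumptions not stated in the abstract: the reward is assumed to be affinity minus a weighted toxicity penalty (`tox_weight` is a hypothetical parameter), the RL algorithm is assumed to be plain REINFORCE with a moving-average baseline, the "generator" is a hypothetical softmax policy over a fixed list of SMILES strings rather than a neural sequence model, and the predictors are made-up lookup tables. All function names (`composite_reward`, `reinforce`) are illustrative, not the authors' API.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical predictor signatures (stand-ins for the trained models).
AffinityFn = Callable[[str, str], float]  # (ligand SMILES, protein id) -> affinity score
ToxicityFn = Callable[[str], float]       # ligand SMILES -> toxicity probability


def composite_reward(ligand: str, protein: str, affinity: AffinityFn,
                     toxicity: ToxicityFn, tox_weight: float = 0.5) -> float:
    """Assumed reward form: predicted affinity minus a weighted toxicity penalty."""
    return affinity(ligand, protein) - tox_weight * toxicity(ligand)


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(candidates: List[str], protein: str, affinity: AffinityFn,
              toxicity: ToxicityFn, steps: int = 2000, lr: float = 0.1,
              seed: int = 0) -> Dict[str, float]:
    """Toy REINFORCE over a fixed candidate list; returns the final policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # moving-average reward baseline to reduce gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one "molecule" from the current policy.
        i = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = composite_reward(candidates[i], protein, affinity, toxicity)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log softmax(logits)[i] w.r.t. logits[j] is 1[j == i] - probs[j].
        for j in range(len(logits)):
            logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])
    return dict(zip(candidates, softmax(logits)))


# Toy usage with made-up lookup tables (not real model predictions).
toy_affinity = {"CCO": 0.2, "c1ccccc1O": 0.9, "CC(=O)N": 0.6}
toy_toxicity = {"CCO": 0.1, "c1ccccc1O": 0.2, "CC(=O)N": 0.9}
policy = reinforce(list(toy_affinity), "ACE2",
                   affinity=lambda lig, prot: toy_affinity[lig],
                   toxicity=lambda lig: toy_toxicity[lig])
best = max(policy, key=policy.get)
print(best, round(policy[best], 3))
```

With these toy numbers the composite rewards are 0.15, 0.8 and 0.15, so the policy should concentrate on `c1ccccc1O`: the candidate with high predicted affinity and low predicted toxicity, rather than `CC(=O)N`, whose decent affinity is offset by its toxicity penalty. This mirrors, in a very simplified form, why combining affinity and toxicity signals in one reward steers a generator toward binders that are also plausible drug candidates.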