Cost-Sensitive Bayesian Active Molecular Design with Auxiliary Properties
Given the cost of evaluating molecular properties and the vast size of the molecular search space, there is a need for data-efficient algorithms that actively and strategically select candidate molecules for evaluation. While target properties are costly to compute or measure, auxiliary physical/chemical properties can in many cases be queried at lower cost and are known to be predictive of, and mechanistically related to, the target properties. We introduce a Bayesian active learning algorithm that (i) maintains a graphical, Gaussian-process-based model of the dependencies between molecular structure, auxiliary physical/chemical properties, and target properties such as toxicity and biodegradability, and (ii) adaptively selects which properties to evaluate based on evaluation costs, model uncertainty, and expected task-relevant information gain. We discuss its ability to identify molecules with desired target property values in a cost-effective manner on a class of anionic photoacid generator (PAG) molecules, and study how the learned molecule evaluation strategies depend on the relative query costs and on the mutual information between target and auxiliary properties.
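To make the trade-off between query cost and information gain concrete, the following minimal sketch scores each candidate query by its mutual information with the target property per unit cost. It is an illustration under stated assumptions, not the algorithm itself: the two-variable Gaussian belief, the noise levels, the costs, the "information per unit cost" rule, and the names `info_gain` and `select_query` are all hypothetical and chosen only for this example.

```python
import numpy as np

def info_gain(cov, noise, query, target):
    """Mutual information (in nats) between a noisy observation of property
    `query` and the latent value of property `target`, assuming the current
    belief over the properties is jointly Gaussian with covariance `cov`."""
    obs_var = cov[query, query] + noise[query]
    rho2 = cov[query, target] ** 2 / (obs_var * cov[target, target])
    return -0.5 * np.log1p(-rho2)

def select_query(cov, noise, costs, target):
    """Return the index of the property with the highest information gain
    per unit cost (an assumed acquisition rule), along with all scores."""
    scores = np.array([
        info_gain(cov, noise, q, target) / costs[q]
        for q in range(len(costs))
    ])
    return int(np.argmax(scores)), scores

# Hypothetical belief for one molecule over [auxiliary, target] properties.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
noise = np.array([0.05, 0.05])  # assumed observation-noise variances
TARGET = 1

# Target query ten times as expensive as the auxiliary query.
cheap_aux, _ = select_query(cov, noise, np.array([1.0, 10.0]), TARGET)
# Both queries equally expensive.
equal_cost, _ = select_query(cov, noise, np.array([1.0, 1.0]), TARGET)
print(cheap_aux, equal_cost)
```

In this toy setting the selected query switches from the auxiliary property to the target property as the cost ratio shrinks, which is the kind of dependence on relative query costs and mutual information that the study examines.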