Neuro-symbolic reinforcement learning for polymer discovery
We present the first application of neuro-symbolic reinforcement learning (NS RL) in the materials discovery domain. Deep RL requires an excessively large volume of training data, and the learned policies lack explainability; as a result, practical application of deep RL to materials discovery is problematic. We explore neuro-symbolic approaches to deep RL that combine the strengths of data-driven AI with human-like symbolic knowledge and reasoning. Neuro-symbolic approaches are expected to enable co-creation of models and policies with subject matter experts (SMEs) by capturing new domain knowledge in symbolic form. We investigate Logical Neural Networks (LNNs), in which each neuron has an explicit meaning as part of a formula in a weighted real-valued logic. In addition, the model is differentiable, so training lets the network learn new facts and makes it resilient to contradictory facts. In the presented study, we use Logical Optimal Actions (LOA), an NS RL framework based on LNNs, to train RL agents to select experimental conditions for the synthesis of spin-on-glass (SOG) given target values of experimental outcomes. The SOG is based on tetraethyl orthosilicate as the precursor and co-precursors such as phenyltriethoxysilane. Experimental degrees of freedom include temperature, reaction time, precursor/co-precursor ratio, total precursor and co-precursor concentration, water/precursor ratio, and catalyst/precursor ratio. We explicitly pursue training of generalizable agents that learn to navigate the abstract space of experiments relevant to SOG synthesis and find reaction conditions that yield materials with desired properties. We introduce a data-augmentation strategy that meets the data requirements of NS RL while keeping the volume of experimental data affordable: under 300 experimental data points.
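To make "weighted real-valued logic" concrete, here is a minimal sketch of single conjunction and disjunction neurons in the weighted Łukasiewicz form used in the LNN literature. It is an illustration under stated assumptions, not the authors' LOA code or the API of any LNN library: the function names `lnn_and` and `lnn_or`, the bias `beta`, and the example inputs are hypothetical, and the sketch uses point truth values where LNNs propagate lower and upper truth bounds.

```python
def _clamp(x: float) -> float:
    """Clamp a real value into the truth interval [0, 1]."""
    return max(0.0, min(1.0, x))


def lnn_and(truths: list[float], weights: list[float], beta: float = 1.0) -> float:
    """Weighted Lukasiewicz AND: beta - sum_i w_i * (1 - x_i), clamped to [0, 1].

    Hypothetical sketch: real LNN neurons propagate (lower, upper) bounds,
    whereas this uses a single point truth value per input.
    """
    if len(truths) != len(weights):
        raise ValueError("truths and weights must have the same length")
    return _clamp(beta - sum(w * (1.0 - x) for w, x in zip(weights, truths)))


def lnn_or(truths: list[float], weights: list[float], beta: float = 1.0) -> float:
    """Weighted Lukasiewicz OR: 1 - beta + sum_i w_i * x_i, clamped to [0, 1]."""
    if len(truths) != len(weights):
        raise ValueError("truths and weights must have the same length")
    return _clamp(1.0 - beta + sum(w * x for w, x in zip(weights, truths)))


if __name__ == "__main__":
    # With unit weights, beta = 1 and classical inputs, the neurons reduce to Boolean logic.
    print(lnn_and([1.0, 0.0], [1.0, 1.0]), lnn_or([1.0, 0.0], [1.0, 1.0]))
    # Real-valued inputs: two hypothetical predicate truth values of 0.9 and 0.8.
    print(lnn_and([0.9, 0.8], [1.0, 1.0]))
    # A smaller weight on the second input reduces its influence on the conjunction.
    print(lnn_and([0.9, 0.2], [1.0, 0.25]))
```

Because each neuron is a piecewise-linear function of its weights and inputs, the weights can be tuned by gradient descent while every neuron still reads as a logical connective. This is the property the abstract refers to when it calls the model differentiable and interpretable.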
NS RL experiments show that LOA combined with logical action-aware features noticeably improves the agent's performance in searching for experiments that target a specific molecular weight and polydispersity index of the produced SOG. Furthermore, the agent learns to avoid experimental conditions that produce undesirable outcomes: for example, it avoids synthesis conditions that lead to gelation of the reaction mixture (cross-linked SOG). Finally, we validate and benchmark the proposed NS RL approach by running SOG syntheses in the lab according to the agent's predictions.