Publication
NeurIPS 2022
Workshop paper

Toward Human-AI Co-creation to Accelerate Material Discovery


Abstract

There is an increasing need in our society to achieve faster advances in science to tackle urgent problems such as climate change, environmental hazards, sustainable water management, sustainable energy systems, and pandemics. The urgency of scientific discovery in chemistry carries the extra burden of assessing the risks of proposed novel solutions before moving to the experimental stage. Despite several recent advances in Machine Learning and AI that address some of these challenges, there is still a gap in technologies that support end-to-end discovery applications, integrating the myriad of available technologies into a coherent, orchestrated, yet flexible discovery process. Such applications need to handle complex knowledge management at scale, enabling subject matter experts (SMEs) to consume and produce knowledge in a timely and productive way. Furthermore, the discovery of novel functional materials relies strongly on the development of strategies for exploring chemical space. For instance, generative models have gained attention in the scientific community because they can generate enormous volumes of novel molecules across material domains. However, these models exhibit extreme creativity that often translates into low viability of the generated candidates. In the context of materials discovery, viability is a complex metric evaluated by SMEs from complementary domains, such as synthetic organic chemistry, process scale-up, intellectual property development, and regulatory compliance. In this scenario, we see an excellent opportunity to incorporate AI techniques that support SMEs, as well as the need for a platform that exploits human-AI interaction to reduce both the time to first discovery and the opportunity costs involved.
In this work, we propose a workbench framework for human-AI co-creation to accelerate material discovery. The framework has four main components: generative models, dataset triage, molecule adjudication, and risk assessment.