Scientific discovery is one of the most important and challenging tasks humanity faces. It requires not only finding patterns and relations in data, but also explaining them with models that are consistent with existing knowledge and can make novel predictions. That takes statistical inference and reasoning, alongside creativity and intuition. But scientists face many challenges in this process, including sifting through massive amounts of background knowledge on a topic that is often complex and incomplete, and the difficulty of generating and testing hypotheses at scale.
Scientific discovery has historically been divided into two approaches: first-principles derivation and data-driven inference. Traditionally, symbolic mathematical models were derived (often manually) from first principles using domain knowledge and logical inference steps, and then assessed against experimental data. In the past few years, we have witnessed the rise of statistical AI algorithms that can rapidly generate data-driven models but rely on large volumes of data being available. However, automatically obtaining models that are consistent with existing knowledge, as well as establishing new models with little data, remain open problems.
To address these challenges, we developed AI-Descartes, a framework for automated scientific discovery that leverages both data and knowledge to generate and evaluate candidate symbolic models. The framework was first unveiled in Nature Communications on April 12.
AI-Descartes is inspired by the work of René Descartes, one of the founders of modern science and philosophy. Descartes proposed a method of scientific inquiry based on four rules:
- Accept nothing as true that is not self-evident
- Divide each problem into as many parts as possible
- Proceed from the simplest to the most complex
- Review everything to avoid errors
AI-Descartes follows these rules by using rigorous logic and mathematics to discover and explain natural phenomena or processes.
AI-Descartes consists of four main components. First, there is a data pre-processing module that transforms raw data into a format suitable for discovery. Then there is a model generation module that uses machine learning to generate candidate models that are consistent with the data. Finally, there is a model evaluation module that, combined with an automated theorem prover, ranks the candidate models on various criteria, including logical derivability, other logic measures, simplicity, accuracy, novelty, and generality.
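The generate-then-rank idea can be illustrated with a minimal sketch. It is not the AI-Descartes implementation: the toy data, the candidate forms, the hand-assigned complexity scores, and the `complexity_weight` parameter are all assumptions made for illustration, and only accuracy and simplicity are scored (no theorem prover is involved).

```python
import math

# Hypothetical toy data: orbital distance d (AU) and period p (years).
DATA = [(0.39, 0.24), (0.72, 0.62), (1.00, 1.00), (1.52, 1.88), (5.20, 11.86)]

# Hypothetical candidate forms p = c * g(d), each with a hand-assigned
# complexity score (an assumption, not a measure used by the framework).
CANDIDATES = {
    "c*d": (lambda d: d, 1),
    "c*d**2": (lambda d: d ** 2, 2),
    "c*d**1.5": (lambda d: d ** 1.5, 2),
    "c*log(1+d)": (lambda d: math.log(1 + d), 3),
}


def fit_scale(g, data):
    """Closed-form least-squares fit of the constant c in p = c * g(d)."""
    num = sum(p * g(d) for d, p in data)
    den = sum(g(d) ** 2 for d, _ in data)
    return num / den


def rmse(g, c, data):
    """Root-mean-square error of the fitted model on the data."""
    return math.sqrt(sum((p - c * g(d)) ** 2 for d, p in data) / len(data))


def rank_candidates(data, complexity_weight=0.01):
    """Score each candidate by error plus a small complexity penalty."""
    scored = []
    for name, (g, complexity) in CANDIDATES.items():
        c = fit_scale(g, data)
        err = rmse(g, c, data)
        scored.append((err + complexity_weight * complexity, name, c, err))
    return sorted(scored)  # lowest score first


ranking = rank_candidates(DATA)
best_score, best_name, best_c, best_err = ranking[0]
print(best_name, round(best_c, 3))
```

In a fuller pipeline, the ranking step would also draw on the logic-based criteria listed above, such as whether a candidate can be derived from the background theory.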
Some of the key challenges are related to the computational scalability of the knowledge-based reasoning component of the framework. Formal logic, whether propositional, first-order, or higher-order, dramatically extends the ability of state-of-the-art, data-driven approaches to incorporate knowledge, and offers models that are both consistent with background knowledge and economical when it comes to sample complexity. Moreover, while it is broadly accepted that data can be noisy and sparse, we often regard formal logic as flawless, which at times is not the case. Background theory can be partial (missing some of the axioms needed to fully derive a relation), as well as not universally correct (Newton’s laws are known to be inexact, yet offer useful background knowledge primitives in many situations). We are currently looking at new ways to overcome this bottleneck and ensure that knowledge and data are valued on a similar footing, rather than one merely qualifying the other.
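The point about Newton’s laws can be made concrete with a small numerical sketch. It compares Newtonian kinetic energy with the relativistic expression to show where the approximate theory stays useful and where it breaks down. The function names are hypothetical, and this is only an illustration of an inexact background theory, not part of the framework.

```python
import math


def newtonian_ke_per_mc2(beta):
    """Newtonian kinetic energy in units of m*c^2, with beta = v/c."""
    return 0.5 * beta ** 2


def relativistic_ke_per_mc2(beta):
    """Relativistic kinetic energy (gamma - 1) in units of m*c^2."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return gamma - 1.0


def newtonian_relative_error(beta):
    """Fraction by which the Newtonian value falls short of the relativistic one."""
    exact = relativistic_ke_per_mc2(beta)
    return (exact - newtonian_ke_per_mc2(beta)) / exact


# Compare the two theories at a slow speed and at a speed close to c.
slow_error = newtonian_relative_error(0.01)
fast_error = newtonian_relative_error(0.9)
print(f"error at 0.01c: {slow_error:.2e}")
print(f"error at 0.9c:  {fast_error:.2f}")
```

At low speeds the two expressions nearly coincide, which is why an inexact theory can still serve as useful background knowledge within its regime of validity.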
The framework can be employed by both theoreticians and experimentalists to study yet-to-be-understood natural or artificial phenomena. We have applied the system to problems presented to us by DARPA and the Air Force Research Laboratory (AFRL) and obtained insightful models.
We have released an open-source suite enabling scientists to bring their own data and background theory and experiment with the implementation. We’re inviting researchers and practitioners to try out our framework and provide feedback. We believe that AI-Descartes is a promising step towards achieving the ultimate goal of understanding and explaining the world. We hope that AI-Descartes will inspire more research and development in the field of automated scientific discovery and contribute to the advancement of science and technology for the benefit of humanity.