- Daniel Probst, Matteo Manica, et al. (2022). Nature Communications.
AI for Scientific Discovery
Overview
Scientific discovery has always been core to human innovation and to how we explore the world. Today, the convergence of computing technologies offers an unprecedented opportunity to accelerate discovery. We are conceiving and developing novel frameworks for AI foundation models and multi-cloud computing to usher in a new era of reproducible and collaborative experimentation for scientific discovery. Our focus is to leverage these technologies to empower researchers in all domains with tools to capture, process, and learn from all information in laboratory operations. These research tools will be at the center of the AI-enabled labs of the future, learning over time from successes and failures to provide recommendations, optimize experimentation, improve reproducibility, and ultimately boost scientific productivity.
Foundation models for scientific discovery
Foundation models are at the heart of a significant shift in how AI systems are built and in the transformative impact they have across countless applications. Such models are trained on vast amounts of unlabeled data, and we are exploring how they can be adapted for scientific discovery using multiple data modalities, such as lab measurements, images, audio, and natural-language text, as well as scientific languages. We expect foundation models to support essential activities in the lab, such as automatically documenting procedures to capture lab knowledge, planning experiments, and interpreting analytical data from instruments, leading to a new era of AI-enabled research and innovation. This technology promises transformation beyond the lab, offering new capabilities to every professional who interacts with their environment through multiple modalities.
Multi-cloud computing for scientific discovery
Scientific discovery today involves data and compute spread across many heterogeneous IT environments. These range from on-premises infrastructure to multiple private and public clouds used to store, transfer, and process data. We are exploring and implementing approaches to holistically integrate all data and metadata from the entire digital environment of research workflows. This allows research teams to share a common view of experiments, reconstruct scenarios from any point in time, and learn from the end-to-end execution and outcomes of all of their work.
In addition, we apply and extend multi-cloud computing to all compute operations, from the experimental data through to the final insight of an experiment. This makes it possible to configure and optimize where each compute step of an experiment is executed.
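To make the idea of choosing where each compute step runs more concrete, here is a minimal sketch. It is not the framework described above. It assumes a linear pipeline, a hypothetical per-environment compute cost for every step, and a flat hypothetical `transfer_cost` paid whenever consecutive steps run in different environments. The function name `place_steps`, the environment names, and all cost values are invented for illustration.

```python
def place_steps(step_costs, transfer_cost):
    """Pick one environment per pipeline step, minimizing compute plus transfer cost.

    step_costs: list of dicts, one per step, mapping environment name -> compute cost.
    transfer_cost: cost added whenever two consecutive steps run in different environments.
    Returns (total_cost, list_of_chosen_environments).
    """
    # best[env] = (cost of the cheapest plan that ends in env, that plan)
    best = {env: (cost, [env]) for env, cost in step_costs[0].items()}
    for costs in step_costs[1:]:
        new_best = {}
        for env, cost in costs.items():
            candidates = [
                (prev_cost + (0 if prev_env == env else transfer_cost) + cost, plan + [env])
                for prev_env, (prev_cost, plan) in best.items()
            ]
            new_best[env] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


# Hypothetical costs (arbitrary units) for a three-step experiment pipeline.
steps = [
    {"on_prem": 1, "public_cloud": 4},   # preprocessing next to the instrument
    {"on_prem": 10, "public_cloud": 3},  # compute-heavy analysis
    {"on_prem": 2, "public_cloud": 2},   # report generation
]
total, plan = place_steps(steps, transfer_cost=2)
print(total, plan)
```

With these made-up numbers, the sketch keeps preprocessing on premises and moves the remaining steps to the public cloud. A real system would also have to account for factors such as data volume, security constraints, and queueing, which this sketch ignores.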
IBM RXN for Chemistry
With IBM RXN for Chemistry, we have pioneered AI-enabled chemical synthesis planning as a cloud service. Based on transformer models trained on millions of synthetic organic chemistry reactions, IBM RXN represents a new approach to digital chemistry: it uses language models to predict chemical reactions, find retrosynthesis pathways, and convert experimental procedures into a list of actions for lab automation (RoboRXN).
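Language models can be applied to chemistry because a reaction can be written as text, for example as a reaction SMILES string of the form `reactants>>products`, and then split into tokens much like a sentence. The sketch below illustrates that preprocessing step only. The token pattern is a simplified assumption made for illustration and is not IBM RXN's actual tokenizer; the name `tokenize_reaction` is likewise hypothetical.

```python
import re

# Simplified SMILES token pattern (an illustrative assumption): bracket atoms,
# two-letter halogens, the reaction arrow, two-digit ring closures, and
# finally any other single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|>>|%\d{2}|.")


def tokenize_reaction(reaction_smiles: str) -> list[str]:
    """Split a reaction SMILES string into tokens a sequence model could consume."""
    tokens = TOKEN_PATTERN.findall(reaction_smiles)
    # Sanity check: joining the tokens must reproduce the input exactly.
    assert "".join(tokens) == reaction_smiles
    return tokens


# Acetic acid + ethanol -> ethyl acetate (water omitted, as is common in
# reaction data sets).
example = "CC(=O)O.OCC>>CC(=O)OCC"
print(" ".join(tokenize_reaction(example)))
```

A sequence-to-sequence model trained on such token sequences can then treat reaction prediction as translating the reactant tokens into product tokens, and retrosynthesis as the reverse direction.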
Generative Toolkit for Scientific Discovery (GT4SD)
Our Generative Toolkit for Scientific Discovery (GT4SD) is an open-source platform to accelerate hypothesis generation in the scientific discovery process. GT4SD provides a library that makes generative AI models easier to use, both for generating new hypotheses and for fine-tuning generative models on custom data sets for specific domains. The application space in science is vast, ranging from materials science to drug discovery and from the formulation of new compounds to the determination of testing conditions. GT4SD thereby unlocks hypothesis generation as a key component of the scientific method.
AI-Assisted Chemical Sensing (HyperTaste)
Pairing AI with analytical systems that integrate multiple sensors can significantly facilitate and accelerate the chemical analysis of complex materials. With HyperTaste, we have demonstrated an AI-assisted electronic tongue that leverages supervised and unsupervised learning in automated, portable testing systems for the analysis of complex liquids. The technology is used to accelerate the experimental validation of hypotheses and to explore new chemical spaces across a variety of domains and use cases.
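As a rough illustration of supervised learning on a sensor array, the sketch below treats the combined readings of several sensors as a fingerprint of a liquid and assigns a new sample to the closest known fingerprint. It is not the HyperTaste method. It assumes a nearest-centroid classifier chosen for brevity, and the function names, liquid labels, and all readings are made up for the example.

```python
import math


def centroid(rows):
    """Element-wise mean of equally long sensor-reading vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(training):
    """training maps a label to a list of readings; returns label -> centroid."""
    return {label: centroid(rows) for label, rows in training.items()}


def predict(model, reading):
    """Return the label whose centroid is closest (Euclidean) to the reading."""
    return min(model, key=lambda label: math.dist(model[label], reading))


# Made-up readings from a hypothetical four-sensor array (arbitrary units).
training = {
    "liquid_a": [[0.9, 0.1, 0.4, 0.2], [1.0, 0.2, 0.5, 0.1]],
    "liquid_b": [[0.2, 0.8, 0.3, 0.9], [0.1, 0.9, 0.4, 1.0]],
}
model = fit(training)
print(predict(model, [0.8, 0.2, 0.5, 0.2]))
```

No single sensor needs to be selective for one compound here: the classification relies on the pattern across all sensors, which is the general idea behind an electronic tongue.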
Technical resources
Publications
- Alessandra Toniato, Philippe Schwaller, et al. (2021). Nature Machine Intelligence.
- Jannis Born, Matteo Manica, et al. (2021). Machine Learning: Science and Tech.
- Gianmarco Gabrieli, Michal Muszynski, et al. (2022). IFSET.