IBM’s new open-source toolkit for simulation

Using simulations and deep learning surrogates to test new ideas, we can fast-track the discovery of new materials, drugs, and more.



Models and simulations are a cornerstone of scientific discovery. When theories fall short and real-world experiments are impossible or impractical to run, mimicking a complex system in a sandbox can deliver fresh insights.

Models and simulations have allowed researchers to screen new drugs before testing them on people, explore the origins of our universe without traveling through time, and predict how longer red lights or wider streets will impact drivers before introducing the changes at rush hour. But running virtual experiments on molecules, stars, or moving traffic on the computer takes time and tons of computation.

We want to chart a new path. That’s why we’re excited to introduce ST4SD, our new open-source simulation toolkit for scientific discovery. It’s the latest arrow in the IBM Research quiver for accelerating science with the help of machine learning. The toolkit includes a set of evolving techniques for running physics-based simulations more efficiently. But the toolkit’s greatest potential lies in its ability to complement or replace computationally intensive simulations with lightweight, AI-driven surrogate models. Testing ideas on a surrogate rather than an expensive simulation could radically alter the way we do science.

Surrogates are deep-learning models that take data from a previous simulation to predict how altering a few variables will influence the outcome of the experiment. Surrogates allow you to test one idea after another at a fraction of the cost. A trained deep-learning surrogate may not be as precise as a physical simulation, but the results are good enough to rapidly validate or reject a hypothesis.
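To make the idea concrete, here is a minimal sketch of the surrogate workflow, not ST4SD code. Everything in it is a hypothetical stand-in: `expensive_simulation` is a toy formula playing the role of a costly physics-based run, and a simple least-squares line takes the place of the deep-learning model. The point is only the pattern: run the simulation a few times, fit a cheap model to the results, then query the cheap model for new candidates.

```python
def expensive_simulation(x):
    """Toy stand-in for a costly physics-based simulation (hypothetical formula)."""
    return 2.0 * x + 1.0 + 0.05 * x * x


def fit_linear_surrogate(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    A straight line is used here only to keep the sketch dependency-free;
    the article's surrogates are deep-learning models.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Step 1: pay for a handful of real simulation runs.
train_x = [0.0, 0.5, 1.0, 1.5, 2.0]
train_y = [expensive_simulation(x) for x in train_x]

# Step 2: train the cheap surrogate on those results.
surrogate = fit_linear_surrogate(train_x, train_y)

# Step 3: test a new idea on the surrogate instead of the simulation.
candidate = 1.2
approx = surrogate(candidate)
exact = expensive_simulation(candidate)  # computed here only to check the error
print(f"surrogate={approx:.3f} simulation={exact:.3f}")
```

The surrogate's answer differs slightly from the simulation's, which mirrors the trade-off described above: less precision in exchange for a much cheaper test.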

ST4SD is unique in providing a single environment for scientists to run simulations and surrogates alike, seamlessly integrating these two modes of experimentation. Our discovery pipeline lets you move around simulation or surrogate tasks like Lego bricks, recycling completed tasks whenever possible. Through a technique called memoization, the pipeline automatically identifies duplicate tasks in real time and swaps them out for the already executed version, boosting productivity.
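The memoization idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ST4SD implementation: the `MemoizingRunner` class, its `task_key` scheme (a hash of the task name and its parameters), and `toy_simulation` are all hypothetical names invented for this example.

```python
import hashlib
import json


class MemoizingRunner:
    """Hypothetical task runner that reuses results of identical tasks."""

    def __init__(self):
        self._cache = {}   # maps task key -> stored result
        self.executed = 0  # counts tasks that actually ran

    @staticmethod
    def task_key(name, params):
        # Two tasks count as duplicates when their name and parameters match.
        # sort_keys makes the key independent of parameter ordering.
        payload = json.dumps({"task": name, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, name, func, params):
        key = self.task_key(name, params)
        if key not in self._cache:
            # First time this task is seen: execute it and store the result.
            self._cache[key] = func(**params)
            self.executed += 1
        # Duplicate tasks fall through to here and reuse the stored result.
        return self._cache[key]


def toy_simulation(pressure, temperature):
    """Toy stand-in for an expensive simulation step (hypothetical formula)."""
    return pressure * 0.1 + temperature * 0.01


runner = MemoizingRunner()
first = runner.run("uptake", toy_simulation, {"pressure": 1.0, "temperature": 300.0})
# Same task with the parameters listed in a different order: served from the cache.
second = runner.run("uptake", toy_simulation, {"temperature": 300.0, "pressure": 1.0})
# Different parameters: this one has to run.
third = runner.run("uptake", toy_simulation, {"pressure": 2.0, "temperature": 300.0})
print(runner.executed)
```

Three tasks are submitted but only two execute. A real pipeline would also need to account for things this sketch ignores, such as changes to input files or to the task's code.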

These time savings add up: We’ve shown that eliminating duplicate workloads can turbocharge the development process. The more tasks on your to-do list, the more hours you can potentially shave off your experiments. The fastest calculations, after all, are the ones you don’t have to run.

Accelerating the search for sustainable, climate-friendly materials

This new toolkit for simulations complements our previously released Deep Search toolkit for converting unstructured documents into machine-readable form and our Generative Toolkit for creating entirely new molecules, chemicals, and materials with a desired, highly useful property.

With these combined toolkits, IBM Research designed a more sustainable photoacid generator (PAG), used in printing semiconductors. Previously, the hunt for a candidate material would have taken years of trial-and-error experimentation. Instead, we hit our goal in weeks. We expect more success stories like this as researchers embrace open-source repositories, including this one for simulations, to share their work and build on each other’s discoveries.

Our colleagues at IBM recently used the ST4SD pipeline to hunt for new carbon capture and storage materials, which are urgently needed to stabilize and lower carbon-dioxide levels in the atmosphere to address climate change. An immediate carbon capture goal is to find an easier way to trap industrial carbon emissions at their source, before they leave the smokestack, diffuse into the wider atmosphere, and become more expensive to catch.

A team led by IBM’s Mathias Steiner and Rodrigo Neumann used the ST4SD pipeline to comb through a database of more than 1 million highly porous materials with an affinity for carbon dioxide. They considered more than 100,000 materials before paring down the list to the 1,000 top performers, after factoring in things like heat, pressure, and water vapor content inside a real-world flue.

Running 100,000 simulations was no minor undertaking. The simulations ate up 10 years of CPU time spread over nine months. As intensive as the screening was, the ST4SD pipeline eased the burden by running the workloads in parallel, switching between on-premises supercomputers and computing clusters in the cloud. ST4SD’s memoization feature produced further time savings by allowing the team to recycle previously run tasks. In all, the team estimates that the toolkit sped up their evaluation of each material by about a third.

The researchers are now in the process of screening their 1,000 most promising candidates. Some of their unanswered questions include: Does changing the material’s atomic configuration improve its capacity to capture carbon? Is there an optimal pore size for maximizing the uptake of carbon over other gases released during fossil-fuel combustion? Next, they plan to run surrogate experiments on ST4SD to further narrow the field.

A second sustainability project has given us a glimpse of the tremendous power of deep surrogates. We recently used them to see if we could find a more efficient solar-panel semiconductor than the silicon used in photovoltaic cells today. How well a molecule converts sunlight to power depends on its shape and something called its frontier energy levels. Normally we would run quantum-chemical simulations to screen molecules for those with electronic structures predicted to efficiently convert solar radiation into electricity. But this time, we used surrogates to replace all but the last step. Instead of taking hours to evaluate one molecule, our experiment took seconds. Had we used simulations alone, the screening process would have taken 35 times longer.

We are excited to see where these experiments lead. We are also encouraged by the early results of those who have used our simulation toolkit. We plan to continue adding new features and virtual experiments to the ST4SD open-source library, or registry. We encourage you to check out the pipeline, contribute your work, and join a community of researchers using simulation to tackle some of society’s pressing challenges.