10x faster design

Generative Modeling: Expanding creativity in molecular design.

*Initial material concept ideation demonstrated in partnership with Nagase.

You may have heard of AI engines that can draw realistic images of landscapes or portraits of people that don't exist. These are called generative models, and rather than use them to create imaginary things, we’ve adapted this technology to design new materials at unprecedented speeds.

See how we used Generative Modeling to design a new molecule

How it works

Scientists define the challenge. Generative models create every possible solution to pick from.

Materials design is a complex combinatorial problem. Even small molecules made of only a few atoms have hundreds of possible combinations, making the entire space of materials almost inconceivably vast.
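To give a rough feel for how quickly such a design space grows, here is a deliberately simplified counting sketch. It assumes a molecule can be treated as an ordered chain of slots, each filled from a fixed set of building blocks; real chemistry (valence, rings, stereochemistry) is ignored, and the function name and numbers are illustrative only.

```python
def count_sequences(n_building_blocks: int, length: int) -> int:
    """Count ordered chains of `length` slots, each filled from n options.

    Simplifying assumption: every slot is independent, so the count is n ** length.
    """
    return n_building_blocks ** length


# With a hypothetical vocabulary of 10 building blocks, each extra slot
# multiplies the number of possible chains by 10.
for length in (2, 4, 8, 16):
    print(length, count_sequences(10, length))
```

Even under this crude model, the count grows exponentially with molecule size, which is why exhaustive manual exploration is impractical.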

AI, however, is especially suited to handle huge sets of possibilities and permutations. We use data from a Deep Search database (augmented with AI-simulated data) to train our generative model. From analyzing this data, the model is able to comprehend the relationship between molecules and their properties—it understands the “rules” of molecule design.

Next, researchers input the desired properties (performance, safety, sustainability, etc.) and the target range for each. Researchers can also add design constraints such as radioactivity or chemical stability thresholds.

The generative model then sets to work, calculating and reconciling thousands of atomic configurations to produce an exhaustive set of designs that satisfy the parameters. It’s now up to the researchers to curate the results. Using a combination of human expertise and AI, they can whittle the outputs down to the best candidates before even setting foot in a lab.
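The target-range step described above can be pictured with a minimal sketch. This is not the actual tooling: `TargetRange`, `satisfies`, `screen`, the property names, and every numeric value are hypothetical, and the property values are assumed to come from some upstream predictor.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TargetRange:
    """Closed interval [low, high] that a predicted property must fall in."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def satisfies(properties: dict[str, float], targets: dict[str, TargetRange]) -> bool:
    """True if every targeted property is present and inside its range."""
    return all(
        name in properties and rng.contains(properties[name])
        for name, rng in targets.items()
    )


def screen(candidates: list[dict], targets: dict[str, TargetRange]) -> list[dict]:
    """Keep only the candidates whose properties meet all targets."""
    return [c for c in candidates if satisfies(c["properties"], targets)]


# Hypothetical targets and candidates, with made-up normalized scores.
targets = {
    "performance": TargetRange(0.7, 1.0),
    "toxicity": TargetRange(0.0, 0.2),
}
candidates = [
    {"id": "mol-A", "properties": {"performance": 0.82, "toxicity": 0.10}},
    {"id": "mol-B", "properties": {"performance": 0.91, "toxicity": 0.35}},
    {"id": "mol-C", "properties": {"performance": 0.55, "toxicity": 0.05}},
]
shortlist = screen(candidates, targets)
print([c["id"] for c in shortlist])  # ['mol-A']
```

Representing each target as an explicit range keeps the researcher's intent separate from the model, so the same generated batch can be re-screened when the targets change.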

Figure G1.


Analyzing existing materials

AI analyzes a dataset of known molecules to understand the relationship between molecular formations and their resulting properties.

Graphic that shows molecular relationships


Unlabelled illustrative bar charts for LD50, Lambda Max and Biodegradability

Exploring variations

Once researchers set the output parameters (target values), the generative model uses the thousands of molecular “formulas” it’s derived from the data to generate an exhaustive set of novel combinations.

Figure G2.


Unique molecules designed

The generative model has effectively turned existing molecules into “guidelines” to create a batch of novel molecules with the same attributes.

Graphic that shows a generative model of molecules

Designing a new material can now happen in a matter of hours. Generative models account for the exponential number of permutations and calculations required to draft a molecule, a process that once

Models for
Dr. Seiji Takeda Ph.D.
Generative Modeling at work

Project Photoresist

Producing 2,000+ molecular designs in under eight hours.

With Deep Search, we’d created a library of 5,000+ known photoacid generators (PAGs), augmenting it with simulated data to determine each of their properties. From this library, we had already identified one existing, greener PAG alternative. But to expand our search, we needed to consider PAGs that did not yet exist. Rather than painstakingly design each hypothetical PAG, we trained a generative model with our 5,000+ PAG dataset, specifying target values across three properties of interest: light absorption, biodegradability, and water solubility.

The generative model then used an inverse design algorithm to automatically produce whole families of novel PAG candidates in our target value range. Of the now 7,000+ PAG candidates, our team selected and vetted two highly promising candidates that were ready to be synthesized.
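As a rough illustration of the generate-then-select idea, the sketch below swaps in a much simpler technique: seeded random sampling followed by screening against target ranges. It is not the inverse design algorithm used in the project. The fragment vocabulary, the integer scores, the additive `predict` function, and all names are hypothetical stand-ins for a trained model.

```python
import random

# Hypothetical building blocks with made-up integer property scores.
FRAGMENT_SCORES = {
    "A": {"absorption": 1, "biodegradability": 4},
    "B": {"absorption": 3, "biodegradability": 2},
    "C": {"absorption": 5, "biodegradability": 1},
    "D": {"absorption": 2, "biodegradability": 3},
}


def predict(molecule):
    """Toy property predictor: sum the per-fragment scores."""
    totals = {"absorption": 0, "biodegradability": 0}
    for fragment in molecule:
        for name, score in FRAGMENT_SCORES[fragment].items():
            totals[name] += score
    return totals


def in_targets(props, targets):
    """True if every targeted property lies inside its (low, high) range."""
    return all(low <= props[name] <= high for name, (low, high) in targets.items())


def generate_candidates(targets, known, n_trials=2000, size=3, seed=0):
    """Sample random fragment combinations; keep novel ones that hit the targets."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    fragments = sorted(FRAGMENT_SCORES)
    found = set()
    for _ in range(n_trials):
        # Sort so the same combination in a different order counts once.
        molecule = tuple(sorted(rng.choice(fragments) for _ in range(size)))
        if molecule in known:
            continue  # skip anything already in the library
        if in_targets(predict(molecule), targets):
            found.add(molecule)
    return sorted(found)


targets = {"absorption": (6, 9), "biodegradability": (6, 10)}
known_library = {("A", "B", "B")}
novel = generate_candidates(targets, known_library)
print(len(novel), "novel candidates, e.g.", novel[:3])
```

A learned generative model would propose candidates far more efficiently than blind sampling, but the surrounding loop (propose, predict properties, keep novel candidates inside the target ranges) has the same shape.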

Figure G3.

PAG library

Deep Search and AI-enhanced Simulation gave us a comprehensive view of 5,000+ known PAGs, but our search was still limited to existing molecules.

Graphic with the labels “Extracted PAG families”, “Lambda Max”, “Biodegradability”, and “LD50”

Figure G4.

Graphic with the labels “Generated PAGs”, “Lambda Max”, “Biodegradability”, and “LD50”

Expanding the pool

Our generative model used existing data to produce new options. It expanded our candidate pool with 2,000+ novel PAGs “pre-approved” to meet our sustainability design constraints.

Case Studies

Case study

A model trained on our partner's proprietary experimental data outperformed human chemists in speed and diversity of molecular design for sugars.

Case study

Novel azo dyes were generated with more variety in structural size and complexity than the original dataset.

We applied our novel AI generative frameworks to three SARS-CoV-2 targets and generated 3,000 novel drug candidates.


Seiji Takeda, Toshiyuki Hama, Hsiang-Han Hsu, Victoria A. Piunova, Dmitry Zubarev, Daniel P. Sanders, Jed W. Pitera, Makoto Kogoh, Takumi Hongo, Yenwei Cheng, Wolf Bocanett, Hideaki Nakashika, Akihiro Fujita, Yuta Tsuchiya, Katsuhiko Hino, Kentaro Yano, Shuichi Hirose, Hiroki Toda, Yasumitsu Orii, and Daiju Nakano

KDD (2020)

Vijil Chenthamarakshan, Payel Das, Samuel C. Hoffman, Hendrik Strobelt, Inkit Padhi, Kar Wai Lim, Benjamin Hoover, Matteo Manica, Jannis Born, Teodoro Laino, and Aleksandra Mojsilovic

NeurIPS (2020)

Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy Tan, James Hedrick, Jason Crain, and Aleksandra Mojsilovic

ArXiv (2020)

Kar Wai Lim, Bhanushee Sharma, Payel Das, Vijil Chenthamarakshan, and Jonathan S. Dordick

ArXiv (2020)

Yair Schiff, Vijil Chenthamarakshan, Karthikeyan Natesan Ramamurthy, and Payel Das

ArXiv (2020)

Discovery Workloads on the Hybrid Cloud

Emerging discovery workflows are posing new challenges for compute, network, storage, and usability. IBM Research supports these new workflows by bringing together world-class physical infrastructure, a hybrid cloud platform that unifies computing, data, and the user experience, and full-stack intelligence for orchestrating discovery workflows across computing environments.