Accelerating Material Design with the Generative Toolkit for Scientific Discovery (GT4SD)

Matteo Manica; Joris Cadow; Dimitrios Christofidellis; Ashish Dave; Jannis Born; Dean Clarke; Yves Gaetan Nana Teukam; Samuel Hoffman; Matthew Buchan; Vijil Vijil; Timothy Donovan; Hsianghan Hsu; Federico Zipoli; Oliver Schilter; Akihiro Kishimoto; Lisa Hamada; Inkit Padhi; Karl Wehden; Lauren McHugh; Alexy Khrabrov; Payel Das; Seiji Takeda; John Smith

ACS Fall 2022

Conference paper

21 Aug 2022



Abstract

GT4SD (https://github.com/GT4SD/gt4sd-core) is an open-source library that accelerates hypothesis generation in the scientific discovery process by easing the adoption of state-of-the-art generative AI. GT4SD includes models that generate new molecule designs conditioned on properties such as target proteins, target omics profiles, scaffold distances, binding energies, and additional targets relevant for materials and drug discovery. The library provides an effective environment both for generating new hypotheses (inference) and for fine-tuning the models to specific domains using custom datasets (model retraining). It is compatible with many popular deep learning frameworks and molecular generation libraries, including PyTorch, PyTorch Lightning, HuggingFace Transformers, GuacaMol, and Moses, and serves a wide range of applications, from materials science to drug discovery. GT4SD's common framework makes models easily accessible to a broader community, such as AI/ML practitioners who develop new generative models and want to deploy them with just a few lines of code. GT4SD also provides a centralized environment for scientists and students interested in using generative models in their research, allowing them to access and explore a variety of models, all of which are pretrained. Consistent commands and interfaces for inference or retraining, with customizable parameters, harmonize usage across the different models. Automatic workflows that let users retrain models on their own data, covering molecular structures and properties, make it possible to develop problem-specific intelligence. Replacing manual processes and human bias in the discovery process has important effects on downstream applications that rely on AI models and can accelerate the development of expert knowledge. In this talk, we will present the GT4SD code base and its main functionalities, ranging from model inference to training, in the context of materials science and chemistry.
The session is designed to provide a deep dive for developers and research scientists who want to accelerate their discovery pipelines with generative modeling capabilities.
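The abstract stresses that consistent interfaces with customizable parameters harmonize inference and retraining across models. The sketch below illustrates that general design pattern only; it is not GT4SD's actual API. The names `GeneratorConfiguration`, `Generator`, `register`, `REGISTRY`, `ToyFragmentGenerator`, and `run_inference` are hypothetical, and the "model" is a toy that concatenates SMILES-like fragments, so its outputs are not guaranteed to be chemically meaningful.

```python
"""Hypothetical sketch of a uniform interface over many generative models.

Illustrative only: these names are NOT part of the GT4SD library.
"""
from __future__ import annotations

import random
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Type


@dataclass
class GeneratorConfiguration:
    """Shared, user-customizable parameters (hypothetical)."""
    seed: Optional[int] = None
    max_fragments: int = 3


class Generator(ABC):
    """Every registered model exposes the same sample()/train() methods."""

    def __init__(self, configuration: GeneratorConfiguration,
                 target: Optional[str] = None) -> None:
        self.configuration = configuration
        self.target = target  # e.g. a protein sequence or property target
        self.rng = random.Random(configuration.seed)

    @abstractmethod
    def sample(self, number_of_items: int) -> Iterator[str]:
        """Yield candidate molecules as strings."""

    def train(self, dataset: List[str]) -> None:
        """Retrain on user data; models opt in by overriding this."""
        raise NotImplementedError(f"{type(self).__name__} does not support retraining")


# Name -> model class, so every model is reached through one entry point.
REGISTRY: Dict[str, Type[Generator]] = {}


def register(name: str) -> Callable[[Type[Generator]], Type[Generator]]:
    def decorator(cls: Type[Generator]) -> Type[Generator]:
        REGISTRY[name] = cls
        return cls
    return decorator


@register("toy-fragment-generator")
class ToyFragmentGenerator(Generator):
    """Toy stand-in for a pretrained model: joins random SMILES fragments."""

    def __init__(self, configuration: GeneratorConfiguration,
                 target: Optional[str] = None) -> None:
        super().__init__(configuration, target)
        self.vocabulary = ["C", "CC", "O", "N", "c1ccccc1"]  # per-instance copy

    def sample(self, number_of_items: int) -> Iterator[str]:
        for _ in range(number_of_items):
            k = self.rng.randint(1, self.configuration.max_fragments)
            yield "".join(self.rng.choice(self.vocabulary) for _ in range(k))

    def train(self, dataset: List[str]) -> None:
        # "Retraining" here just adds unseen user fragments to the vocabulary.
        self.vocabulary.extend(s for s in dataset if s not in self.vocabulary)


def run_inference(name: str, number_of_items: int,
                  target: Optional[str] = None, **parameters) -> List[str]:
    """Same call shape for any registered model; parameters go to the config."""
    configuration = GeneratorConfiguration(**parameters)
    generator = REGISTRY[name](configuration=configuration, target=target)
    return list(generator.sample(number_of_items))
```

For example, `run_inference("toy-fragment-generator", 5, seed=0)` returns five strings, and passing the same `seed` reproduces the same list. The design choice being illustrated is that callers only ever need a model name, a target, and a configuration, so swapping one model for another does not change the calling code.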
