Publication
ACS Fall 2023
Talk
Molecular dynamics as a data source: scaling simulation for building AI models
Abstract
Molecular dynamics simulation is well established as a technique contributing to drug and materials discovery. Its use as a data source for training AI models is increasingly important. Scaling the scope and size of such datasets will be key to building foundation models based on large-scale and diverse information. We use an IBM-developed open-source toolkit, the Simulation Toolkit for Scientific Discovery (ST4SD), to automate simulation workflows. These workflows can be readily scaled to take full advantage of traditional high-performance computing and emerging OpenShift clusters. We then show how large-scale simulation data can be digested by graph-based deep neural networks that our team has designed. We build a model for antigen-peptide immunogenicity prediction that outperforms models using hand-engineered features trained on the same dataset, and is further shown to outperform state-of-the-art sequence-based models in the low-data regime.