Biomedical Foundation Models
Overview
Learning a molecular language for protein interactions is crucial for advancing drug discovery. Foundation models, trained on diverse biomedical data such as antibody-antigen and small molecule-protein interactions, are transforming this field. Unlike traditional computational approaches, they both widen the search scope for novel molecules and refine it to eliminate unsuitable candidates, capturing fine-grained nuances of molecular structure and dynamics.
IBM Research biomedical foundation model (BMFM) technologies leverage multi-modal data, including drug-like small molecules and proteins (more than a billion molecules in total), as well as single-cell RNA sequencing and other biomedical data.
Our research team has a diverse range of expertise, including computational chemistry, medicinal chemistry, artificial intelligence, computational biology, physical sciences, and biomedical informatics.
Our BMFM Technologies currently cover the following three domains:
Foundation models for Targets Discovery
Targets discovery models learn representations of DNA, bulk RNA, single-cell RNA expression data, and other cell-level signaling information for the identification of novel diagnostic and therapeutic targets, enabling tasks such as cell type annotation and classification, gene perturbation prediction, disease state prediction, splice variant prediction, promoter region identification, and treatment response prediction.
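To make one of these downstream tasks concrete, here is a minimal sketch of cell type annotation on top of frozen cell embeddings. The embeddings are synthetic stand-ins for the output of a pretrained scRNA-seq encoder (all names, dimensions, and the nearest-centroid scheme are illustrative assumptions, not the models' actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cell embeddings, standing in for the output of a pretrained
# scRNA-seq foundation model encoder (dimensions are assumptions).
n_ref, n_query, dim = 300, 5, 64
ref_labels = rng.integers(0, 3, size=n_ref)                    # 3 annotated cell types
ref_emb = rng.normal(size=(n_ref, dim)) + ref_labels[:, None]  # separable clusters
query_emb = rng.normal(size=(n_query, dim)) + 2.0              # unlabeled cells of type 2

# Nearest-centroid annotation: assign each query cell the label of the
# closest class centroid in embedding space.
centroids = np.stack([ref_emb[ref_labels == k].mean(axis=0) for k in range(3)])
dists = np.linalg.norm(query_emb[:, None, :] - centroids[None, :, :], axis=-1)
pred = dists.argmin(axis=1)
print(pred.tolist())  # → [2, 2, 2, 2, 2]
```

The point of the sketch is the workflow, not the classifier: once a foundation model maps cells into a well-structured embedding space, even a very simple head can perform annotation.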
Foundation Models for Biologics Discovery
Biologics discovery models focus on biologic therapeutics discovery, with the goal of leveraging large-scale representations of protein sequences, structures, and dynamics for diverse downstream tasks associated with multiple biologics modalities. These models produce unified representations of biological molecular entities, integrating data such as protein sequences, protein complex structures, and protein-protein complex binding free energies into a single framework. They can serve as the basis for diverse downstream tasks in therapeutic design, including candidate generation and assessment, across antibody, TCR, vaccine, and other modalities.
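A minimal sketch of how unified representations can feed a downstream task such as binding free energy prediction: embeddings from a sequence view and a structure view are fused by concatenation, and a simple ridge regression head is fit on top. The embeddings and targets here are synthetic, and the fusion scheme is an illustrative assumption rather than the models' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-complex embeddings from a biologics foundation model:
# one vector from the protein-sequence view, one from the complex-structure
# view (shapes are illustrative assumptions).
n, d_seq, d_struct = 200, 32, 16
seq_emb = rng.normal(size=(n, d_seq))
struct_emb = rng.normal(size=(n, d_struct))

# Fuse the views into a single unified representation by concatenation.
x = np.concatenate([seq_emb, struct_emb], axis=1)

# Synthetic binding free energies with a known linear dependence plus noise.
w_true = rng.normal(size=x.shape[1])
y = x @ w_true + 0.1 * rng.normal(size=n)

# Ridge regression head on top of the frozen fused embeddings.
lam = 1e-3
w = np.linalg.solve(x.T @ x + lam * np.eye(x.shape[1]), x.T @ y)
rmse = float(np.sqrt(np.mean((x @ w - y) ** 2)))
print(rmse)  # close to the 0.1 noise floor
```

The design choice being illustrated is the separation of concerns: the expensive, pretrained encoders are frozen, and only a lightweight task head is trained per downstream assay.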
Foundation Models for Small Molecules Discovery
Small molecule models can address a wide variety of downstream predictive and generative tasks. These models are trained on multiple representations of small-molecule data to learn rich low-dimensional representations of biochemical entities relevant to drug discovery, enabling tasks such as property and affinity prediction, multi-modal late-fusion prediction, and scaffold-based generation. Predictive models are transformers pretrained on multiple views (i.e., modalities) of small-molecule data that learn rich latent representations by maximizing mutual information across the different views of a molecule. Generative models learn to transform input molecules into mutant molecules conditioned on a property embedding of the mutant, via diffusive denoising networks. Given a set of desired properties and a template molecule (3D structure), a set of designer molecules (3D structures) can be obtained.
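The multi-view pretraining objective above can be sketched with a symmetric InfoNCE loss, a standard contrastive estimator whose minimization lower-bounds the mutual information between views. Everything here is a toy stand-in: the "views" are random vectors rather than real SMILES or graph encoder outputs, and the temperature value is an assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

def info_nce(z1, z2, tau=0.1):
    """Symmetric InfoNCE loss between two views of the same batch of molecules.

    Pulling paired views (the diagonal) together against mismatched pairs
    lower-bounds the mutual information between the two views.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau  # cosine similarities scaled by temperature
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_12 = -np.mean(np.diag(log_prob))
    # Symmetric term: treat the second view as the anchors.
    log_prob_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_21 = -np.mean(np.diag(log_prob_t))
    return (loss_12 + loss_21) / 2

# Toy encoder outputs for two views (e.g. a SMILES view and a graph view)
# of the same batch of molecules; aligned views share a common latent.
batch, dim = 8, 32
latent = rng.normal(size=(batch, dim))
aligned = info_nce(latent + 0.05 * rng.normal(size=(batch, dim)),
                   latent + 0.05 * rng.normal(size=(batch, dim)))
random_views = info_nce(rng.normal(size=(batch, dim)),
                        rng.normal(size=(batch, dim)))
print(aligned < random_views)  # aligned views yield a lower loss
```

In actual pretraining, the encoder weights would be updated to drive this loss down, so that different views of the same molecule land close together in the latent space.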
Related Links and Activities
Open Source Models and Tools
Our team has been developing a broad suite of models and architectures, and we have open-sourced two models (model weights available on Hugging Face and code on GitHub):
biomed.sm.mv-te-84m is a multi-modal, multi-view model trained on small molecules data.
biomed.omics.bl.sm.ma-ted-458m is a multi-aligned sequence-based multi-domain model trained on biologics, small molecules, and scRNA-seq data.
Biomedical AI Tools and Methods are also available online.
- BiomedSciAI git repository
AI Alliance working group: AI for Drug Discovery
Cleveland Clinic and IBM Discovery Accelerator
Scientific Conferences
- International Conference on Intelligent Systems for Molecular Biology (ISMB) 2024
- ACS Fall 2024
For scientific journal publications please see our Publications section below.
Publications
- Jannis Born, Yoel Shoshan, et al. J. Chem. Inf. Model. (2022)
- J. Chem. Inf. Model. (2020)
- Seung-Gu Kang, Joseph A. Morrone, et al. J. Chem. Inf. Model. (2022)
- Jerret Ross, Brian Belgodere, et al. Nature Machine Intelligence (2022)
- Diego Chowell, Luc G. T. Morris, et al. Science (2018)
- Nana Luo, Jeffrey K. Weber, et al. Nature Communications (2017)
- David R. Bell, Jeffrey K. Weber, et al. PNAS (2020)
- Kyle Daniels, Shangying Wang, et al. Science (2022)