Biomedical Foundation Models

Exploring the power of BMFM technologies to drive critical tasks in drug discovery


Learning a molecular language for protein interactions is crucial for advancing drug discovery. Foundation models, trained on diverse biomedical data such as antibody-antigen interactions and small molecule-protein interactions, are transforming this field. Unlike traditional computational approaches, they both widen the search for novel molecules and narrow it to eliminate unsuitable candidates, while capturing subtle details of molecular structure and dynamics.

IBM Research biomedical foundation model (BMFM) technologies leverage multimodal data, including drug-like small molecules and proteins (together more than a billion molecules), as well as single-cell RNA sequencing data and other biomedical data.

Our research team has a diverse range of expertise, including computational chemistry, medicinal chemistry, artificial intelligence, computational biology, physical sciences, and biomedical informatics.

Boehringer Ingelheim collaboration: searching for new antibody drugs

Designing large molecules such as antibodies requires different approaches from those used for small-molecule drugs. Small-molecule drugs, with fewer than 200 atoms, already offer billions of possible combinations, but antibodies are far more complex. An antibody drug can have up to 200,000 atoms, built from chains of the 20 standard amino acids, which leads to a staggering number of possible structures.
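To make the "staggering number" concrete, the arithmetic below counts the possible sequences for a stretch of protein of length L, assuming each position can hold any of the 20 standard amino acids, so the count is 20^L. The lengths used are illustrative choices, not figures from the collaboration; a loop of just 15 residues already allows about 3.3 × 10^19 sequences.

```python
# Count possible amino acid sequences for a region of a given length.
# Each position can hold any of the 20 standard amino acids, so the
# number of distinct sequences grows as 20 ** length.
NUM_AMINO_ACIDS = 20


def sequence_space_size(length: int) -> int:
    """Return the number of distinct sequences of the given length."""
    return NUM_AMINO_ACIDS ** length


# Illustrative lengths only; print counts in scientific notation.
for length in (10, 15, 20):
    print(f"length {length}: {sequence_space_size(length):.2e} sequences")
```

The count ignores structure entirely, so it is a lower bound on the design problem rather than a measure of it.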

Our collaboration with Boehringer Ingelheim focuses on using foundation models to manage this complexity. These models draw on data from four modalities: billions of protein sequences represented as text strings; millions of 3D protein structures, determined by lab experiments or predicted by AI tools such as AlphaFold; thousands of antibody-antigen complex structures from experiments and simulations; and binding-affinity data for many antibody-antigen pairs, also from experiments and simulations. Using BMFM technologies, we pre-train models on these data and then fine-tune them for specific antibody drug discovery targets.
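Treating protein sequences as text strings means each sequence must be converted into tokens before a model can read it. The sketch below shows one common, minimal scheme, a character-level tokenizer with one token per amino acid; it is an illustrative assumption and not the tokenizer BMFM actually uses. The names `VOCAB`, `encode`, `PAD`, and `UNK` are hypothetical.

```python
# Minimal character-level tokenizer for protein sequences (illustrative only;
# not the BMFM tokenizer). Each of the 20 standard amino acids gets an integer
# ID, and two special IDs are reserved for padding and unknown residues.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, UNK = 0, 1
VOCAB = {aa: i + 2 for i, aa in enumerate(STANDARD_AMINO_ACIDS)}


def encode(sequence: str, max_len: int) -> list[int]:
    """Map a sequence string to token IDs, truncating or padding to max_len."""
    ids = [VOCAB.get(aa, UNK) for aa in sequence.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# A short example fragment, padded to a fixed length of 8 tokens.
print(encode("EVQLV", max_len=8))  # [5, 19, 15, 11, 19, 0, 0, 0]
```

Fixed-length padding lets many sequences be batched together; real protein language models often add further special tokens (for example, sequence start and end markers).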

In practice, we generate new antibody candidates by inputting a target sequence, such as an RSV viral capsid protein, into the model, which produces millions of antibody sequences designed to bind that target. We then assess these candidates against criteria such as predicted compatibility with the human body and predicted binding strength to the target, among other properties. Users can set their own thresholds and make trade-off decisions between these criteria.
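The screening step can be sketched as two operations: drop candidates that miss any user-set threshold, then keep only the Pareto-optimal survivors, that is, candidates for which no other candidate is at least as good on every criterion and strictly better on one. The score fields, their scales, and the candidate names below are hypothetical placeholders; the source does not specify how BMFM scores or ranks candidates.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical per-candidate scores; higher is better for both.
    name: str
    humanness: float  # stand-in for predicted compatibility with the human body (0 to 1)
    affinity: float   # stand-in for predicted binding strength to the target


def filter_by_thresholds(cands, min_humanness, min_affinity):
    """Keep candidates that meet every user-set threshold."""
    return [c for c in cands
            if c.humanness >= min_humanness and c.affinity >= min_affinity]


def dominates(a, b):
    """True if a is at least as good as b on both scores and better on one."""
    return (a.humanness >= b.humanness and a.affinity >= b.affinity
            and (a.humanness > b.humanness or a.affinity > b.affinity))


def pareto_front(cands):
    """Return the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


pool = [
    Candidate("cand_1", 0.90, 7.5),
    Candidate("cand_2", 0.70, 9.0),
    Candidate("cand_3", 0.60, 6.0),   # fails the humanness threshold
    Candidate("cand_4", 0.95, 5.0),   # fails the affinity threshold
    Candidate("cand_5", 0.80, 7.0),   # passes, but dominated by cand_1
]
kept = filter_by_thresholds(pool, min_humanness=0.65, min_affinity=5.5)
print([c.name for c in pareto_front(kept)])  # ['cand_1', 'cand_2']
```

The Pareto front leaves the final trade-off (here, more humanness versus stronger binding) to the user rather than collapsing the criteria into a single weighted score.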

Foundation models for biologics discovery

IBM is using BMFM technologies to train models for biologics discovery, with a focus on simulated protein interactions. This training helps the model learn how antigens move and interact with antibodies. The key to an antibody binding its target lies in its flexible loops (the complementarity-determining regions, or CDRs), so a model informed about protein dynamics, such as induced fit and conformational changes, is better equipped to predict antibody-target affinity.
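One standard way to quantify how flexible each part of a protein is during a molecular dynamics simulation is the per-residue root-mean-square fluctuation (RMSF): the root of the mean squared distance between a residue's position in each frame and its average position. Flexible loops show high RMSF. The sketch below assumes the trajectory frames have already been superimposed on a common reference and uses one coordinate per residue; it illustrates the quantity, not how BMFM encodes dynamics.

```python
import math


def rmsf(frames):
    """Per-residue root-mean-square fluctuation.

    frames: list of frames; each frame is a list of (x, y, z) coordinates,
    one per residue, assumed already superimposed on a common reference.
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    result = []
    for r in range(n_res):
        # Average position of residue r across all frames.
        mean = [sum(f[r][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation of residue r from that average position.
        msd = sum(sum((f[r][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy 3-frame trajectory of 2 residues: residue 0 is rigid, residue 1 moves.
traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0)],
]
print([round(v, 3) for v in rmsf(traj)])  # [0.0, 0.816]
```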

The model's ability to capture fluctuations in molecular structure is crucial for estimating how likely an antibody, or a similar molecule such as a T cell receptor, is to bind its target. In a recent study published in Briefings in Bioinformatics, researchers from IBM and Cleveland Clinic demonstrated how these insights could be applied to cancer immunotherapy discovery. They found that the presentation landscapes of cancer neoantigen peptides are diverse and dynamic. Unsupervised AI models revealed rare conformations that are significant for targeting by engineered T cells. Supervised AI models trained on molecular dynamics simulations then identified specific peptide structures that are more likely to bind to T cells, guiding the rational design of cancer immunotherapies. IBM is incorporating these dynamic features into its BMFM capabilities to enhance multimodal biologics discovery.
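The idea of unsupervised models surfacing rare conformations can be illustrated with a generic technique: cluster per-frame structural descriptors from a simulation and flag clusters that contain only a small fraction of the frames. The sketch below uses plain k-means on made-up two-dimensional descriptors; it is a stand-in for illustration, not the method used in the Briefings in Bioinformatics study, and the descriptors, `k`, and the rarity cutoff are all assumptions.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means on feature vectors; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rare_clusters(labels, max_fraction):
    """Return cluster IDs holding at most max_fraction of all frames."""
    n = len(labels)
    return sorted(j for j in set(labels) if labels.count(j) / n <= max_fraction)


# Hypothetical 2D descriptors for 10 simulation frames: most frames sit near
# (0, 0); two frames (indices 1 and 9) visit a distinct region near (5, 5).
frames = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.0),
          (0.0, -0.2), (0.1, 0.1), (0.2, 0.0), (-0.2, 0.1), (5.1, 4.9)]
labels = kmeans(frames, k=2)
rare = rare_clusters(labels, max_fraction=0.25)
print(rare, [i for i, lab in enumerate(labels) if lab in rare])  # [1] [1, 9]
```

Initializing centroids from the first k points keeps the example deterministic; practical pipelines typically use randomized or k-means++ initialization and descriptors that respect angle periodicity.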

[Figure: BMFM technologies for drug discovery]

Future work

Beyond incorporating sequence and structural constraints to generate small molecules and biologics that bind known targets, future work on BMFM will integrate constraints based on other molecular properties, such as interactions with specific binding pockets or higher-level structural features like epitopes on targets.
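As one hypothetical way to express an epitope constraint, suppose a structure model predicts which target residues a designed binder contacts. The constraint can then be a simple check that most of those contacts fall within a user-specified epitope. The function names, the coverage definition, the 0.8 default, and the residue numbers below are all illustrative assumptions, not part of BMFM.

```python
def epitope_coverage(contact_residues: set[int], epitope: set[int]) -> float:
    """Fraction of predicted target contact residues that lie inside the epitope."""
    if not contact_residues:
        return 0.0
    return len(contact_residues & epitope) / len(contact_residues)


def satisfies_epitope_constraint(contact_residues: set[int], epitope: set[int],
                                 min_coverage: float = 0.8) -> bool:
    """Hypothetical constraint: most predicted contacts must hit the epitope."""
    return epitope_coverage(contact_residues, epitope) >= min_coverage


# Hypothetical epitope spanning target residues 100-120, and predicted
# contacts for one designed binder (four of five inside the epitope).
epitope = set(range(100, 121))
contacts = {102, 105, 110, 118, 130}
print(epitope_coverage(contacts, epitope))              # 0.8
print(satisfies_epitope_constraint(contacts, epitope))  # True
```

Writing each constraint as a boolean check makes it easy to combine with sequence and binding criteria, though every added check shrinks the set of acceptable designs.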

Key stages of drug discovery, such as lead optimization, involve exploring variations of a seed molecule or a particular molecular scaffold: searching for new molecules related to these starting points in either chemical structure or amino acid sequence. As we add more constraints, it becomes harder to create molecules that satisfy all of our criteria. We are therefore also focusing on more efficient ways to explore and exploit promising regions of the landscape of new molecules.
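The explore-and-exploit idea around a seed sequence can be sketched with a swapped-in, generic technique: an epsilon-greedy mutational search. Most steps mutate the best variant found so far (exploit), a fraction `epsilon` of steps mutate a randomly chosen explored variant (explore), and variants that drift more than `max_distance` substitutions from the seed are rejected so results stay related to the starting point. The scoring function is a hypothetical placeholder for a learned property predictor; this is not the BMFM search method.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(seq: str, rng: random.Random) -> str:
    """Substitute one random position with a different amino acid."""
    i = rng.randrange(len(seq))
    new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[i]])
    return seq[:i] + new_aa + seq[i + 1:]


def seed_search(seed, score, steps=200, max_distance=3, epsilon=0.2, rng_seed=0):
    """Epsilon-greedy search for high-scoring variants near a seed sequence."""
    rng = random.Random(rng_seed)
    explored = {seed: score(seed)}
    for _ in range(steps):
        if rng.random() < epsilon:
            parent = rng.choice(list(explored))        # explore
        else:
            parent = max(explored, key=explored.get)   # exploit
        child = mutate(parent, rng)
        # Keep only new variants that stay close to the seed.
        if child not in explored and hamming(child, seed) <= max_distance:
            explored[child] = score(child)
    best = max(explored, key=explored.get)
    return best, explored[best]


def toy_score(seq: str) -> int:
    """Hypothetical stand-in for a property predictor: counts tryptophans."""
    return seq.count("W")


print(seed_search("ACDEFGHIKL", toy_score))
```

Raising `epsilon` spends more of the budget on diverse variants; lowering it concentrates effort around the current best, which is the trade-off that more efficient search strategies try to manage.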

Drug development goes beyond designing molecules that bind to a target. BMFM's additional modalities, such as genomics and transcriptomics, widen the search space to identifying cellular processes involved in disease, targeting relevant cell types, and addressing many other pertinent questions. BMFM can therefore enhance drug development at every stage, from understanding disease mechanisms to developing small-molecule or biologic drugs that address them.