Imagine solving a puzzle with a billion pieces. That’s pretty much what immunologists deal with when trying to predict the specificity of T cell receptors.
T cells are a vital component of our immune system—the natural defense mechanism aimed at helping us avoid getting sick. To spot potential threats like viruses or bacteria, T cells have a special protein on their surface: the T cell receptor.
It can bind to a small part of a pathogen, called the epitope. Like puzzle pieces, the receptors are similar, but the parts that bind to the epitope are slightly different. That gives each receptor a unique specificity—the ability to bind only some pathogens but not others.
But how can we know which T cell receptors will bind to specific epitopes? That’s the big question of this gigantic immunology puzzle, and we’ve decided to tackle it with deep learning. In our recently published Bioinformatics paper “TITAN: T-cell receptor specificity prediction with bimodal attention networks” [1], we describe how our AI determines the likelihood of a specific T cell receptor binding to a specific epitope.
We show that our AI outperforms the state-of-the-art method and provides biologically relevant explanations for its decisions. Our model could lead to using T cells as biomarkers to spot specific infections or cancers early and help make immunotherapies safer.
Limited data for training AI
Solving such a puzzle with machine learning is not trivial—there’s simply not enough data on receptors binding to epitopes to train the AI. We have only very few examples of puzzle pieces fitting together.
We decided to try a two-step approach, drawing inspiration from previous work [2] on predicting drug efficacy. We realized that predicting T cell receptor specificity is a similar problem. The efficacy of a small molecule acting as a drug depends on its ability to bind to a large protein in the targeted diseased tissue.
For T cell receptors, the situation is reversed: the large protein receptor needs to bind to the small molecule present in the diseased tissue.
This insight allowed us to use a trick called transfer learning. We first let the model learn general concepts of chemical interactions from large datasets of protein-drug binding. We then fine-tuned the model by training it more specifically on T cell receptor-epitope interactions. This way, we could greatly boost the performance of the model, which we called TITAN, short for Tcr epITope bimodal Attention Networks.
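To make the two-step idea concrete, here is a minimal sketch of pretraining followed by fine-tuning. It is not TITAN’s architecture or data: the model is a toy logistic regression, both datasets are synthetic, and every name (`make_toy_data`, `train`, `source_data`, `target_data`) is a hypothetical stand-in chosen for illustration. The only point it shows is that fine-tuning starts from the weights learned on the large, related task instead of starting from scratch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Probability of 'binding' under a toy logistic model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def log_loss(weights, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for features, label in data:
        p = predict(weights, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)


def train(weights, data, epochs, lr):
    """Full-batch gradient descent starting from `weights`; returns a new list."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for features, label in data:
            err = predict(w, features) - label
            for i, x in enumerate(features):
                grad[i] += err * x
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def make_toy_data(n, true_weights, rng):
    """Synthetic examples: label is 1 when a hidden linear score is positive."""
    data = []
    for _ in range(n):
        features = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        score = sum(w * x for w, x in zip(true_weights, features))
        data.append((features, 1 if score > 0 else 0))
    return data


rng = random.Random(0)
# Assumption: a large dataset from a related task (stands in for protein-drug binding).
source_data = make_toy_data(500, [2.0, -1.0, 0.5], rng)
# Assumption: a small dataset from the task of interest (stands in for receptor-epitope binding).
target_data = make_toy_data(20, [1.8, -1.2, 0.4], rng)

init = [0.0, 0.0, 0.0]
pretrained = train(init, source_data, epochs=100, lr=0.5)       # step 1: pretrain
fine_tuned = train(pretrained, target_data, epochs=50, lr=0.1)  # step 2: fine-tune

print("target loss before fine-tuning:", log_loss(pretrained, target_data))
print("target loss after fine-tuning: ", log_loss(fine_tuned, target_data))
```

The smaller learning rate in the second step is a common choice when fine-tuning, so that the small dataset adjusts the pretrained weights rather than overwriting them.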
For T cell receptors that TITAN had never seen during training, it predicted the likelihood of binding to a specific epitope with 79% accuracy. This is significantly better than the current state-of-the-art model, ImRex [3].
Unlike a lot of machine learning models, TITAN is not a complete “black box.” A built-in attention mechanism acts like a window letting us peek inside to see which parts of the T cell receptor and epitope sequences the model pays most attention to. We showed that the attention mechanism could shift its focus for each new T cell receptor-epitope pair, just as we expected.
We also noticed that the atoms TITAN considered most relevant in the epitope were indeed the ones most likely to be involved in the chemical interaction with the T cell receptor.
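For readers curious what “peeking inside” can look like, here is a small sketch of generic scaled dot-product attention. This is an assumption for illustration, not TITAN’s exact attention formulation, which this post does not spell out. The token names and the two-dimensional vectors are made up; the idea is only that each epitope token gets a weight, the weights sum to one, and a different receptor can shift where the focus lands.

```python
import math


def softmax(scores):
    """Numerically stable softmax: turns scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def rank_tokens(tokens, weights):
    """Tokens sorted from most to least attended."""
    return sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)


# Hypothetical epitope tokens and made-up embedding vectors (not real data).
epitope_tokens = ["t0", "t1", "t2", "t3"]
epitope_keys = [[0.1, 0.3], [2.0, 0.1], [0.5, 0.5], [-1.0, 0.2]]

# Two hypothetical receptor summaries acting as attention queries.
receptor_a = [1.0, 0.0]
receptor_b = [-1.0, 0.0]

weights_a = attention_weights(receptor_a, epitope_keys)
weights_b = attention_weights(receptor_b, epitope_keys)

for name, weights in (("receptor_a", weights_a), ("receptor_b", weights_b)):
    top_token, top_weight = rank_tokens(epitope_tokens, weights)[0]
    print(name, "attends most to", top_token, "with weight", round(top_weight, 3))
```

In this toy setup the two receptors put their highest weight on different tokens, which is the kind of shift in focus described above.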
Possible future biomarkers
Getting this far is great. But we are not done yet. TITAN can’t predict whether a T cell receptor will bind to an epitope the AI hasn’t seen during training. This is disappointing, but hardly surprising.
After all, databases only contain information on a few hundred different epitopes—while there are as many possible epitopes as there are stars in our galaxy. It’s a huge challenge for any model to make predictions for all of them. Neither our model nor any other published model can do that. But we think that our approach of learning from larger, related datasets is a promising first step to overcome this issue.
With TITAN, we have put the first puzzle pieces together. Our long-term goal is to build a reliable, general T cell receptor specificity prediction algorithm—one that could solve the whole puzzle. Such a model would open up possibilities to use T cells as biomarkers, indicating whether a patient has a certain infection, an autoimmune disease, or even cancer. At the same time, it could help researchers design T cells that specifically target cancer cells and make immunotherapies safer.
TITAN is just the first step towards a much larger goal, which could forever change the way we diagnose and treat diseases.
1. Weber, A., Born, J., Rodriguez Martínez, M. TITAN: T-cell receptor specificity prediction with bimodal attention networks. Bioinformatics, Volume 37, Issue Supplement_1, July 2021, Pages i237–i244.
2. Born, J., et al. Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2. Mach. Learn.: Sci. Technol. 2, 025024 (2021).
3. Moris, P., De Pauw, J., Postovskaya, A., et al. Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification. Briefings in Bioinformatics, bbaa318 (2020).