Comparing active site sequence representations for kinase-ligand affinity prediction

Jannis Born; Tien Huynh; Astrid Stroobants; Wendy Cornell; Eric Martin; Matteo Manica

ACS Fall 2022

Invited talk

21 Aug 2022

Abstract

We have previously reported an extension of the PaccMann string-based model to proteochemometric activity classification and molecule generation for kinase inhibitors. It is trained as a single model on hundreds of kinase family members, with each kinase represented by its active site residues rather than its full sequence. Here we compare the impact of the specific choice of active site residues, exploring the active site definitions of Sheridan et al. (29 residues) and Martin et al. (16 residues). For predicting the activity of unseen ligands, the Martin representation outperformed the Sheridan one, and the representation combining the residues from Sheridan and Martin performed best of all. For predicting the activity of unseen kinases, none of the three representations was superior. These latest results support our earlier finding that superior activity prediction can be achieved by representing the target with a subset of key residues rather than the full sequence, which additionally improves speed.
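To illustrate what an active-site representation means in practice, here is a minimal sketch, assuming each kinase is available as a sequence aligned to a common reference so that an active site definition can be expressed as a list of alignment positions. The position lists and the toy sequence below are hypothetical placeholders, not the actual Sheridan or Martin residue definitions. A combined representation is sketched as the sorted union of the two position sets, so that residues keep their alignment order and shared positions are not duplicated.

```python
def active_site_string(aligned_seq: str, positions: list[int]) -> str:
    """Concatenate the residues at the given 0-based alignment positions."""
    return "".join(aligned_seq[i] for i in positions)


def combine_definitions(*definitions: list[int]) -> list[int]:
    """Union of several position lists, sorted to preserve alignment order."""
    return sorted(set().union(*definitions))


# Hypothetical toy data: one aligned sequence and two illustrative definitions.
toy_seq = "MKVLAGHTRSEQWYDF"
definition_a = [1, 3, 5, 7]  # placeholder positions, not a published definition
definition_b = [3, 4, 8]     # placeholder positions, overlapping at position 3

combined = combine_definitions(definition_a, definition_b)
print(active_site_string(toy_seq, definition_a))  # KLGT
print(active_site_string(toy_seq, combined))      # KLAGTR
```

Because the model consumes a string of a few dozen residues instead of a full kinase sequence of several hundred, the input is much shorter, which is one plausible source of the speed improvement mentioned above.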