J. Chem. Inf. Model.
On the Choice of Active Site Sequences for Kinase-Ligand Affinity Prediction
Recent work showed that using active site rather than full-protein sequence information improves performance in kinase-ligand binding affinity prediction. To refine the notion of an "active site", we propose and compare multiple definitions. We report significant evidence that our novel definition is superior to previous definitions and that it better models ATP-noncompetitive inhibitors. Moreover, we leverage the discontiguity of the active site sequence to motivate novel protein-sequence augmentation strategies and find that combining them further improves performance.