ICBG 2023

AutoPeptideML: An Automated Machine Learning Method for Building Peptide Bioactivity Predictors Leveraging Protein Language Models


Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. Here, we consider the design of such a tool for developing peptide bioactivity predictors. We analyse different design choices concerning data acquisition and negative class definition, homology partitioning for the construction of independent evaluation sets, the use of protein language models as a general sequence featurization method, and model selection and hyperparameter optimisation. Finally, we integrate the conclusions drawn from this study into AutoPeptideML, an end-to-end, user-friendly application that enables experimental researchers to build their own custom models, facilitating compliance with community guidelines. Source code, documentation, and data can be found in the project GitHub repository: