Publication
NeurIPS 2021
Workshop paper
Leveraging Adversarial Reprogramming for Novel Structure-constrained Protein Sequence Design
Abstract
Designing novel and diverse protein sequences consistent with a given structure is an important task for scientific discovery. Recently, deep language models that learn from large unlabeled corpora have shown impressive success in protein sequence generation. Since only a small fraction of the sequence corpus has structural annotations available, training a model from scratch to generate structure-constrained sequences can lead to degraded performance. Adversarial Reprogramming (AR) repurposes pre-trained machine learning models for target-domain tasks with scarce data, where training a high-performing model from scratch may be difficult. Prior work on AR has primarily focused on classification tasks. In this work, we seek to extend the capabilities of reprogramming beyond classification and toward the more complex problem of sequence generation in the molecular space. Specifically, we repurpose pre-trained language models for text infilling to infill protein sequence templates as a method of novel protein generation. In doing so, we demonstrate that, via AR, sequence generation is achievable in low-resource settings while still upholding the structural integrity of the generated sequences.