About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
EuroS&P 2023
Conference paper
Code Vulnerability Detection via Signal-Aware Learning
Abstract
Machine Learning-based modeling of source code understanding tasks has been gaining popularity. Accompanying their rapid proliferation is an emerging scrutiny over the models' reliability. Concerns have been raised regarding the models not actually learning task-relevant source code features, but fitting other correlated data. To improve model trustworthiness, in this work, we explore data-driven approaches for enhancing model signal awareness, i.e., learning the relevant signals in the input for making predictions. We do so by incorporating the notion of code complexity during model training, both (i) explicitly via curriculum learning, and (ii) implicitly by augmenting the training dataset with simplified signal-preserving programs. With our techniques, we achieve up to 4.8x improvement in signal awareness of vulnerability detection models. Using the notion of code complexity, we present a novel interpretation of the model learning behaviour from the perspective of the dataset. We use it to introspect model learning difficulties, and analyze the learning enhancements achieved with our approaches.