Enhancing Performance Bug Prediction Using Performance Code Metrics
Abstract
Performance bugs are non-functional defects that can significantly reduce the performance of an application (e.g., software hanging or freezing) and lead to poor user experience. Prior studies found that each type of performance bugs follows a unique code-based performance anti-pattern and proposed different approaches to detect such anti-patterns by analyzing the source code of a program. However, each approach can only recognize one performance anti-pattern. Different approaches need to be applied separately to identify different performance anti-patterns. To predict a large variety of performance bug types using a unified approach, we propose an approach that predicts performance bugs by leveraging various historical data (e.g., source code and code change history). We collect performance bugs from 80 popular Java projects. Next, we propose performance code metrics to capture the code characteristics of performance bugs. We build performance bug predictors using machine learning models, such as Random Forest, eXtreme Gradient Boosting, and Linear Regressions. We observe that: (1) Random Forest and eXtreme Gradient Boosting are the best algorithms for predicting performance bugs at a file level with a median of 0.84 AUC, 0.21 PR-AUC, and 0.38 MCC; (2) The proposed performance code metrics have the most significant impact on the performance of our models compared to code and process metrics. In particular, the median AUC, PR-AUC, and MCC of the studied machine learning models drop by 7.7%, 25.4%, and 20.2% without using the proposed performance code metrics; and (3) Our approach can predict additional performance bugs that are not covered by the anti-patterns proposed in the prior studies.CCS CONCEPTS•Software and its engineering Software testing and debugging; Software testing and debugging