DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging
Abstract
For a given software bug report, the primary task of bug triaging is to identify an appropriate developer who could potentially fix the bug. Automatic bug triaging is formulated as a classification problem that takes the bug title and description as input and maps them to one of the available developers. A major challenge is that the bug description usually contains a combination of unstructured text, code snippets, and stack traces, making the input data highly noisy. Existing bag-of-words (BOW) models do not consider the semantic information in the unstructured text. In this research, we propose a novel bug report representation using a deep bidirectional recurrent neural network with attention (DBRNN-A) that learns syntactic and semantic features from long word sequences in an unsupervised manner. Attention enables the model to remember and attend to the important parts of the text in a bug report. To train the model, we use unfixed bug reports (which constitute about 70% of the bugs in a typical open source bug tracking system), which were ignored in previous studies. Another major contribution of this work is the release of a public benchmark dataset of bug reports from three open source bug tracking systems: Google Chromium, Mozilla Core, and Mozilla Firefox. The dataset consists of 383,104 bug reports from Google Chromium, 314,388 bug reports from Mozilla Core, and 162,307 bug reports from Mozilla Firefox. Compared to other systems, DBRNN-A provides a higher rank-10 average accuracy.
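As an illustration of the evaluation metric mentioned above, the following minimal sketch computes a rank-k accuracy under a common top-k definition: a bug report counts as a hit when its actual fixer appears among the k developers the classifier scores highest. The abstract does not spell out how the rank-10 average is taken (for example, over which data splits), so this definition is an assumption, and the function name `rank_k_accuracy`, the developer names, and the toy scores are hypothetical.

```python
from typing import Dict, Sequence


def rank_k_accuracy(
    scores: Sequence[Dict[str, float]],
    true_developers: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of bug reports whose actual fixer is among the k top-scored developers.

    Assumed definition (not taken from the paper): plain top-k accuracy.
    """
    if len(scores) != len(true_developers):
        raise ValueError("scores and true_developers must have the same length")
    if not scores:
        raise ValueError("at least one bug report is required")

    hits = 0
    for report_scores, fixer in zip(scores, true_developers):
        # Sort candidate developers by descending classifier score, keep the top k.
        top_k = sorted(report_scores, key=report_scores.get, reverse=True)[:k]
        if fixer in top_k:
            hits += 1
    return hits / len(scores)


# Toy data: two bug reports scored against three hypothetical developers.
toy_scores = [
    {"alice": 0.7, "bob": 0.2, "carol": 0.1},
    {"alice": 0.5, "bob": 0.1, "carol": 0.4},
]
toy_fixers = ["bob", "carol"]

print(rank_k_accuracy(toy_scores, toy_fixers, k=1))  # 0.0: neither fixer is ranked first
print(rank_k_accuracy(toy_scores, toy_fixers, k=2))  # 1.0: both fixers are in the top two
```

A rank-based metric of this kind suits triaging because a shortlist of candidate developers can still be useful to a human triager even when the single top prediction is wrong.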