Improved code summarization via a graph neural network

Alexander LeClair; Sakib Haque; Lingfei Wu; Collin McMillan

doi:10.1145/3387904.3389268

ICPC 2020

Conference paper

13 Jul 2020

Abstract

Automatic source code summarization is the task of generating natural language descriptions for source code. Automatic code summarization is a rapidly expanding research area, especially as the community has taken greater advantage of advances in neural network and AI technologies. In general, source code summarization techniques use the source code as input and output a natural language description. Yet a strong consensus is developing that using structural information as input leads to improved performance. The first approaches to use structural information flattened the AST into a sequence. Recently, more complex approaches based on random AST paths or graph neural networks have improved on the models using flattened ASTs. However, the literature still does not describe using a graph neural network together with the source code sequence as separate inputs to a model. Therefore, in this paper, we present an approach that uses a graph-based neural architecture that better matches the default structure of the AST to generate these summaries. We evaluate our technique using a data set of 2.1 million Java method-comment pairs and show improvement over four baseline techniques, two from the software engineering literature and two from the machine learning literature.
