Article
Deep code comment generation with hybrid lexical and syntactical information
Empirical Software Engineering
  • Xing HU, Peking University
  • Ge LI, Peking University
  • Xin XIA, Monash University
  • David LO, Singapore Management University
  • Zhi JIN, Peking University
Publication Type
Journal Article
Version
acceptedVersion
Publication Date
1-2019
Abstract

During software maintenance, developers spend a lot of time understanding source code. Existing studies show that code comments help developers comprehend programs and reduce the additional time spent reading and navigating source code. Unfortunately, these comments are often mismatched, missing, or outdated in software projects, so developers have to infer the functionality from the source code itself. This paper proposes a new approach named Hybrid-DeepCom to automatically generate code comments for the functional units of the Java language, namely, Java methods. The generated comments aim to help developers understand the functionality of Java methods. Hybrid-DeepCom applies Natural Language Processing (NLP) techniques to learn from a large code corpus and generates comments from the learned features. It formulates the comment generation task as a machine translation problem. Hybrid-DeepCom exploits a deep neural network that combines the lexical and structural information of Java methods for better comment generation. We conduct experiments on a large-scale Java corpus built from 9,714 open source projects on GitHub. We evaluate the experimental results using both machine translation metrics and information retrieval metrics. Experimental results demonstrate that our method Hybrid-DeepCom outperforms the state of the art by a substantial margin. In addition, we evaluate the influence of out-of-vocabulary tokens on comment generation. The results show that reducing the number of out-of-vocabulary tokens effectively improves accuracy.

Keywords
  • Comment generation
  • Deep learning
  • Program comprehension
Identifier
10.1007/s10664-019-09730-9
Publisher
Springer Verlag (Germany)
Copyright Owner and License
Authors
Creative Commons License
Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International
Additional URL
https://doi.org/10.1007/s10664-019-09730-9
Citation Information
Xing HU, Ge LI, Xin XIA, David LO, et al. "Deep code comment generation with hybrid lexical and syntactical information" Empirical Software Engineering Vol. 25 Iss. 3 (2019) p. 2179 - 2217 ISSN: 1382-3256
Available at: http://works.bepress.com/david_lo/253/