Article
Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling
ASE 2012: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, 3-7 September, Essen, Germany
  • Anh Tuan NGUYEN, Iowa State University
  • Tung NGUYEN, Iowa State University
  • Tien NGUYEN, Iowa State University
  • David LO, Singapore Management University
  • Chengnian SUN, National University of Singapore
Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
9-2012
Abstract

Detecting duplicate bug reports helps reduce triaging effort and saves developers the time of fixing the same issue more than once. Among several automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, those IR-based approaches do not reliably detect duplicate reports that describe the same technical issue in different terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing certain technical issue(s), and models duplicate bug reports as reports about the same technical issue(s). Trained with historical data that includes identified duplicate reports, it is able to learn the sets of different terms describing the same technical issues and to detect other, not-yet-identified duplicates. Our empirical evaluation on real-world systems shows that DBTM improves on state-of-the-art approaches by up to 20% in accuracy.
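
The idea of combining a term-based score with a topic-based score can be illustrated with a minimal Python sketch. It is not DBTM itself: the abstract does not specify the models, so plain TF-IDF cosine similarity stands in for the IR component, hand-assigned topic proportions stand in for a trained topic model, and the fixed-weight linear combination is an assumption. All function names, reports, and numbers below are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so that no term receives a zero or negative weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine_sparse(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def cosine_dense(p, q):
    """Cosine similarity between two equal-length lists of numbers."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = (math.sqrt(sum(a * a for a in p))
            * math.sqrt(sum(b * b for b in q)))
    return dot / norm if norm else 0.0


def combined_score(ir_sim, topic_sim, weight=0.5):
    """Assumed combination rule: a fixed-weight average of the two scores."""
    return weight * ir_sim + (1.0 - weight) * topic_sim


# Hypothetical reports: A describes the same issue as the new report in
# different words; B shares words with it but concerns another issue.
new_report = ["app", "crashes", "saving", "file"]
report_a = ["program", "terminates", "unexpectedly", "storing", "document"]
report_b = ["app", "file", "icon", "wrong", "color"]

# Hypothetical topic proportions over two topics ("crash on save", "UI
# appearance"); a real system would infer these with a trained topic model.
topics_new, topics_a, topics_b = [0.85, 0.15], [0.9, 0.1], [0.1, 0.9]

vec_new, vec_a, vec_b = tfidf_vectors([new_report, report_a, report_b])

ir_a = cosine_sparse(vec_new, vec_a)
ir_b = cosine_sparse(vec_new, vec_b)
combined_a = combined_score(ir_a, cosine_dense(topics_new, topics_a))
combined_b = combined_score(ir_b, cosine_dense(topics_new, topics_b))

print("IR only:  A=%.3f  B=%.3f" % (ir_a, ir_b))
print("Combined: A=%.3f  B=%.3f" % (combined_a, combined_b))
```

In this toy setup the term-based score alone ranks report B above report A because A shares no words with the new report, while the combined score ranks A first. The weight of 0.5 is arbitrary and would normally be tuned on historical duplicate data.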

Keywords
  • Duplicate Bug Reports
  • Topic Model
  • Information Retrieval
ISBN
9781450312042
Identifier
10.1145/2351676.2351687
Publisher
ACM
City or Country
New York
Copyright Owner and License
Publisher
Creative Commons License
Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International
Comments

Won ACM SIGSOFT Distinguished Paper Award

Additional URL
https://doi.org/10.1145/2351676.2351687
Citation Information
Anh Tuan NGUYEN, Tung NGUYEN, Tien NGUYEN, David LO, et al. "Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling" ASE 2012: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, 3-7 September, Essen, Germany (2012) p. 70-79
Available at: http://works.bepress.com/david_lo/243/