Article
Collective personalized change classification with multiobjective search
IEEE Transactions on Reliability
  • Xin XIA
  • David LO, Singapore Management University
  • Xinyu WANG
  • Xiaohu YANG
Publication Type
Journal Article
Version
acceptedVersion
Publication Date
12-2016
Abstract

Many change classification techniques have been proposed to identify defect-prone changes. These techniques build a single global prediction model from the historical change data of all developers. In practice, developers have their own coding preferences and behavioral patterns, which lead to different defect patterns, so a separate change classification model for each developer can help improve performance. Jiang, Tan, and Kim refer to this problem as personalized change classification, and they propose PCC+ to solve it. Because a software project has many developers, a developer's prediction model can be improved further by building it not only from his/her own change data but also from the change data of other relevant developers. In this paper, we propose a more accurate technique named collective personalized change classification (CPCC), which leverages a multiobjective genetic algorithm. For a project, CPCC first builds a personalized prediction model for each developer based on his/her historical data. Next, for each developer, CPCC combines these models by assigning them different weights, chosen to maximize two objective functions (i.e., F1-score and cost effectiveness). To further improve prediction accuracy, we propose CPCC+, which combines CPCC with PCC proposed by Jiang, Tan, and Kim. To evaluate the benefits of CPCC+ and CPCC, we perform experiments on six large software projects from different communities: Eclipse JDT, Jackrabbit, Linux kernel, Lucene, PostgreSQL, and Xorg. The experimental results show that CPCC+ can discover up to 245 more bugs than PCC+ (468 versus 223 for PostgreSQL) if developers inspect the top 20% of lines of code that are predicted buggy. In addition, CPCC+ can achieve F1-scores of 0.60-0.75, which are statistically significantly higher than those of PCC+ on all six projects.
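The abstract describes CPCC's core step only at a high level: each developer's personalized model is combined with the others using weights chosen to maximize F1-score and cost effectiveness. The Python sketch that follows illustrates that idea under several assumptions that are not taken from the paper. The combined score is assumed to be a normalized weighted average of each model's predicted bug probability. A change is predicted buggy when its score is 0.5 or more. Cost effectiveness is taken to be the fraction of buggy changes found when changes are inspected in descending score order until the next change would push past 20% of the changed lines. Finally, random sampling of weight vectors followed by Pareto filtering stands in for the multiobjective genetic algorithm. All function names and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical change record: (changed_lines_of_code, is_buggy).
Change = Tuple[int, bool]
Objectives = Tuple[float, float]  # (F1-score, cost effectiveness)


def combine(probs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Normalized weighted average of every model's score for each change.

    probs[j][i] is model j's predicted probability that change i is buggy.
    """
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, probs)) / total
        for i in range(len(probs[0]))
    ]


def f1(scores: Sequence[float], changes: Sequence[Change], threshold: float = 0.5) -> float:
    """F1-score when changes scoring >= threshold are predicted buggy."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for pred, (_, bug) in zip(predicted, changes) if pred and bug)
    fp = sum(1 for pred, (_, bug) in zip(predicted, changes) if pred and not bug)
    fn = sum(1 for pred, (_, bug) in zip(predicted, changes) if not pred and bug)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cost_effectiveness(scores: Sequence[float], changes: Sequence[Change], budget: float = 0.2) -> float:
    """Fraction of buggy changes found by inspecting changes in descending
    score order, stopping before `budget` of the total changed LOC is exceeded."""
    total_loc = sum(loc for loc, _ in changes)
    total_bugs = sum(1 for _, bug in changes if bug)
    if total_bugs == 0:
        return 0.0
    order = sorted(range(len(changes)), key=lambda i: scores[i], reverse=True)
    inspected, found = 0, 0
    for i in order:
        loc, bug = changes[i]
        if inspected + loc > budget * total_loc:
            break  # assumption: stop at the first change that exceeds the budget
        inspected += loc
        found += int(bug)
    return found / total_bugs


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the (weights, objectives) pairs that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o[1], c[1]) for o in candidates)]


def search(probs, changes, n_candidates: int = 200, seed: int = 0):
    """Sample random weight vectors (a stand-in for the genetic algorithm)
    and return the non-dominated ones under (F1-score, cost effectiveness)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        weights = [rng.random() + 1e-9 for _ in probs]  # strictly positive weights
        scores = combine(probs, weights)
        candidates.append((weights, (f1(scores, changes), cost_effectiveness(scores, changes))))
    return pareto_front(candidates)


# Toy data for one target developer (invented for illustration only).
changes = [(10, True), (50, False), (5, True), (100, False), (20, False)]
probs = [
    [0.9, 0.2, 0.8, 0.1, 0.3],  # target developer's own model
    [0.4, 0.7, 0.6, 0.2, 0.5],  # another developer's model
]
front = search(probs, changes)
for weights, (f1_score, ce) in front[:3]:
    print([round(w, 3) for w in weights], f1_score, ce)
```

On this toy data, every candidate that survives the filter scores 1.0 on both objectives, because giving the target developer's own model enough weight separates the two buggy changes from the rest. A genuine genetic algorithm (NSGA-II is one well-known example) would evolve weight vectors through selection, crossover, and mutation instead of sampling them independently. The Pareto-dominance test is the part the two approaches share.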

Keywords
  • Cost effectiveness
  • developer
  • machine learning
  • multiobjective genetic algorithm
  • personalized change classification (PCC)
Identifier
10.1109/TR.2016.2588139
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Creative Commons License
Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International
Additional URL
https://doi.org/10.1109/TR.2016.2588139
Citation Information
Xin XIA, David LO, Xinyu WANG and Xiaohu YANG. "Collective personalized change classification with multiobjective search" IEEE Transactions on Reliability Vol. 65 Iss. 4 (2016) pp. 1810-1829 ISSN: 0018-9529
Available at: http://works.bepress.com/david_lo/222/