Article
High-Dimensional Software Engineering Data and Feature Selection
2009 21st IEEE International Conference on Tools with Artificial Intelligence (2009)
  • Huanjing Wang, Western Kentucky University
  • Taghi M. Khoshgoftaar, Florida Atlantic University
  • Kehan Gao, Eastern Connecticut State University
Abstract
Software metrics collected during project development play a critical role in software quality assurance. Software practitioners want to know which metrics to focus on for software quality prediction. Although a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Little attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set of metrics; this paper addresses that problem. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) and our proposed hybrid feature selection (HFS) technique. Our case study uses a very high-dimensional software measurement data set (42 software attributes) obtained from a large telecommunications system. The empirical analysis indicates that HFS performs better than FRT, although the Kolmogorov-Smirnov feature ranking technique demonstrates competitive performance. For the telecommunications system, only 10% of the software attributes are found to be sufficient for effective software quality prediction.
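To make the idea of a filter-based feature ranking concrete, here is a minimal sketch of ranking attributes with the two-sample Kolmogorov-Smirnov (KS) statistic, the ranker the abstract singles out as competitive. It is not the authors' implementation, and the hybrid feature selection (HFS) technique is not shown. Assumptions: modules are labeled `1` (fault-prone) or `0` (not fault-prone); each attribute is scored by the largest gap between the empirical CDFs of its values in the two classes; and the hypothetical helpers `rank_features` and `select_top_fraction` keep the top 10% of attributes, rounding up (the rounding rule is an assumption; 10% of 42 attributes is 4.2).

```python
import math
from typing import List, Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest absolute gap between the
    empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    i = j = 0
    d = 0.0
    # Step through the distinct pooled values in increasing order; after
    # advancing past x, i/n and j/m are the two ECDFs evaluated at x.
    while i < n and j < m:
        x = min(a_sorted[i], b_sorted[j])
        while i < n and a_sorted[i] <= x:
            i += 1
        while j < m and b_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its ECDF is 1 and the gap can only shrink.
    return d


def rank_features(X: List[List[float]], y: List[int]) -> List[Tuple[int, float]]:
    """Hypothetical helper: score every attribute (column of X) by the KS
    statistic between fault-prone (1) and not fault-prone (0) modules,
    highest score first."""
    scores = []
    for k in range(len(X[0])):
        pos = [row[k] for row, label in zip(X, y) if label == 1]
        neg = [row[k] for row, label in zip(X, y) if label == 0]
        scores.append((k, ks_statistic(pos, neg)))
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores


def select_top_fraction(ranking: List[Tuple[int, float]], fraction: float = 0.10) -> List[int]:
    """Hypothetical helper: keep the top ceil(fraction * p) attributes
    (at least one). Rounding up is an assumption of this sketch."""
    k = max(1, math.ceil(fraction * len(ranking)))
    return [idx for idx, _ in ranking[:k]]


if __name__ == "__main__":
    # Toy data: attribute 0 separates the classes, attribute 1 does not,
    # attribute 2 separates them only partially.
    X = [[5, 1, 2], [6, 2, 3], [7, 3, 4], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
    y = [1, 1, 1, 0, 0, 0]
    ranking = rank_features(X, y)
    print(ranking)
    print(select_top_fraction(ranking))
```

On the toy data, attribute 0 scores 1.0, attribute 2 about 0.33, and attribute 1 scores 0.0, so keeping the top 10% retains only attribute 0. A filter ranker like this scores each attribute independently of any classifier, which keeps it cheap on high-dimensional data but ignores redundancy between attributes.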
Keywords
  • software metrics,
  • quality prediction,
  • feature ranking,
  • hybrid feature selection,
  • high-dimensional data
Publication Date
November 2009
Citation Information
Huanjing Wang, Taghi M. Khoshgoftaar and Kehan Gao. "High-Dimensional Software Engineering Data and Feature Selection" 2009 21st IEEE International Conference on Tools with Artificial Intelligence (2009)
Available at: http://works.bepress.com/huanjing_wang/2/