Article
High-Dimensional Software Engineering Data and Feature Selection
2009 21st IEEE International Conference on Tools with Artificial Intelligence
(2009)
Abstract
Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen on learning which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set of metrics; we address that gap in this paper. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) and our proposed hybrid feature selection (HFS) technique. Our case study consists of a very high-dimensional (42 software attributes) software measurement data set obtained from a large telecommunications system. The empirical analysis indicates that HFS performs better than FRT; however, the Kolmogorov-Smirnov feature ranking technique demonstrates competitive performance. For the telecommunications system, it is found that only 10% of the software attributes are sufficient for effective software quality prediction.
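To illustrate what a filter-based feature ranking looks like in practice, the following is a minimal sketch of a Kolmogorov-Smirnov style ranking. It is not the authors' implementation: the abstract does not specify their procedure, so the function names (`ks_statistic`, `rank_features_by_ks`), the 0/1 class labels, and the toy data are all assumptions made for illustration. The idea is to score each metric by the largest gap between its empirical distribution in the two classes, then keep the highest-scoring metrics.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap is always attained at one of the observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_features_by_ks(rows, labels):
    """Return (feature_index, ks_score) pairs, highest score first.

    Assumes binary labels: 1 for one class (e.g. fault-prone modules),
    0 for the other. This labeling is hypothetical.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        positives = [r[j] for r, y in zip(rows, labels) if y == 1]
        negatives = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((j, ks_statistic(positives, negatives)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data (made up): 6 modules, 3 metrics each.
rows = [
    [10, 5, 1],
    [12, 3, 2],
    [11, 4, 1],
    [30, 5, 2],
    [28, 4, 1],
    [35, 3, 2],
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features_by_ks(rows, labels)
top_feature_indices = [j for j, _ in ranking[:1]]  # keep only the best metric
```

On this toy data the first metric separates the two classes completely, so it ranks first; the second metric has identical values in both classes and ranks last. A real study would choose the number of retained metrics by evaluating a classifier on the selected subset.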
Keywords
- software metrics
- quality prediction
- feature ranking
- hybrid feature selection
- high-dimensional data
Publication Date
November, 2009
Citation Information
Huanjing Wang, Taghi M. Khoshgoftaar and Kehan Gao. "High-Dimensional Software Engineering Data and Feature Selection" 2009 21st IEEE International Conference on Tools with Artificial Intelligence (2009). Available at: http://works.bepress.com/huanjing_wang/2/