Article
A Wrapper-Based Feature Selection for Analysis of Large Data Sets
ECU Publications Pre. 2011
  • Jinsong Leng, Edith Cowan University
  • Craig Valli, Edith Cowan University
  • Leisa Armstrong, Edith Cowan University
Publication Date
1-1-2010
Document Type
Conference Proceeding
Publisher
IEEE
Faculty
Computing, Health and Science
School
School of Computer & Security Science
RAS ID
10246
Comments
This article was originally published as: Leng, J., Valli, C., & Armstrong, L. (2010). A Wrapper-based Feature Selection for Analysis of Large Data Sets. Proceedings of 2010 3rd International Conference on Computer and Electrical Engineering (ICCEE 2010) (pp. 167-170). Chengdu, China. IEEE.
Abstract

Knowledge discovery from large data sets using classic data mining techniques has proven difficult because such data sets are large in both dimensionality and number of samples. In real applications, data sets often contain many noisy, redundant, and irrelevant features, which degrade classification accuracy and increase complexity exponentially. Because of this inherent nature, analyzing the quality of data sets is difficult, and very few approaches to this issue can be found in the literature. This paper presents a novel method to investigate the quality and structure of data sets, i.e., to analyze whether noisy and irrelevant features are embedded in them. To do so, a wrapper-based feature selection method using a genetic algorithm and an external classifier is employed to select the discriminative features. Features are ranked by importance according to how frequently they appear in the selected chromosomes. The effectiveness of the proposed idea is investigated and discussed using some sample data sets.
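
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic data, the nearest-centroid classifier standing in for the "external classifier", the GA operators (truncation selection, one-point crossover, bit-flip mutation), and all names and parameter values (`make_data`, `run_ga`, `rank_features`, `pop_size`, `keep`, and so on) are assumptions made for illustration. Each chromosome is a bit mask over features, its fitness is the held-out accuracy of the classifier trained on the masked features, and features are ranked by how often they appear in the best chromosomes.

```python
import random

def make_data(rng, n=60, n_informative=2, n_noise=6):
    """Hypothetical two-class data: the first columns carry signal, the rest are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(3.0 * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y

def split_indices(y):
    """Stratified split: alternate samples of each class go to train and test."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        train += idx[::2]
        test += idx[1::2]
    return train, test

def centroid_accuracy(X, y, mask, train_idx, test_idx):
    """Fitness: test accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in sorted(set(y[i] for i in train_idx)):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in test_idx:
        dists = {label: sum((X[i][j] - c) ** 2 for j, c in zip(cols, cen))
                 for label, cen in centroids.items()}
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(test_idx)

def run_ga(X, y, rng, pop_size=20, generations=15, mutation_rate=0.1, keep=5):
    """Evolve feature masks and return the `keep` fittest chromosomes."""
    n_features = len(X[0])
    train_idx, test_idx = split_indices(y)
    fitness = lambda m: centroid_accuracy(X, y, m, train_idx, test_idx)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:keep]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = elite + children
    pop.sort(key=fitness, reverse=True)
    return pop[:keep]

def rank_features(selected, n_features):
    """Rank features by how often they appear in the selected chromosomes."""
    counts = [sum(mask[j] for mask in selected) for j in range(n_features)]
    ranking = sorted(range(n_features), key=lambda j: counts[j], reverse=True)
    return ranking, counts

rng = random.Random(0)
X, y = make_data(rng)
selected = run_ga(X, y, rng)
ranking, counts = rank_features(selected, len(X[0]))
print("feature ranking (most frequent first):", ranking)
print("selection counts per feature:", counts)
```

In this sketch, features that rarely appear in the fittest chromosomes are candidates for being noisy or irrelevant. A practical version would likely aggregate counts over several independent GA runs and use cross-validation rather than a single split.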

Citation Information
Jinsong Leng, Craig Valli and Leisa Armstrong. "A Wrapper-Based Feature Selection for Analysis of Large Data Sets" (2010)
Available at: http://works.bepress.com/leisa_armstrong/17/