Article
Discovering knowledge from noisy databases using genetic programming
Journal of the American Society for Information Science
  • Man Leung WONG, Lingnan University, Hong Kong
  • Kwong Sak LEUNG, Chinese University of Hong Kong
  • Jack C. Y. CHENG, Chinese University of Hong Kong
Document Type
Journal article
Publication Date
1-1-2000
Abstract
In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limiting attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework that combines Genetic Programming and Inductive Logic Programming to induce knowledge represented in various knowledge representation formalisms from noisy databases. The framework is based on a formalism of logic grammars, and it can specify the search space declaratively. An implementation of the framework, LOGENPRO (the LOgic grammar based GENetic PROgramming system), has been developed. The performance of LOGENPRO is evaluated on the chess end-game domain. We compare LOGENPRO with FOIL and other learning systems in detail, and find that its performance is significantly better than that of the others. This result indicates that the Darwinian principle of natural selection is a plausible noise-handling method that can avoid overfitting and identify important patterns at the same time. Moreover, the system is applied to one real-life medical database. The knowledge discovered provides insights into, and allows a better understanding of, the medical domains.
DOI
10.1002/(SICI)1097-4571(2000)51:9<870::AID-ASI90>3.0.CO;2-R
E-ISSN
2330-1643
Publisher Statement

Copyright © 2000 John Wiley & Sons, Inc.

Access to external full text or publisher's version may require subscription.

Full-text Version
Publisher’s Version
Citation Information
Wong, M. L., Leung, K. S., & Cheng, J. C. Y. (2000). Discovering knowledge from noisy databases using genetic programming. Journal of the American Society for Information Science, 51(9), 870-881. doi: 10.1002/(SICI)1097-4571(2000)51:9<870::AID-ASI90>3.0.CO;2-R