Genetic algorithms (GAs) are powerful tools for optimizing k-nearest neighbors (knn) classifiers. While traditional knn classification techniques typically employ Euclidean distance to assess pattern similarity, other measures may also be used. Previous research demonstrates that GAs can improve predictive accuracy by searching for optimal feature weights and offsets for a cosine similarity-based knn classifier. GA-selected weights determine the classification relevance of each feature, while offsets provide alternative points of reference when assessing angular similarity. Such optimized classifiers perform competitively with other contemporary classification techniques. This paper explores the effectiveness of GA weight and offset optimization for knowledge discovery using knn classifiers with varying similarity measures. Using Euclidean distance, cosine similarity, and Pearson correlation, untrained classifiers are compared with weight-optimized classifiers on several datasets. Simultaneous weight and offset optimization experiments are also performed for cosine similarity and Pearson correlation. This type of optimization represents a novel technique for maximizing the performance of Pearson correlation-based knn classifiers. While unoptimized cosine and Pearson classifiers often perform worse than their Euclidean counterparts, optimized cosine and Pearson classifiers typically perform as well as or better than optimized Euclidean classifiers. In some cases, offset optimization provides further improvement for knn classifiers employing cosine similarity or Pearson correlation.
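To make the role of weights and offsets concrete, the following is a minimal sketch of a cosine similarity-based knn classifier with per-feature weights and offsets. It is an illustration under stated assumptions, not the implementation used in the paper: the abstract does not specify how weights and offsets enter the similarity computation, so the transform `w * (x - o)` is one plausible reading, and the names `transform`, `cosine_similarity`, and `knn_predict` are hypothetical. The GA search itself is not shown; the weights and offsets are fixed by hand.

```python
import math
from collections import Counter


def transform(x, weights, offsets):
    # Assumed form: shift each feature by its offset (moving the reference
    # point for angles), then scale it by its weight (feature relevance).
    return [w * (v - o) for v, w, o in zip(x, weights, offsets)]


def cosine_similarity(a, b):
    # Cosine of the angle between vectors a and b; 0.0 if either is all zeros.
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, train_x, train_y, weights, offsets, k=3):
    # Majority vote among the k training patterns most similar to the query.
    q = transform(query, weights, offsets)
    scored = [
        (cosine_similarity(q, transform(x, weights, offsets)), label)
        for x, label in zip(train_x, train_y)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): all points lie on one ray from the
# origin, so without an offset every pair has cosine similarity 1.
train_x = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
train_y = ["a", "a", "b", "b"]
weights = [1.0, 1.0]
offsets = [2.5, 2.5]  # reference point placed between the two classes

print(knn_predict([3.5, 3.5], train_x, train_y, weights, offsets, k=2))  # b
print(knn_predict([1.5, 1.5], train_x, train_y, weights, offsets, k=2))  # a
```

In this toy case the offset moves the angular reference point between the two classes, so patterns that are indistinguishable by angle from the origin become separable. A GA would search over the `weights` and `offsets` vectors, using classification accuracy as the fitness measure.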
Available at: http://works.bepress.com/michael_raymer/82/