Article
Hierarchical Density-Based Clustering of Categorical Data and a Simplification
Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining (2007)
  • Bill Andreopoulos, York University
  • Aijun An, York University
  • Xiaogang Wang, York University
Abstract
A challenge in applying density-based clustering to categorical datasets is that the 'cube' of attribute values has no defined ordering. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data. HIERDENC offers a basis for designing simpler clustering algorithms that trade off accuracy against speed. The characteristics of HIERDENC are: (i) it builds a hierarchy representing the underlying cluster structure of the categorical dataset, (ii) it minimizes the user-specified input parameters, (iii) it is insensitive to the order of object input, and (iv) it can handle outliers. We evaluate HIERDENC on low-dimensional standard categorical datasets, on which it produces more accurate results than other algorithms. We also present a faster simplification of HIERDENC called the MULIC algorithm. MULIC performs better than subspace clustering algorithms at finding the multi-layered structure of special datasets.
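To make the ordering problem concrete, the following is a minimal sketch of one common way to define density over unordered categorical values: count the records that differ from a given record in at most a fixed number of attributes (the Hamming distance). This is an illustration only, not the HIERDENC or MULIC procedure. The function names `hamming` and `neighborhood_density`, the `radius` parameter, and the toy records are all assumptions introduced here.

```python
from typing import Sequence

Record = Sequence[str]


def hamming(a: Record, b: Record) -> int:
    """Number of attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_density(center: Record, data: Sequence[Record], radius: int) -> int:
    """Count records within `radius` mismatches of `center`.

    The center itself is counted if it appears in `data`.
    No ordering of attribute values is needed, only equality tests.
    """
    return sum(hamming(center, row) <= radius for row in data)


# Toy dataset (hypothetical): three attributes per record.
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]

# Density of each record's radius-1 neighborhood.
densities = [neighborhood_density(row, data, radius=1) for row in data]
print(densities)
```

In this toy example the first record has the densest neighborhood, and the last record, which matches no other record on more than one attribute, has a neighborhood containing only itself, which is the kind of object a density-based method could flag as an outlier.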
Publication Date
May 2007
Citation Information
Bill Andreopoulos, Aijun An and Xiaogang Wang. "Hierarchical Density-Based Clustering of Categorical Data and a Simplification" Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining (2007) pp. 11-22
Available at: http://works.bepress.com/william-andreopoulos/22/