"Hierarchical Density-Based Clustering of Categorical Data and a Simplification" by Bill Andreopoulos

Article

Hierarchical Density-Based Clustering of Categorical Data and a Simplification

Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (2007)

Bill Andreopoulos, York University
Aijun An, York University
Xiaogang Wang, York University

Abstract

A challenge in applying density-based clustering to categorical datasets is that the 'cube' of attribute values has no ordering defined. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data. HIERDENC offers a basis for designing simpler clustering algorithms that trade off accuracy against speed. HIERDENC has four characteristics: (i) it builds a hierarchy representing the underlying cluster structure of the categorical dataset, (ii) it minimizes the number of user-specified input parameters, (iii) it is insensitive to the order in which objects are input, and (iv) it can handle outliers. We evaluate HIERDENC on low-dimensional standard categorical datasets, on which it produces more accurate results than other algorithms. We also present MULIC, a faster simplification of HIERDENC. MULIC performs better than subspace clustering algorithms at finding the multi-layered structure of special datasets.
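
To make the ordering problem concrete, the following is a minimal sketch, not the HIERDENC or MULIC algorithm itself. It assumes that closeness between categorical objects is measured by the number of mismatching attributes (Hamming distance) and that the density around an object is the count of objects within a chosen mismatch radius. The function names, the toy data, and the radius are hypothetical choices made for illustration only.

```python
from typing import Sequence, Tuple

CategoricalObject = Tuple[str, ...]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of attributes on which two categorical objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def neighborhood_density(
    center: CategoricalObject,
    objects: Sequence[CategoricalObject],
    radius: int,
) -> int:
    """Count the objects (including the center itself, if present) that
    lie within `radius` attribute mismatches of `center`."""
    return sum(1 for obj in objects if hamming(center, obj) <= radius)


def densest_object(
    objects: Sequence[CategoricalObject], radius: int
) -> CategoricalObject:
    """Return the object whose radius-neighborhood holds the most objects.
    Ties are broken by input position; this is an illustrative shortcut."""
    return max(objects, key=lambda o: neighborhood_density(o, objects, radius))


# Hypothetical toy dataset: three attributes with unordered values.
data = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "square", "small"),
    ("blue", "oval", "large"),
]

seed = densest_object(data, radius=1)
```

Because no attribute value is "between" two others, the neighborhood here is defined purely by counting mismatches, which is one common way to give a categorical cube a usable notion of distance. The last object differs from the first on every attribute, so it falls outside any small radius around it and behaves like an outlier in this toy example.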

Publication Date

May, 2007

Citation Information

Bill Andreopoulos, Aijun An, and Xiaogang Wang. "Hierarchical Density-Based Clustering of Categorical Data and a Simplification." Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (2007), pp. 11-22.
Available at: http://works.bepress.com/william-andreopoulos/22/