"Efficient Layered Density-based Clustering of Categorical Data" by Bill Andreopoulos

Selected Works of William B. Andreopoulos

Article

Efficient Layered Density-based Clustering of Categorical Data

Journal of Biomedical Informatics (2009)

Bill Andreopoulos, Technische Universität Dresden
Aijun An, York University
Xiaogang Wang, York University
Dirk Labudde, Technische Universität Dresden

Abstract

A challenge in applying density-based clustering to categorical biomedical data is that the "cube" of attribute values has no defined ordering, which makes the search for dense subspaces slow. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data, together with a complementary index for searching for dense subspaces efficiently. The HIERDENC index is updated when new objects are introduced, so clustering does not need to be repeated on all objects. Both updating and cluster retrieval are efficient. In comparisons with several other clustering algorithms on large datasets, HIERDENC achieved better runtime scalability with respect to the number of objects, as well as better cluster quality. By rapidly collapsing the bicliques in large networks, we achieved an edge reduction of as much as 86.5%. HIERDENC is suitable for large and quickly growing datasets, since it is independent of object ordering, does not require re-clustering when new data emerge, and requires no user-specified input parameters.
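To make the notion of density in an unordered categorical space concrete, the following is a minimal sketch and not the HIERDENC algorithm or its index. It assumes that closeness between objects is measured by Hamming distance (the number of attributes on which two objects differ) and that density is the fraction of objects within a given radius of a center. The function names `hamming`, `density`, and `densest_seed`, and the toy data, are hypothetical and do not come from the article.

```python
def hamming(a, b):
    """Number of attributes on which two categorical objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def density(center, objects, radius):
    """Fraction of objects within `radius` Hamming distance of `center`."""
    if not objects:
        return 0.0
    inside = sum(1 for obj in objects if hamming(center, obj) <= radius)
    return inside / len(objects)


def densest_seed(objects, radius):
    """Object whose radius-neighborhood holds the most objects (brute force)."""
    return max(objects, key=lambda obj: density(obj, objects, radius))


# Toy data (assumed for illustration): three similar objects and one outlier.
objects = [
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("a", "y", "p"),
    ("b", "z", "r"),
]

seed = densest_seed(objects, radius=1)
print(seed, density(seed, objects, radius=1))
```

This brute-force scan compares every pair of objects, so its cost grows quadratically with the number of objects. That cost is the kind of overhead an index over dense subspaces is meant to avoid.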

Keywords

Clustering, Bioinformatics, Categorical, Network, Index, Scalable, Layered

Publication Date

April, 2009

DOI

10.1016/j.jbi.2008.11.004

Citation Information

Bill Andreopoulos, Aijun An, Xiaogang Wang and Dirk Labudde. "Efficient Layered Density-based Clustering of Categorical Data" Journal of Biomedical Informatics Vol. 42 Iss. 2 (2009) pp. 365-376
Available at: http://works.bepress.com/william-andreopoulos/13/