Streaming Algorithms for k-Means Clustering with Fast Queries
arXiv
  • Yu Zhang, Iowa State University
  • Kanat Tangwongsan, Mahidol University International College
  • Srikanta Tirthapura, Iowa State University
Document Type
Article
Publication Date
1-1-2017
Abstract

We present methods for k-means clustering on a stream, with a focus on providing fast responses to clustering queries. Compared with the current state of the art, our methods substantially reduce the time needed to answer a query for cluster centers, while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms are based on a novel idea of "coreset caching", which reuses coresets (summaries of data) computed for recent queries when answering the current clustering query. We present both provable theoretical results and detailed experiments demonstrating the correctness and efficiency of the algorithms.
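The caching idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `summarize` function is a weighted uniform sample standing in for a real coreset construction (so it carries no approximation guarantee), and every name here (`summarize`, `CachedSummaryStream`, `bucket_size`, `summary_size`) is hypothetical. The sketch only shows the reuse pattern, in which a query merges the summary cached by the previous query with the buckets that arrived since then, rather than re-merging the whole stream.

```python
import random


def summarize(weighted_points, size, rng):
    """Stand-in for a coreset construction (assumption, not the paper's method).

    Returns at most `size` weighted points whose total weight equals the
    total weight of the input.
    """
    if len(weighted_points) <= size:
        return list(weighted_points)
    total = sum(w for _, w in weighted_points)
    chosen = rng.sample(weighted_points, size)
    # Rescale so the summary carries the same total weight as its input.
    scale = total / sum(w for _, w in chosen)
    return [(p, w * scale) for p, w in chosen]


class CachedSummaryStream:
    """Hypothetical illustration of reusing a cached summary across queries."""

    def __init__(self, bucket_size=100, summary_size=20, seed=0):
        self.bucket_size = bucket_size
        self.summary_size = summary_size
        self.rng = random.Random(seed)
        self.buffer = []    # raw points of the current, partially filled bucket
        self.pending = []   # summaries of buckets completed since the last query
        self.cached = []    # summary of everything covered by earlier queries
        self.merges = 0     # bucket summaries merged so far (for illustration)

    def insert(self, point):
        self.buffer.append((point, 1.0))
        if len(self.buffer) == self.bucket_size:
            self.pending.append(
                summarize(self.buffer, self.summary_size, self.rng)
            )
            self.buffer = []

    def query_summary(self):
        """Return weighted points representing the whole stream seen so far."""
        if self.pending:
            # Start from the cached summary and merge only the new buckets.
            merged = list(self.cached)
            for bucket_summary in self.pending:
                merged.extend(bucket_summary)
                self.merges += 1
            self.cached = summarize(merged, self.summary_size, self.rng)
            self.pending = []
        return self.cached + self.buffer
```

A clustering query would then run a weighted k-means solver on the points returned by `query_summary()`. In this sketch the second of two back-to-back queries merges only the buckets completed in between, which is the source of the query-time saving. The sketch does not bound the size of `pending` between queries and makes no claim about approximation error.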

Comments

This is a manuscript of the article Zhang, Yu, Kanat Tangwongsan, and Srikanta Tirthapura. "Streaming algorithms for k-means clustering with fast queries." arXiv preprint arXiv:1701.03826 (2017). Posted with permission.

Language
en
File Format
application/pdf
Citation Information
Yu Zhang, Kanat Tangwongsan and Srikanta Tirthapura. "Streaming Algorithms for k-Means Clustering with Fast Queries" arXiv (2017)
Available at: http://works.bepress.com/srikanta-tirthapura/29/