Skip to main content
Article
A Simple Message-Optimal Algorithm for Random Sampling from a Distributed Stream
IEEE Transactions on Knowledge and Data Engineering
  • Yung-Yu Chung, Iowa State University
  • Srikanta Tirthapura, Iowa State University
  • IBM Almaden Research Center, IBM Almaden Research Center
Document Type
Article
Publication Version
Accepted Manuscript
Publication Date
6-1-2016
DOI
10.1109/TKDE.2016.2518679
Abstract

We present a simple, message-optimal algorithm for maintaining a random sample from a large data stream whose input elements are distributed across multiple sites that communicate via a central coordinator. At any point in time, the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system. We present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. We also consider the important case when the distribution of elements across different sites is non-uniform, and show that for such inputs, our algorithm significantly outperforms prior solutions.

Comments

This is a manuscript of an article published as Chung, Yung-Yu, Srikanta Tirthapura, and David P. Woodruff. "A Simple Message-Optimal Algorithm for Random Sampling from a Distributed Stream." IEEE Transactions on Knowledge and Data Engineering 28, no. 6 (2016): 1356-1368. 10.1109/TKDE.2016.2518679. Posted with permission.

Rights
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Copyright Owner
IEEE
Language
en
File Format
application/pdf
Citation Information
Yung-Yu Chung, Srikanta Tirthapura and IBM Almaden Research Center. "A Simple Message-Optimal Algorithm for Random Sampling from a Distributed Stream" IEEE Transactions on Knowledge and Data Engineering Vol. 28 Iss. 6 (2016) p. 1356 - 1368
Available at: http://works.bepress.com/srikanta-tirthapura/15/