Skip to main content
Article
Parallel Streaming Random Sampling
arXiv
  • Kanat Tangwongsan, Mahidol University International College
  • Srikanta Tirthapura, Iowa State University
Document Type
Article
Publication Version
Submitted Manuscript
Publication Date
6-10-2019
Abstract

This paper investigates parallel random sampling from a potentially-unending data stream whose elements are revealed in a series of element sequences (minibatches). While sampling from a stream was extensively studied sequentially, not much has been explored in the parallel context, with prior parallel random-sampling algorithms focusing on the static batch model. We present parallel algorithms for minibatch-stream sampling in two settings: (1) sliding window, which draws samples from a prespecified number of most-recently observed elements, and (2) infinite window, which draws samples from all the elements received. Our algorithms are computationally and memory efficient: their work matches the fastest sequential counterpart, their parallel depth is small (polylogarithmic), and their memory usage matches the best known.

Comments

This is the pre-print of the article Tangwongsan, Kanat, and Srikanta Tirthapura. "Parallel Streaming Random Sampling." arXiv preprint arXiv:1906.04120 (2019). Posted with permission.

Copyright Owner
The Authors
Language
en
File Format
application/pdf
Citation Information
Kanat Tangwongsan and Srikanta Tirthapura. "Parallel Streaming Random Sampling" arXiv (2019)
Available at: http://works.bepress.com/srikanta-tirthapura/50/