"HcBench: Methodology, Development, and Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark" by V. A. Saletore

Selected Works of Matthew Tolentino

Article

HcBench: Methodology, Development, and Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark

2013 IEEE International Symposium on Workload Characterization (IISWC)

V. A. Saletore
K. Krishnan
V. Viswanathan
M. E. Tolentino, University of Washington Tacoma

Publication Date

9-1-2013

Document Type

Conference Proceeding

Abstract

Big Data analytics using Map-Reduce over Hadoop has become a leading-edge paradigm for distributed programming over large server clusters. The Hadoop platform is used extensively for interactive and batch analytics in e-commerce, telecom, media, retail, and social networking, and is being actively evaluated for use in other areas. However, to date no industry-standard or customer-representative benchmarks exist to measure and evaluate the true performance of a Hadoop cluster. Current Hadoop micro-benchmarks such as HiBench-2, GridMix-3, Terasort, etc. are narrow functional slices of applications that customers run to evaluate their Hadoop clusters. These benchmarks fail to capture the real usages and performance in a datacenter environment. Given that typical datacenter deployments of Hadoop process a wide variety of analytic interactive and query jobs in addition to batch transform jobs under strict Service Level Agreement (SLA) requirements, performance benchmarks used to evaluate clusters must capture the effects of concurrently running such diverse job types in production environments. In this paper, we present the methodology and the development of a customer datacenter usage representative Hadoop benchmark, "HcBench", which includes a mix of a large number of customer-representative interactive, query, machine learning, and transform jobs, a variety of data sizes, and compute-, storage I/O-, and network-intensive jobs, with inter-job arrival times as in a typical datacenter environment. We present the details of this benchmark and discuss application-level, server-level, and cluster-level performance characterization collected on an Intel Sandy Bridge Xeon Processor Hadoop cluster.

DOI

10.1109/IISWC.2013.6704672

Citation Information

V. A. Saletore, K. Krishnan, V. Viswanathan, and M. E. Tolentino. "HcBench: Methodology, Development, and Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark." 2013 IEEE International Symposium on Workload Characterization (IISWC) (2013), pp. 77-86.
Available at: http://works.bepress.com/matthew-tolentino/3/