Skip to main content
Thesis
Towards Efficient Resource Provisioning in Hadoop
Engineering Ph.D. Theses
  • Peter P. Nghiem, Santa Clara University
Date of Award
6-1-2017
Document Type
Thesis
Publisher
Santa Clara : Santa Clara University, 2017.
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Engineering
First Advisor
Silvia M. Figueira
Subject Categories
Abstract

Considering recent exponential growth in the amount of information processed in Big Data, the high energy consumed by data processing engines in datacenters has become a major issue, underlining the need for efficient resource allocation for better energy-efficient computing. This thesis proposes the Best Trade-off Point (BToP) method which provides a general approach and techniques based on an algorithm with mathematical formulas to find the best trade-off point on an elbow curve of performance vs. resources for efficient resource provisioning in Hadoop MapReduce and Apache Spark. Our novel BToP method is expected to work for any applications and systems which rely on a tradeoff curve with an elbow shape, non-inverted or inverted, for making good decisions. This breakthrough method for optimal resource provisioning was not available before in the scientific, computing, and economic communities.

To illustrate the effectiveness of the BToP method on the ubiquitous Hadoop MapReduce, our Terasort experiment shows that the number of task resources recommended by the BToP algorithm is always accurate and optimal when compared to the ones suggested by three popular rules of thumbs. We also test the BToP method on the emerging cluster computing framework Apache Spark running in YARN cluster mode. Despite the effectiveness of Spark’s robust and sophisticated built-in dynamic resource allocation mechanism, which is not available in MapReduce, the BToP method could still consistently outperform it according to our Spark-Bench Terasort test results. The performance efficiency gained from the BToP method not only leads to significant energy saving but also improves overall system throughput and prevents cluster underutilization in a multi-tenancy environment. In General, the BToP method is preferable for workloads with identical resource consumption signatures in production environment where job profiling for behavioral replication will lead to the most efficient resource provisioning.

Citation Information
Peter P. Nghiem. "Towards Efficient Resource Provisioning in Hadoop" (2017)
Available at: http://works.bepress.com/peter-nghiem/3/