Contribution to Book
Identification of the Optimal Hadoop Configuration Parameters Set for Mapreduce Computing
Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence (ACIT-CSI) (2015)
  • Jongyeop Kim, Georgia Southern University
  • Nohpill Park, Oklahoma State University
Abstract
This paper investigates techniques for searching for the optimal configuration parameters set for Hadoop HDFS (Hadoop Distributed File System). An optimization technique, the so-called automated benchmarking configuration methodology (ABCM) [4], has been proposed and demonstrated. It uses a two-stage sampling technique to reduce the computational complexity and cost of searching for the optimal configuration parameters set. In this paper, further methods for sampling configuration parameters sets are employed: random Monte Carlo sampling and a correlation-based approach, in contrast to the sequential approach used in ABCM. The goal is to improve both the performance achieved by the identified optimal configuration parameters set and the execution time. To compare the resulting performance, Monte Carlo and correlation coefficient-based algorithms are developed and implemented to identify a better set within the Ω space [4] for the TestDFSIO benchmark. The number of iterations is held constant for a fair comparison, and the results are compared against those of the sequential approach. The optimal configuration parameters set identified by the Monte Carlo-based approach reduces the benchmark's execution time by 13.84% compared to the sequential sampling method. The correlation-based method, by contrast, produced an unexpected result. This is suspected to stem from a lack of linearity in the correlation, which remains to be validated in future work.
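The three sampling strategies compared in the abstract can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the authors' implementation. The property names are real Hadoop settings, but the candidate values are hypothetical. `run_benchmark` is a made-up cost model standing in for an actual TestDFSIO run. "Sequential" is interpreted here as evaluating grid points in a fixed enumeration order until the iteration budget runs out, which may differ from ABCM [4]. The correlation rule is one plausible reading of a correlation-based search: move each parameter to the end of its range favored by the sign of its Pearson coefficient. Every method gets the same sampling budget, mirroring the fixed-iteration comparison. `percent_reduction` assumes the reduction is measured relative to the sequential result.

```python
import itertools
import random

# Hypothetical search space (an assumption, not the paper's Omega space).
# Keys are real Hadoop property names; candidate values are illustrative only.
SPACE = {
    "dfs.blocksize": [64, 128, 256],            # MiB here; Hadoop itself takes bytes
    "io.file.buffer.size": [4, 64, 128],        # KiB here; Hadoop itself takes bytes
    "dfs.replication": [1, 2, 3],
    "mapreduce.task.io.sort.mb": [100, 200, 400],
}
NAMES = list(SPACE)


def run_benchmark(cfg):
    """Hypothetical stand-in for one TestDFSIO run: a made-up cost model
    returning a fake execution time in seconds. Not real measurements."""
    return (
        100
        + 2000 / cfg["dfs.blocksize"]
        + 300 / cfg["io.file.buffer.size"]
        + 15 * cfg["dfs.replication"]
        + abs(cfg["mapreduce.task.io.sort.mb"] - 200) / 20  # non-monotone term
    )


def sequential_search(budget):
    """Evaluate the first `budget` grid points in fixed enumeration order."""
    grid = itertools.product(*(SPACE[n] for n in NAMES))
    samples = [dict(zip(NAMES, values)) for values in itertools.islice(grid, budget)]
    return min(samples, key=run_benchmark)


def monte_carlo_search(budget, seed=0):
    """Evaluate `budget` configurations drawn uniformly at random."""
    rng = random.Random(seed)
    samples = [{n: rng.choice(SPACE[n]) for n in NAMES} for _ in range(budget)]
    return min(samples, key=run_benchmark)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def correlation_search(budget, seed=0):
    """Spend `budget` runs on a random pilot sample, then set each parameter
    to the end of its range that the sign of its correlation favors."""
    rng = random.Random(seed)
    pilot = [{n: rng.choice(SPACE[n]) for n in NAMES} for _ in range(budget)]
    times = [run_benchmark(c) for c in pilot]
    best = {}
    for n in NAMES:
        r = pearson([c[n] for c in pilot], times)
        # r < 0: larger values tend to shorten the run, so take the maximum.
        best[n] = max(SPACE[n]) if r < 0 else min(SPACE[n])
    return best


def percent_reduction(t_baseline, t_new):
    """Execution-time reduction of t_new relative to t_baseline, in percent."""
    return (t_baseline - t_new) / t_baseline * 100


if __name__ == "__main__":
    budget = 9  # same sampling budget for every method
    t_seq = run_benchmark(sequential_search(budget))
    t_mc = run_benchmark(monte_carlo_search(budget))
    t_corr = run_benchmark(correlation_search(budget))
    print(f"sequential {t_seq:.2f}s, Monte Carlo {t_mc:.2f}s, correlation {t_corr:.2f}s")
    print(f"Monte Carlo vs. sequential: {percent_reduction(t_seq, t_mc):.2f}% reduction")
```

With a small budget, the fixed-order sweep only explores the first corner of the grid, whereas random sampling spreads the same number of runs across the whole space. Because Pearson's coefficient measures only linear association, `correlation_search` always lands on an endpoint of each range. For the hypothetical `mapreduce.task.io.sort.mb` term, whose minimum sits mid-range, it can therefore never pick the best value. This shows one way nonlinearity can mislead a correlation-based search. It is consistent with, but not evidence for, the explanation the authors suspect.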
Publication Date
November 30, 2015
Publisher
IEEE Xplore
ISBN
978-1-4673-9641-7
DOI
10.1109/ACIT-CSI.2015.27
Citation Information
Jongyeop Kim and Nohpill Park. "Identification of the Optimal Hadoop Configuration Parameters Set for Mapreduce Computing" Applied Computing and Information Technology/2nd International Conference on Computational Science and Intelligence (ACIT-CSI), Okayama, Japan (2015) pp. 108-112
Available at: http://works.bepress.com/jongyeop-kim/1/