Article
AutoPath: Harnessing Parallel Execution Paths for Efficient Resource Allocation in Multi-stage Big Data Frameworks
26th International Conference on Computer Communications and Networks (ICCCN 2017) (2017)
  • Han Gao
  • ZHENGYU YANG, Northeastern University
  • Janki Bhimani, Northeastern University
  • Teng Wang
  • Jiayin Wang
  • Ningfang Mi
  • Bo Sheng
Abstract
Owing to the flexibility of its data operations and the scalability of its in-memory cache, Spark has shown the potential to replace Hadoop as the standard distributed framework for data-intensive processing in both industry and academia. However, we observe that Spark's built-in scheduling algorithms (FIFO and FAIR) are not optimized for applications whose stages form multiple parallel, independent branches. Specifically, a child stage must wait to collect data from all of its parent branches, and this wait has no guaranteed upper bound because it depends on each branch's workload characteristics, the order of its stages, and the computing resources allocated to it. To address this challenge, we investigate a solution that ensures every branch acquires resources suited to its workload demand, so that all branches finish as close together in time as possible. Based on this idea, we propose a novel scheduling policy, named AutoPath, which reduces the overall makespan of such applications by detecting and leveraging parallel paths and by adaptively assigning computing resources according to workload demands estimated at runtime. We implemented the new scheduling scheme in Spark v1.5.0 and evaluated it with selected representative workloads. The experiments demonstrate that, compared to the current FIFO and FAIR schedulers, our new scheduler effectively reduces the makespan and improves resource utilization for these applications.
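To make the core intuition concrete (a child stage starts only when its slowest parent branch finishes, so resources should follow each branch's demand), here is a minimal sketch under simplifying assumptions that do not come from the paper: each branch has a known demand in core-seconds, its work divides perfectly across cores, and a branch with demand d on c cores finishes at time d / c. The functions `allocate_cores`, `even_split`, and `child_start_time` are hypothetical illustrations, not AutoPath's actual algorithm or any Spark API; the greedy rule simply hands each spare core to whichever branch would currently finish last.

```python
import heapq
from typing import Dict

# Hypothetical model (not from the paper): each parallel branch has an
# estimated demand in core-seconds, work divides perfectly across cores,
# and a branch with demand d on c cores finishes at time d / c.

def allocate_cores(demands: Dict[str, float], total_cores: int) -> Dict[str, int]:
    """Greedily give each spare core to the branch that would finish last."""
    if total_cores < len(demands):
        raise ValueError("need at least one core per branch")
    alloc = {b: 1 for b in demands}
    # heapq is a min-heap, so store negated finish times to pop the slowest branch.
    heap = [(-d, b) for b, d in demands.items()]
    heapq.heapify(heap)
    for _ in range(total_cores - len(demands)):
        _, b = heapq.heappop(heap)
        alloc[b] += 1
        heapq.heappush(heap, (-demands[b] / alloc[b], b))
    return alloc

def even_split(demands: Dict[str, float], total_cores: int) -> Dict[str, int]:
    """Demand-oblivious baseline: divide cores as evenly as possible."""
    base, extra = divmod(total_cores, len(demands))
    return {b: base + (1 if i < extra else 0) for i, b in enumerate(demands)}

def child_start_time(demands: Dict[str, float], alloc: Dict[str, int]) -> float:
    """The child stage can start only when its slowest parent branch finishes."""
    return max(d / alloc[b] for b, d in demands.items())

if __name__ == "__main__":
    demands = {"A": 120.0, "B": 40.0, "C": 40.0}  # made-up demand estimates
    for policy in (even_split, allocate_cores):
        alloc = policy(demands, 10)
        print(policy.__name__, alloc, child_start_time(demands, alloc))
```

With these made-up numbers and 10 cores, the even split gives {'A': 4, 'B': 3, 'C': 3}, so branch A becomes a 30-second bottleneck while B and C sit idle after about 13 seconds; the demand-aware split gives {'A': 6, 'B': 2, 'C': 2}, and all three branches finish together at 20 seconds. Real workloads violate the perfect-divisibility assumption, which is why a runtime scheduler must estimate demand rather than know it in advance.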
Keywords
  • Spark
  • Scheduling
  • Resource Management
  • Task Assignment
  • Workload Evaluation & Estimation
Publication Date
2017
Citation Information
Han Gao, ZHENGYU YANG, Janki Bhimani, Teng Wang, et al. "AutoPath: Harnessing Parallel Execution Paths for Efficient Resource Allocation in Multi-stage Big Data Frameworks" 26th International Conference on Computer Communications and Networks (ICCCN 2017) (2017)
Available at: http://works.bepress.com/zhengyuyang/10/