Article
HPC Application in Cloud Environment
Romanian Journal of Information Science and Technology
(2015)
Abstract
High Performance Computing (HPC) applications on the Cloud are significant because of their cost-effectiveness and elasticity. Reliability analysis of HPC applications on the Cloud is an important area of study, both to better utilize infrastructure and to deal with fault-tolerance issues in a Cloud environment. In this work, we present a reliability model of a Cloud system under four scenarios: 1) hardware components fail independently and software components fail independently; 2) software components fail independently and hardware failures are correlated; 3) software failures are correlated and hardware components fail independently; 4) software and hardware failures are both dependent. Moreover, we propose an optimal checkpoint placement technique based on reliability information for each scenario. Results show that if failures of the nodes and/or software in the system possess a degree of dependency, the system becomes less reliable: the failure rate increases and the mean time to failure decreases. An increase in the number of nodes also decreases the reliability of the system. Finally, the optimal checkpoint interval decreases as the reliability of the system decreases.
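The qualitative trends in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: node failures are independent and exponentially distributed, the system fails when any node fails, and the checkpoint interval follows Young's first-order approximation, T = sqrt(2 * C * MTTF). The function names and parameter values are hypothetical, and the paper's own model, which also covers correlated failures, may differ.

```python
import math


def system_failure_rate(node_rate: float, n_nodes: int) -> float:
    """Failure rate of a series system of n independent exponential nodes.

    Assumption (not from the paper): the system fails when any node fails,
    so the individual rates add up.
    """
    return node_rate * n_nodes


def reliability(rate: float, t: float) -> float:
    """Probability of surviving until time t under an exponential model."""
    return math.exp(-rate * t)


def mttf(rate: float) -> float:
    """Mean time to failure of an exponential model."""
    return 1.0 / rate


def young_interval(checkpoint_cost: float, rate: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.

    Used here as a stand-in; it is not the placement technique of the paper.
    """
    return math.sqrt(2.0 * checkpoint_cost * mttf(rate))


if __name__ == "__main__":
    node_rate = 1e-4        # hypothetical failures per hour, per node
    checkpoint_cost = 0.5   # hypothetical hours needed to write one checkpoint
    for n in (100, 400, 1600):
        lam = system_failure_rate(node_rate, n)
        print(n, lam, mttf(lam), reliability(lam, 10.0),
              young_interval(checkpoint_cost, lam))
```

Under these assumptions, adding nodes raises the system failure rate, lowers the mean time to failure, and shortens the checkpoint interval, which matches the direction of the results reported in the abstract.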
Keywords
- fault tolerance
- reliability
- cloud computing
- cloud performance
Publication Date
2015
Citation Information
Box Leangsuksun, M. Paun, R. Nassar and T. Thanakornworakij. "HPC Application in Cloud Environment" Romanian Journal of Information Science and Technology Vol. 18 Iss. 2 (2015) p. 109 - 125 Available at: http://works.bepress.com/box-leangsuksun/4/