International Journal of Information Engineering and Electronic Business (IJIEEB)
ISSN: 2074-9023 (Print), ISSN: 2074-9031 (Online)
Published By: MECS Press
IJIEEB Vol.7, No.6, Nov. 2015
Improving Fault-Tolerant Load Balancing Algorithms in Computational Grids
Full Text (PDF, 698KB), PP.53-62
Fault-tolerant scheduling of many jobs in an environment with millions of unpredictable nodes is a hard problem. To the best of our knowledge, no work in the literature has proposed a solution that combines the merits of active and passive replication schemes for fault tolerance with the advantages of performance-driven load balancing, so as to make the most of the strong points of each. While many fault-tolerant scheduling and load balancing methods have been presented for sequential jobs, none has addressed fault-tolerant load balancing that minimizes the job makespan, provides efficient network and node utilization, and achieves a well-balanced load and high system flexibility even during resource failures. Hence, in this article, we present an adaptive scheduling algorithm, named ASA, that overcomes these problems. Based on thorough simulations, we conclude that ASA allocates any number of jobs to a million nodes with relatively low overhead and high flexibility. Experimental results show that ASA performs better than its counterparts.
Cite This Paper
Jasma Balasangameshwara, "Improving Fault-Tolerant Load Balancing Algorithms in Computational Grids," IJIEEB, vol. 7, no. 6, pp. 53-62, 2015. DOI: 10.5815/ijieeb.2015.06.08